Showing 1–50 of 82,178 results for author: Chen

  1. arXiv:2407.03320  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  2. arXiv:2407.03314  [pdf, other]

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, **yu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.03282  [pdf, other]

    cs.CL

    LLM Internal States Reveal Hallucination Risk Faced With a Query

    Authors: Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung

    Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadl…

    Submitted 3 July, 2024; originally announced July 2024.

  4. arXiv:2407.03281  [pdf, other]

    astro-ph.SR

    Direct evidence of hybrid nature of EUV waves and the reflection of the fast-mode wave

    Authors: Ramesh Chandra, P. F. Chen, Pooja Devi

    Abstract: In the current study, we perform the analysis of an extreme ultraviolet (EUV) wave on 2022 March 31. The event originated from NOAA active region (AR) 12975 (location: N13W52) in the Atmospheric Imaging Assembly (AIA) onboard Solar Dynamics Observatory (SDO) satellite and exactly the west limb in Solar Terrestrial Relations Observatory-Ahead (STEREO-A) observations. The EUV wave was associate…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 6 figures, 16 pages

  5. arXiv:2407.03245  [pdf, other]

    cs.RO cs.AI eess.SY

    TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

    Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jicheng Sun, Cewu Lu, Lin Shao

    Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: initial commit

  6. arXiv:2407.03217  [pdf, other]

    cs.CV

    MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI

    Authors: Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang

    Abstract: Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification.…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages

  7. arXiv:2407.03211  [pdf, other]

    cs.CL cs.LG

    How Does Quantization Affect Multilingual LLMs?

    Authors: Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

    Abstract: Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automa…

    Submitted 3 July, 2024; originally announced July 2024.

  8. arXiv:2407.03179  [pdf, other]

    cs.CV cs.AI cs.LG

    Motion meets Attention: Video Motion Prompts

    Authors: Qixiang Chen, Lei Wang, Piotr Koniusz, Tom Gedeon

    Abstract: Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual contents rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms hav…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Research report

  9. arXiv:2407.03178  [pdf, other]

    cs.MM cs.CV cs.LG

    Relating CNN-Transformer Fusion Network for Change Detection

    Authors: Yuhao Gao, Gensheng Pei, Mengmeng Sheng, Zeren Sun, Tao Chen, Yazhou Yao

    Abstract: While deep learning, particularly convolutional neural networks (CNNs), has revolutionized remote sensing (RS) change detection (CD), existing approaches often miss crucial features due to neglecting global context and incomplete change learning. Additionally, transformer networks struggle with low-level details. RCTNet addresses these limitations by introducing (1) an early fusion backbo…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted by IEEE Conference on Multimedia Expo

  10. arXiv:2407.03177  [pdf, other]

    cs.HC eess.SP

    EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding

    Authors: Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian

    Abstract: Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet emp…

    Submitted 3 July, 2024; originally announced July 2024.

  11. arXiv:2407.03133  [pdf, other]

    cs.CY cs.AI cs.LG stat.ML

    Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

    Authors: Yingfang Yuan, Kefan Chen, Mehdi Rizvi, Lynne Baillie, Wei Pang

    Abstract: The growing interest in fair AI development is evident. The "Leave No One Behind" initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation an…

    Submitted 24 May, 2024; originally announced July 2024.

  12. arXiv:2407.03130  [pdf, other]

    cs.CV

    Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

    Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo, Hao Chen, Deyin Liu, Chunhua Shen

    Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  13. arXiv:2407.03125  [pdf, other]

    cs.LG cs.AI

    Foundations and Frontiers of Graph Learning Theory

    Authors: Yu Huang, Min Zhou, Menglin Yang, Zhen Wang, Muhan Zhang, Jie Wang, Hong Xie, Hao Wang, Defu Lian, Enhong Chen

    Abstract: Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures. Notably, Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm. With these models being usually characterized by intuition-driven design or highly intricate components, placing them within the…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 36 pages, 273 references

  14. arXiv:2407.03104  [pdf, other]

    cs.CV cs.CL cs.MM

    KeyVideoLLM: Towards Large-scale Video Keyframe Selection

    Authors: Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

    Abstract: Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particular…

    Submitted 3 July, 2024; originally announced July 2024.

  15. arXiv:2407.03037  [pdf, other]

    cs.SE

    Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model

    Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

    Abstract: With the advancement of software rendering techniques, GUI pages in mobile apps now encompass a wealth of visual information, where the visual semantics of each page contribute to the overall app logic, presenting new challenges to software testing. Despite the progress in automated Graphical User Interface (GUI) testing, the absence of testing oracles has constrained its efficacy to identify only…

    Submitted 3 July, 2024; originally announced July 2024.

  16. arXiv:2407.03026  [pdf, other]

    cs.SD cs.AI eess.AS

    Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition

    Authors: Jinming Chen, Jingyi Fang, Yuanzhong Zheng, Yaoxuan Wang, Haojun Fei

    Abstract: Currently, end-to-end (E2E) speech recognition methods have achieved promising performance. However, auto speech recognition (ASR) models still face challenges in recognizing multi-accent speech accurately. We propose a layer-adapted fusion (LAF) model, called Qifusion-Net, which does not require any prior knowledge about the target accent. Based on dynamic chunk strategy, our approach enables str…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted by Interspeech 2014, 5 pages, 1 figure

  17. arXiv:2407.03007  [pdf, other]

    cs.CL cs.AI

    What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

    Authors: Chengrui Huang, Zhengliang Shi, Yuntao Wen, Xiuying Chen, Peng Han, Shen Gao, Shuo Shang

    Abstract: Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, it is observed in previous works that the performance of tool learning varies from tasks, datasets, training settings, a…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 9 figures

  18. arXiv:2407.02973  [pdf, other]

    astro-ph.GA

    NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field

    Authors: Nikolaj B. Sillassen, Shuowen Jin, Georgios E. Magdis, Emanuele Daddi, Tao Wang, Shiying Lu, Hanwen Sun, Vinod Arumugam, Daizhong Liu, Malte Brinch, Chiara D'Eugenio, Raphael Gobat, Carlos Gómez-Guijarro, Michael Rich, Eva Schinnerer, Veronica Strazzullo, Qinghua Tan, Francesco Valentino, Yijun Wang, Mengyuan Xiao, Luwenjia Zhou, David Blánquez-Sesé, Zheng Cai, Yanmei Chen, Laure Ciesla, et al. (19 additional authors not shown)

    Abstract: The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are c…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 44 pages (27pp appendix), 32 figures, 18 tables, accepted for publication in A&A

  19. arXiv:2407.02948  [pdf, ps, other]

    econ.TH

    Information Greenhouse: Optimal Persuasion for Medical Test-Avoiders

    Authors: Zhuo Chen

    Abstract: Patients often delay or reject medical tests due to information avoidance, which hinders timely reception of necessary treatments. This paper studies the optimal information policy to persuade an information-avoidant patient to undergo the test and make the best choice that maximizes his health. The patient sequentially decides whether to take the test and the optimal treatment plan. The informati…

    Submitted 3 July, 2024; originally announced July 2024.

  20. arXiv:2407.02940  [pdf, other]

    physics.optics

    Optical vortex-antivortex crystallization in free space

    Authors: Haolin Lin, Yixuan Liao, Guohua Liu, Jianbin Ren, Zhen Li, Zhenqiang Chen, Boris A. Malomed, Shenhe Fu

    Abstract: Stable vortex lattices are basic dynamical patterns which have been demonstrated in physical systems including superconductor physics, Bose-Einstein condensates, hydrodynamics and optics. Vortex-antivortex (VAV) ensembles can be produced, self-organizing into the respective polar lattices. However, these structures are in general highly unstable due to the strong VAV attraction. Here, we demonstra…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: to be published in Nature Communications; 21 pages, 6 figures

  21. arXiv:2407.02935  [pdf, other]

    nucl-ex hep-ex nucl-th

    Properties of the QCD Matter -- An Experimental Review of Selected Results from RHIC BES Program

    Authors: Jinhui Chen, Xin Dong, Xionghong He, Huanzhong Huang, Feng Liu, Xiaofeng Luo, Yu-Gang Ma, Lijuan Ruan, Ming Shao, Shusu Shi, Xu Sun, Aihong Tang, Zebo Tang, Fuqiang Wang, Hai Wang, Yi Wang, Zhigang Xiao, Guannan Xie, Nu Xu, Qinghua Xu, Zhangbu Xu, Chi Yang, Shuai Yang, Wangmei Zha, Yapeng Zhang, et al. (3 additional authors not shown)

    Abstract: In the paper, we discuss the development of the multi-gap resistive plate chamber Time-of-Flight (TOF) technology and the production of the STAR TOF detector in China at the beginning of the 21st century. Then we review recent experimental results from the first beam energy scan program (BES-I) at the Relativistic Heavy Ion Collider (RHIC). Topics cover measurements of collectivity, chirality, cri…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 31 pages, 33 figures. This review is dedicated to Professor Wenqing Shen on the occasion to celebrate his leadership of the Chinese STAR Collaboration, the development and production of the STAR MRPC TOF detector in China and many physics analyses

  22. arXiv:2407.02930  [pdf, other]

    eess.SP

    Timely Requesting for Time-Critical Content Users in Decentralized F-RANs

    Authors: Xingran Chen, Kai Li, Kun Yang

    Abstract: With the rising demand for high-rate and timely communications, fog radio access networks (F-RANs) offer a promising solution. This work investigates age of information (AoI) performance in F-RANs, consisting of multiple content users (CUs), enhanced remote radio heads (eRRHs), and content providers (CPs). Time-critical CUs need rapid content updates from CPs but cannot communicate directly with t…

    Submitted 3 July, 2024; originally announced July 2024.

  23. arXiv:2407.02927  [pdf, other]

    math.AP

    Non-uniqueness for continuous solutions to 1D hyperbolic systems

    Authors: Robin Ming Chen, Alexis F. Vasseur, Cheng Yu

    Abstract: In this paper, we show that a geometrical condition on $2\times2$ systems of conservation laws leads to non-uniqueness in the class of 1D continuous functions. This demonstrates that the Liu Entropy Condition alone is insufficient to guarantee uniqueness, even within the mono-dimensional setting. We provide examples of systems where this pathology holds, even if they verify stability and uniquenes…

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 35L45; 35L65; 76N10

  24. arXiv:2407.02922  [pdf, other]

    cs.IT

    Fair Resource Allocation for Probabilistic Semantic Communication in IIoT

    Authors: Siyun Liang, Zhouxiang Zhao, Chen Zhu, Zhaohui Yang, Yinchao Yang, Mohammad Shikh-Bahaei, Zhaoyang Zhang

    Abstract: In this paper, the problem of minimum rate maximization for probabilistic semantic communication (PSCom) in industrial Internet of Things (IIoT) is investigated. In the considered model, users employ semantic information extraction techniques to compress the original data before sending it to the base station (BS). During this semantic compression process, knowledge graphs are employed to represen…

    Submitted 3 July, 2024; originally announced July 2024.

  25. arXiv:2407.02899  [pdf, other]

    hep-ex

    Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, et al. (639 additional authors not shown)

    Abstract: A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be…

    Submitted 3 July, 2024; originally announced July 2024.

  26. arXiv:2407.02887  [pdf, other]

    cs.CV

    Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

    Authors: Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang

    Abstract: In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the i…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  27. arXiv:2407.02848  [pdf, other]

    cond-mat.stat-mech

    Efficiency bounds for bipartite information-driven thermodynamic systems

    Authors: Shihao Xia, Shuanglong Han, Ousi Pan, Yuzhuo Pan, Jincan Chen, Shanhe Su

    Abstract: This study introduces a novel approach to derive a lower bound for the entropy production rate of a subsystem by utilizing the Cauchy-Schwarz inequality. It extends to establishing comprehensive upper and lower bounds for the efficiency of two subsystems. These bounds are applicable to a wide range of Markovian stochastic processes, which enhances the accuracy in depicting the range of energy conv…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 2 figures

  28. arXiv:2407.02842  [pdf, other]

    cs.CV cs.AI cs.CL

    MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis

    Authors: Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

    Abstract: Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this issue, we introduce the new benchmark named MindBench, which n…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: technical report

  29. arXiv:2407.02829  [pdf, other]

    astro-ph.HE

    Mirage Sources and Large TeV Halo-Pulsar Offsets: Exploring the Parameter Space

    Authors: Yiwei Bao, Ruo-Yu Liu, Gwenael Giacinti, Hai-Ming Zhang, Yang Chen

    Abstract: We investigate the asymmetric propagation of 100 TeV electrons (whose radiation mainly concentrates on 20--30 TeV) in turbulent magnetic fields around pulsars, using GPU-accelerated simulations to explore their trajectories and interactions within pulsar wind nebulae and the interstellar medium. Key results include the identification of "mirage" sources indicating significant offsets in high-ene…

    Submitted 3 July, 2024; originally announced July 2024.

  30. arXiv:2407.02791  [pdf, other]

    cs.SE cs.AI

    Model-Enhanced LLM-Driven VUI Testing of VPA Apps

    Authors: Suwan Li, Lei Bu, Guangdong Bai, Fuman Xie, Kai Chen, Chang Yue

    Abstract: The flourishing ecosystem centered around voice personal assistants (VPA), such as Amazon Alexa, has led to the booming of VPA apps. The largest app market Amazon skills store, for example, hosts over 200,000 apps. Despite their popularity, the open nature of app release and the easy accessibility of apps also raise significant concerns regarding security, privacy and quality. Consequently, variou…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures

  31. arXiv:2407.02779  [pdf, other]

    cs.AI cs.LG

    Croppable Knowledge Graph Embedding

    Authors: Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

    Abstract: Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the effic…

    Submitted 2 July, 2024; originally announced July 2024.

  32. arXiv:2407.02778  [pdf, other]

    cs.CV cs.LG

    Foster Adaptivity and Balance in Learning with Noisy Labels

    Authors: Mengmeng Sheng, Zeren Sun, Tao Chen, Shuchao Pang, Yucheng Wang, Yazhou Yao

    Abstract: Label noise is ubiquitous in real-world scenarios, posing a practical challenge to supervised models due to its effect in hurting the generalization performance of deep neural networks. Existing methods primarily employ the sample selection paradigm and usually rely on dataset-dependent prior knowledge (e.g., a pre-defined threshold) to cope with label noise, inevitably degrading the adaptivity. Mo…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: accepted by the European Conference on Computer Vision (ECCV), 2024

  33. arXiv:2407.02768  [pdf, other]

    cs.CV

    Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation

    Authors: Tao Chen, XiRuo Jiang, Gensheng Pei, Zeren Sun, Yucheng Wang, Yazhou Yao

    Abstract: Though adversarial erasing has prevailed in weakly supervised semantic segmentation to help activate integral object regions, existing approaches still suffer from the dilemma of under-activation and over-expansion due to the difficulty in determining when to stop erasing. In this paper, we propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) a…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: accepted by the European Conference on Computer Vision (ECCV), 2024

  34. arXiv:2407.02767  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Comparison of Short-Range Order in GeSn Grown by Molecular Beam Epitaxy and Chemical Vapor Deposition

    Authors: Shang Liu, Yunfan Liang, Haochen Zhao, Nirosh M. Eldose, Jin-Hee Bae, Omar Concepcion, Xiaochen Jin, Shunda Chen, Ilias Bikmukhametov, Austin Akey, Cory T. Cline, Alejandra Cuervo Covian, Xiaoxin Wang, Tianshu Li, Yuping Zeng, Dan Buca, Shui-Qing Yu, Gregory J. Salamo, Shengbai Zhang, Jifeng Liu

    Abstract: Atomic short-range order (SRO) in direct-bandgap GeSn for infrared photonics has recently attracted attention due to its notable impact on band structures. However, the SRO in GeSn thin films grown by different methods have hardly been compared. This paper compares SRO in GeSn thin films of similar compositions grown by molecular beam epitaxy (MBE) and chemical vapor deposition (CVD) using atom pr…

    Submitted 2 July, 2024; originally announced July 2024.

  35. arXiv:2407.02765  [pdf, ps, other]

    eess.SY cs.AI math.OC math.PR

    Graphon Particle Systems, Part II: Dynamics of Distributed Stochastic Continuum Optimization

    Authors: Yan Chen, Tao Li

    Abstract: We study the distributed optimization problem over a graphon with a continuum of nodes, which is regarded as the limit of the distributed networked optimization as the number of nodes goes to infinity. Each node has a private local cost function. The global cost function, which all nodes cooperatively minimize, is the integral of the local cost functions on the node set. We propose stochastic grad…

    Submitted 2 July, 2024; originally announced July 2024.

  36. arXiv:2407.02762  [pdf, other]

    cs.LG cs.AI

    SF-GNN: Self Filter for Message Lossless Propagation in Deep Graph Neural Network

    Authors: Yushan Zhu, Wen Zhang, Yajing Xu, Zhen Yao, Mingyang Chen, Huajun Chen

    Abstract: Graph Neural Network (GNN), with the main idea of encoding graph structure information of graphs by propagation and aggregation, has developed rapidly. It achieved excellent performance in representation learning of multiple types of graphs such as homogeneous graphs, heterogeneous graphs, and more complex graphs like knowledge graphs. However, merely stacking GNN layers may not improve the model'…

    Submitted 2 July, 2024; originally announced July 2024.

  37. arXiv:2407.02753  [pdf, other]

    astro-ph.CO

    Linear Relativistic Corrections in the Spherical Fourier-Bessel Power Spectrum

    Authors: Robin Y. Wen, Henry S. Grasshorn Gebhardt, Chen Heinrich, Olivier Doré

    Abstract: The three-dimensional galaxy power spectrum is a powerful probe of primordial non-Gaussianity and additional general relativistic (GR) effects on large scales, which can be constrained by the current and upcoming large-scale structure surveys. In this work, we calculate the linear-order relativistic power spectrum in the spherical Fourier-Bessel (SFB) basis, a coordinate system that preserves the…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 52 pages, 12 figures

  38. arXiv:2407.02750  [pdf, other]

    cs.CL

    Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data

    Authors: Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen

    Abstract: Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Redu…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Workshop on Long-Context Foundation Models, Vienna, Austria 2024. arXiv admin note: substantial text overlap with arXiv:2402.14195

  39. arXiv:2407.02735  [pdf, other]

    quant-ph

    Performance optimization of a finite-time quantum tricycle

    Authors: Jingyi Chen, Shihao Xia, Jincan Chen, Shanhe Su

    Abstract: We establish a finite-time external field-driven quantum tricycle model. Within the framework of slow driving perturbation, the perturbation expansion of heat in powers of time can be derived during the heat exchange processes. Employing the method of Lagrange multiplier, we optimize the cooling performance of the tricycle by considering the cooling rate and the figure of merit, which is the produ…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 pages, 11 figures

  40. arXiv:2407.02685  [pdf, other]

    cs.CV

    Open Panoramic Segmentation

    Authors: Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, it is not only costly to obtain training-sufficient dense-annotated panoramas but also application-restricted when training models in a close-vocabulary setting. To tackle this problem, in this work, we define a new task termed Open Panoramic Segmentation…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://junweizheng93.github.io/publications/OPS/OPS.html

  41. arXiv:2407.02675  [pdf, other]

    eess.IV cs.CV

    Depth-Aware Endoscopic Video Inpainting

    Authors: Francis Xiatian Zhang, Shuang Chen, Xianghua Xie, Hubert P. H. Shum

    Abstract: Video inpainting fills in corrupted video content with plausible replacements. While recent advances in endoscopic video inpainting have shown potential for enhancing the quality of endoscopic videos, they mainly repair 2D visual information without effectively preserving crucial 3D spatial details for clinical reference. Depth-aware inpainting methods attempt to preserve these details by incorpor…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  42. arXiv:2407.02666  [pdf, other]

    cs.RO cs.AI

    Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models

    Authors: Annie S. Chen, Alec M. Lessing, Andy Tang, Govind Chada, Laura Smith, Sergey Levine, Chelsea Finn

    Abstract: Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. For example, in a search and rescue mission, a legged robot could climb over debris, crawl through gaps, and navigate out of dead ends. However, the robot's controller needs to respond intelligently to such varied obstacles, and this requires handling unexpected and unu…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 27 pages

  43. arXiv:2407.02631  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Nollywood: Let's Go to the Movies!

    Authors: John E. Ortega, Ibrahim Said Ahmad, William Chen

    Abstract: Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to Ame…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, 2 tables

  44. arXiv:2407.02614  [pdf, other]

    cs.HC

    AcuVR: Enhancing Acupuncture Training Workflow with Virtual Reality

    Authors: Menghe Zhang, Chen Chen, Matin Yarmand, Anish Rajeshkumar, Nadir Weibel

    Abstract: Acupuncture is a widely adopted medical practice that involves inserting thin needles into specific points on the body to alleviate pain and treat various health conditions. Current learning practices heavily rely on 2D atlases and practice on peers, which are notably less intuitive and pose risks, particularly in sensitive areas such as the eyes. To address these challenges, we introduce AcuVR, a…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 pages

    ACM Class: J.3; J.4; H.5

  45. arXiv:2407.02607  [pdf, other]

    math.DG cs.LG math.MG

    Product Geometries on Cholesky Manifolds with Applications to SPD Manifolds

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Nicu Sebe

    Abstract: This paper presents two new metrics on the Symmetric Positive Definite (SPD) manifold via the Cholesky manifold, i.e., the space of lower triangular matrices with positive diagonal elements. We first unveil that the existing popular Riemannian metric on the Cholesky manifold can be generally characterized as the product metric of a Euclidean metric and a Riemannian metric on the space of n-dimensi…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

    MSC Class: 47A64; 26E60; 53C22; 15B48; 58D17; 53C20; 58B20

  46. arXiv:2407.02601  [pdf, other]

    cs.LG cs.DS

    Linear Submodular Maximization with Bandit Feedback

    Authors: Wenjing Chen, Victoria G. Crawford

    Abstract: Submodular optimization with bandit feedback has recently been studied in a variety of contexts. In a number of real-world applications such as diversified recommender systems and data summarization, the submodular function exhibits additional linear structure. We consider developing approximation algorithms for the maximization of a submodular objective function $f:2^U\to\mathbb{R}_{\geq 0}$, whe…

    Submitted 2 July, 2024; originally announced July 2024.

  47. arXiv:2407.02586  [pdf, ps, other]

    cs.CV

    Improving Visual Storytelling with Multimodal Large Language Models

    Authors: Xiaochuan Lin, Xiangyong Chen

    Abstract: Visual storytelling is an emerging field that combines images and narratives to create engaging and contextually rich stories. Despite its potential, generating coherent and emotionally resonant visual stories remains challenging due to the complexity of aligning visual and textual information. This paper presents a novel approach leveraging large language models (LLMs) and large vision-language m…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 pages

  48. arXiv:2407.02553  [pdf, other]

    quant-ph cond-mat.dis-nn physics.atom-ph

    Large-scale quantum reservoir learning with an analog quantum computer

    Authors: Milan Kornjača, Hong-Ye Hu, Chen Zhao, Jonathan Wurtz, Phillip Weinberg, Majd Hamdan, Andrii Zhdanov, Sergio H. Cantu, Hengyun Zhou, Rodrigo Araiza Bravo, Kevin Bagnall, James I. Basham, Joseph Campo, Adam Choukri, Robert DeAngelo, Paige Frederick, David Haines, Julian Hammett, Ning Hsu, Ming-Guang Hu, Florian Huber, Paul Niklas Jepsen, Ningyuan Jia, Thomas Karolyshyn, Minho Kwon, et al. (28 additional authors not shown)

    Abstract: Quantum machine learning has gained considerable attention as quantum technology advances, presenting a promising approach for efficiently learning complex data patterns. Despite this promise, most contemporary quantum methods require significant resources for variational parameter optimization and face issues with vanishing gradients, leading to experiments that are either limited in scale or lac…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 + 14 pages, 4 + 7 figures

  49. arXiv:2407.02534  [pdf, other]

    cs.CR cs.CV

    Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything

    Authors: Xiaotian Zou, Yongkang Chen

    Abstract: Large Visual Language Models (VLMs) such as GPT-4 have achieved remarkable success in generating comprehensive and nuanced responses, surpassing the capabilities of large language models. However, with the integration of visual inputs, new security concerns emerge, as malicious attackers can exploit multiple modalities to achieve their objectives. This has led to increasing attention on the vulner…

    Submitted 1 July, 2024; originally announced July 2024.

  50. arXiv:2407.02532  [pdf]

    physics.app-ph

    Broadband planar electromagnetic hyper-lens with uniform magnification in air

    Authors: Ran Sun, Fei Sun, Hanchuan Chen, Yichao Liu, Qi Wang

    Abstract: A planar hyper-lens, capable of creating sub-wavelength imaging for broadband electromagnetic wave, is designed based on electromagnetic null medium. Subsequently, a scheme for the implementation of the proposed hyper-lens is given by using well-designed flexural metal plates, which function as the reduced electromagnetic null medium for TM-polarized microwaves. Both simulated and measured results…

    Submitted 1 July, 2024; originally announced July 2024.