Skip to main content

Showing 51–100 of 9,364 results for author: Wang, L

.
  1. arXiv:2406.14433  [pdf

    physics.app-ph cond-mat.mtrl-sci

    Structural and Electrical Properties of Grafted Si/GaAsSb Heterojunction

    Authors: Haris Naeem Abbasi, Seunghyun Lee, Hyemin Jung, Nathan Gajowski, Yi Lu, Linus Wang, Donghyeok Kim, Jie Zhou, Jiarui Gong, Chris Chae, **woo Hwang, Manisha Muduli, Subramanya Nookala, Zhenqiang Ma, Sanjay Krishna

    Abstract: The short-wave infrared (SWIR) wavelength, especially 1.55 um, has attracted significant attention in various areas such as high-speed optical communication and LiDAR systems. Avalanche photodiodes (APDs) are a critical component as a receiver in these systems due to their internal gain which enhances the system performance. Silicon-based APDs are promising since they are CMOS compatible, but they… ▽ More

    Submitted 24 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures

  2. arXiv:2406.14417  [pdf

    cond-mat.mes-hall

    Electrical switching of Ising-superconducting nonreciprocity for quantum neuronal transistor

    Authors: Junlin Xiong, Jiao Xie, Bin Cheng, Yudi Dai, Xinyu Cui, Lizheng Wang, Zenglin Liu, Ji Zhou, Naizhou Wang, Xianghan Xu, Xianhui Chen, Sang-Wook Cheong, Shi-Jun Liang, Feng Miao

    Abstract: Nonreciprocal quantum transport effect is mainly governed by the symmetry breaking of the material systems and is gaining extensive attention in condensed matter physics. Realizing electrical switching of the polarity of the nonreciprocal transport without external magnetic field is essential to the development of nonreciprocal quantum devices. However, electrical switching of superconducting nonr… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Journal ref: Nature Communications 15, 4953 (2024)

  3. arXiv:2406.14185  [pdf, other

    cs.DC cs.AI

    Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices

    Authors: Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei

    Abstract: The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementations of deep learning based intelligent services on extremely resource-constrained Internet of Things (IoT) scenarios. Yet it raises great challenges to perform complicated inference tasks relying on a cluster of IoT devices that are heterogeneous in their comp… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.14067  [pdf

    physics.optics eess.SP

    A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

    Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

    Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz.… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 12 figures, 1 table

  5. arXiv:2406.13949  [pdf, other

    hep-th gr-qc

    Entanglement island and Page curve of Hawking radiation in rotating Kerr black holes

    Authors: Liqiang Wang, Ran Li

    Abstract: We initiate the study of the information paradox of rotating Kerr black holes by employing the recently proposed island rule. It is known that the scalar field theory near the Kerr black hole horizon can be reduced to the 2-dimensional effective theory. Working within the framework of the 2-dimensional effective theory and assuming the small angular momentum limit, we demonstrate that the entangle… ▽ More

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: 32 pages, 8 figures, references updated

  6. arXiv:2406.13640  [pdf, other

    cs.RO cs.CV cs.LG

    Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks

    Authors: Jialiang Zhao, Yuxiang Ma, Lirui Wang, Edward H. Adelson

    Abstract: This paper presents T3: Transferable Tactile Transformers, a framework for tactile representation learning that scales across multi-sensors and multi-tasks. T3 is designed to overcome the contemporary issue that camera-based tactile sensing is extremely heterogeneous, i.e. sensors are built into different form factors, and existing datasets were collected for disparate tasks. T3 captures the share… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.13607  [pdf, other

    cs.CV

    Ultra-High-Definition Restoration: New Benchmarks and A Dual Interaction Prior-Driven Solution

    Authors: Liyan Wang, Cong Wang, **shan Pan, Weixiang Zhou, Xiaoran Sun, Wei Wang, Zhixun Su

    Abstract: Ultra-High-Definition (UHD) image restoration has acquired remarkable attention due to its practical demand. In this paper, we construct UHD snow and rain benchmarks, named UHD-Snow and UHD-Rain, to remedy the deficiency in this field. The UHD-Snow/UHD-Rain is established by simulating the physics process of rain/snow into consideration and each benchmark contains 3200 degraded/clear image pairs o… ▽ More

    Submitted 22 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.13445  [pdf, other

    cs.CV cs.AI

    Lost in UNet: Improving Infrared Small Target Detection by Underappreciated Local Features

    Authors: Wuzhou Quan, Wei Zhao, Weiming Wang, Haoran Xie, Fu Lee Wang, Mingqiang Wei

    Abstract: Many targets are often very small in infrared images due to the long-distance imaging meachnism. UNet and its variants, as popular detection backbone networks, downsample the local features early and cause the irreversible loss of these local features, leading to both the missed and false detection of small targets in infrared images. We propose HintU, a novel network to recover the local features… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  9. arXiv:2406.13378  [pdf, other

    cs.CV

    Any360D: Towards 360 Depth Anything with Unlabeled 360 Data and Möbius Spatial Augmentation

    Authors: Zidong Cao, ****g Zhu, Weiming Zhang, Lin Wang

    Abstract: Recently, Depth Anything Model (DAM) - a type of depth foundation model - reveals impressive zero-shot capacity for diverse perspective images. Despite its success, it remains an open question regarding DAM's performance on 360 images that enjoy a large field-of-view (180x360) but suffer from spherical distortions. To this end, we establish, to our knowledge, the first benchmark that aims to 1) ev… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.13372  [pdf, other

    cs.AI

    Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

    Authors: Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-conne… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 21 pages, 4 figures

  11. arXiv:2406.13230  [pdf, other

    cs.CL

    Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding

    Authors: Xin Liu, Farima Fatahi Bayat, Lu Wang

    Abstract: Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness, which can inform users about LMs' reliability and mitigate hallucinated content. However, prior calibration methods, such as self-consistency-based and logit-based approaches, are either limited in inference-time efficiency or fall short of providing informative signals. Moreover,… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  13. arXiv:2406.12703  [pdf, other

    eess.IV cs.CV

    Coarse-Fine Spectral-Aware Deformable Convolution For Hyperspectral Image Reconstruction

    Authors: **cheng Yang, Lishun Wang, Miao Cao, Huan Wang, Yin** Zhao, Xin Yuan

    Abstract: We study the inverse problem of Coded Aperture Snapshot Spectral Imaging (CASSI), which captures a spatial-spectral data cube using snapshot 2D measurements and uses algorithms to reconstruct 3D hyperspectral images (HSI). However, current methods based on Convolutional Neural Networks (CNNs) struggle to capture long-range dependencies and non-local similarities. The recently popular Transformer-b… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, Accepted by ICIP2024

  14. ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

    Authors: Junhao Lin, Lei Zhu, Jiaxing Shen, Huazhu Fu, Qing Zhang, Liansheng Wang

    Abstract: With the rapid development of depth sensor, more and more RGB-D videos could be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object detection (SOD) works only focus on either static RGB-D images or RGB videos, ignoring the collaborating of RGB-D and video information. In this paper, we first collect a new annotated RGB-D vi… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (2024)

  15. arXiv:2406.12243  [pdf, other

    cs.IR cs.AI

    CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework

    Authors: Shaohuang Wang, Lun Wang, Yunhan Bu, Tianwei Huang

    Abstract: Large Language Models (LLMs) have achieved remarkable progress in language understanding and generation. Custom LLMs leveraging textual features have been applied to recommendation systems, demonstrating improvements across various recommendation scenarios. However, most existing methods perform untrained recommendation based on pre-trained knowledge (e.g., movie recommendation), and the auto-regr… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.12095  [pdf, other

    cs.CV cs.AI cs.RO

    DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

    Authors: Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

    Abstract: We propose DistillNeRF, a self-supervised learning framework addressing the challenge of understanding 3D environments from limited 2D observations in autonomous driving. Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs, and is trained self-supervised with differentiable rendering to reconstruct RGB,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  17. arXiv:2406.12039  [pdf, other

    cond-mat.str-el

    Long-range magnetic order induced surface state in GdBi and DyBi

    Authors: Yevhen Kushnirenko, Lin-Lin Wang, Zhuoqi Li, Brinda Kuthanazhi, Benjamin Schrunk, Evan O'Leary, Andrew Eaton, Paul. Canfield, Adam Kaminski

    Abstract: The recent discovery of unconventional surface-state pairs, which give rise to Fermi arcs and spin textures, in antiferromagnetically ordered rare-earth monopnictides attracted the interest in these materials. We use angle-resolved photoemission spectroscopy (ARPES) measurements in conjunction with density functional theory (DFT) calculations to investigate the evolution of the electronic structur… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.11836  [pdf, other

    cs.CV cs.GR

    RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians

    Authors: Bingling Li, Shengyi Chen, Luchao Wang, Kaimin Liao, Sijie Yan, Yuanjun Xiong

    Abstract: In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS i… ▽ More

    Submitted 22 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  19. arXiv:2406.11303  [pdf, other

    cs.CV cs.AI cs.CL

    VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

    Authors: Yunxin Li, Xinyu Chen, Baotian Hu, Longyue Wang, Haoyuan Shi, Min Zhang

    Abstract: Despite significant breakthroughs in video analysis driven by the rapid development of large multimodal models (LMMs), there remains a lack of a versatile evaluation benchmark to comprehensively assess these models' performance in video understanding and reasoning. To address this, we present VideoVista, a video QA benchmark that integrates challenges across diverse content categories, durations,… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 38 pages, 44 figures

  20. arXiv:2406.11187  [pdf, other

    cs.LG

    Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Black Gradient Descent

    Authors: Lin Wang, Zhichao Wang, Xiaoying Tang

    Abstract: The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  21. arXiv:2406.11158  [pdf, other

    eess.SY

    Dynamic Modeling and Control for an Offshore Semisubmersible Floating Wind Turbine

    Authors: Yingjie Gong, Qinmin Yang, Hua Geng, Wenchao Meng, Lin Wang

    Abstract: Floating wind turbines (FWTs) hold significant potential for the exploitation of offshore renewable energy resources. Nevertheless, prior to the construction of FWTs, it is imperative to tackle several critical challenges, especially the issue of performance degradation under combined wind and wave loads. This study initiates with the development of a simplified nonlinear dynamical model for a sem… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  22. arXiv:2406.10999  [pdf, other

    cs.CL cs.AI

    Not All Bias is Bad: Balancing Rational Deviations and Cognitive Biases in Large Language Model Reasoning

    Authors: Liman Wang, Hanyang Zhong

    Abstract: This paper investigates the nuanced role of biases in the decision-making processes of large language models (LLMs). While conventional research typically aims to eliminate all biases, our study reveals that not all biases are detrimental. By examining rational deviations, involving heuristic shortcuts that enhance decision-making efficiency, we highlight their potential benefits when properly bal… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: This article is currently under review. All data will be made publicly available on GitHub once the review is complete

  23. arXiv:2406.10884  [pdf, other

    cs.LG cs.CR cs.DC

    Linkage on Security, Privacy and Fairness in Federated Learning: New Balances and New Perspectives

    Authors: Linlin Wang, Tianqing Zhu, Wanlei Zhou, Philip S. Yu

    Abstract: Federated learning is fast becoming a popular paradigm for applications involving mobile devices, banking systems, healthcare, and IoT systems. Hence, over the past five years, researchers have undertaken extensive studies on the privacy leaks, security threats, and fairness associated with these emerging models. For the most part, these three critical concepts have been studied in isolation; howe… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  24. arXiv:2406.10828  [pdf

    cs.CV

    PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

    Authors: Libo Wang, Dongxu Li, Sijun Dong, Xiaoliang Meng, Xiaokang Zhang, Danfeng Hong

    Abstract: Semantic segmentation, as a basic tool for intelligent interpretation of remote sensing images, plays a vital role in many Earth Observation (EO) applications. Nowadays, accurate semantic segmentation of remote sensing images remains a challenge due to the complex spatial-temporal scenes and multi-scale geo-objects. Driven by the wave of deep learning (DL), CNN- and Transformer-based semantic segm… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  25. arXiv:2406.10802  [pdf, other

    cs.CL cs.AI

    KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

    Authors: Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

    Abstract: Existing frameworks for assessing robustness of large language models (LLMs) overly depend on specific benchmarks, increasing costs and failing to evaluate performance of LLMs in professional domains due to dataset limitations. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  26. arXiv:2406.10769  [pdf, other

    cond-mat.dis-nn

    Exact complex mobility edges and flagellate spectra for non-Hermitian quasicrystals with exponential hop**s

    Authors: Li Wang, Jiaqi Liu, Zhenbo Wang, Shu Chen

    Abstract: We propose a class of general non-Hermitian quasiperiodic lattice models with exponential hop**s and analytically determine the genuine complex mobility edges by solving its dual counterpart exactly utilizing Avila's global theory. Our analytical formula unveils that the complex mobility edges usually form a loop structure in the complex energy plane. By shifting the eigenenergy a constant $t$,… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages, 2 figures

  27. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Sheng** Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  28. arXiv:2406.10227  [pdf, other

    cs.CV cs.AI

    VideoGUI: A Benchmark for GUI Automation from Instructional Videos

    Authors: Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen WU, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

    Abstract: Graphical User Interface (GUI) automation holds significant promise for enhancing human productivity by assisting with computer tasks. Existing task formulations primarily focus on simple tasks that can be specified by a single, language-only instruction, such as "Insert a new slide." In this work, we introduce VideoGUI, a novel multi-modal benchmark designed to evaluate GUI assistants on visual-c… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 16 tables, 17 figures

  29. arXiv:2406.10100  [pdf, other

    cs.CV cs.AI

    SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

    Authors: Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, Linlin Wang, Bo Dang, Jiangwei Lao, Jian Wang, **gdong Chen, Yihua Tan, Yansheng Li

    Abstract: Remote Sensing Large Multi-Modal Models (RSLMMs) are develo** rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension. However, due to the limitations of existing datasets, RSLMMs have shortcomings in understanding the rich semantic relations among objects in complex remote sensing scenes. To unlock RSLMMs' complex comprehension ability, we propose a large-sca… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 30 pages, 5 figures, 19 tables, dataset and code see https://github.com/Luo-Z13/SkySenseGPT

  30. arXiv:2406.09873  [pdf, other

    eess.AS cs.AI cs.SD

    Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

    Authors: Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

    Abstract: Disordered speech recognition profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by interspeech 2024

  31. arXiv:2406.09813  [pdf, other

    astro-ph.IM astro-ph.HE

    Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station

    Authors: Hai **, Junjie Mao, Liubiao Chen, Naihui Chen, Wei Cui, Bo Gao, **** Li, Xinfeng Li, Jiejia Liu, Jia Quan, Chunyang Jiang, Guole Wang, Le Wang, Qian Wang, Sifan Wang, Aimin Xiao, Shuo Zhang

    Abstract: DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, the full version is published by Journal of Low Temperature Physics

  32. arXiv:2406.09782  [pdf, other

    cs.CV

    Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

    Authors: Runze Liu, Dongchen Zhu, Guanghui Zhang, Yue Xu, Wenjun Shi, Xiaolin Zhang, Lei Wang, Jiamao Li

    Abstract: Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. In real-world scenarios, the images may be blurry or noisy due to the influence of weather conditions and inherent limitations of the camera. Therefore, it is particularly important to develop a robust depth estimation model. Benefiting from the training strategies of… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  33. arXiv:2406.09773  [pdf

    cs.CV cs.AI

    Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology

    Authors: Haowei Yang, Liyang Wang, **gyu Zhang, Yu Cheng, Ao Xiang

    Abstract: With the widespread application of Light Detection and Ranging (LiDAR) technology in fields such as autonomous driving, robot navigation, and terrain map**, the importance of edge detection in LiDAR images has become increasingly prominent. Traditional edge detection methods often face challenges in accuracy and computational complexity when processing LiDAR images. To address these issues, this… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  34. arXiv:2406.09765  [pdf

    q-fin.RM cs.CL

    Application of Natural Language Processing in Financial Risk Detection

    Authors: Liyang Wang, Yu Cheng, Ao Xiang, **gyu Zhang, Haowei Yang

    Abstract: This paper explores the application of Natural Language Processing (NLP) in financial risk detection. By constructing an NLP-based financial risk detection model, this study aims to identify and predict potential risks in financial documents and communications. First, the fundamental concepts of NLP and its theoretical foundation, including text mining methods, NLP model design principles, and mac… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  35. Massive Dirac Fermions and Strong Shubnikov-de Haas Oscillations in Topological Insulator Sm,Fe:Bi2Se3 Single Crystals

    Authors: Weiyao Zhao, Chi Xuan Trang, Qile Li, Lei Chen, Zengji Yue, Abdulhakim Bake, Cheng Tan, Lan Wang, Mitchell Nancarrow, Mark Edmonds, David Cortie, Xiaolin Wang

    Abstract: Topological insulators (TIs) are emergent materials with unique band structure, which allow the study of quantum effect in solids, as well as contribute to high performance quantum devices. To achieve the better performance of TI, here we present a co-do** strategy using synergistic rare-earth Sm and transition-metal Fe dopants in Bi2Se3 single crystals, which combine the advantages of both tran… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 figures

    Journal ref: Physical Review B 104, 085153 (2021)

  36. arXiv:2406.09475  [pdf, other

    hep-ex

    Search for $X(1870)$ via the decay $J/ψ\to ωK^+ K^-η$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (644 additional authors not shown)

    Abstract: Using a sample of $(10087\pm 44)\times10^{6}$ $J/ψ$ events collected by the BESIII detector at the BEPCII collider, we search for the decay $X(1870)\to K^+ K^-η$ via the $J/ψ\to ωK^+ K^- η$ process for the first time. No significant $X(1870)$ signal is observed. The upper limit on the branching fraction of the decay $ J/ψ\to ωX(1870) \toωK^+ K^- η$ is determined to be $9.55\times 10^{-7}$ at the… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  37. arXiv:2406.09410  [pdf, other

    cs.CV cs.AI

    Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting intelligent understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it necessary to holistically conduct SGG in large-size very-high-reso… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This paper releases a SAI-oriented SGG toolkit with about 30 OBD methods and 10 SGG methods, and develops a benchmark based on RSG where our HOD-Net and RPCM significantly outperform the state-of-the-art methods in both OBD and SGG tasks. The RSG dataset and SAI-oriented toolkit will be made publicly available at https://linlin-dev.github.io/project/RSG

  38. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  39. arXiv:2406.08911  [pdf, other

    cs.CL eess.AS

    An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

    Authors: Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

    Abstract: Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  40. arXiv:2406.08698  [pdf, other

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

    Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  41. arXiv:2406.08487  [pdf, other

    cs.CV

    Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

    Authors: Yi-Fan Zhang, Qingsong Wen, Chaoyou Fu, Xue Wang, Zhang Zhang, Liang Wang, Rong **

    Abstract: Seeing clearly with high resolution is a foundation of Large Multimodal Models (LMMs), which has been proven to be vital for visual perception and reasoning. Existing works usually employ a straightforward resolution upscaling method, where the image consists of global and local branches, with the latter being the sliced image patches but resized to the same resolution as the former. This means th… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Project page: https://github.com/yfzhang114/SliME

  42. arXiv:2406.08418  [pdf, other

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang **, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang , et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  43. arXiv:2406.08407  [pdf, other

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  44. arXiv:2406.08380  [pdf, other

    cs.CL cs.SD eess.AS

    Towards Unsupervised Speech Recognition Without Pronunciation Models

    Authors: Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

    Abstract: Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data to effectively train these systems. In this article, we tackle the challenge of develo** ASR systems without paired speech and text corpora by pro… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  45. arXiv:2406.08225  [pdf, ps, other

    hep-ex

    Observation of $η_{c}$(1S, 2S) and $χ_{cJ}$ decays to 2$(π^{+}π^{-})η$ via $ψ$(3686) radiative transitions

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (636 additional authors not shown)

    Abstract: Based on $2.7 \times 10^9~ψ(3686)$ decays collected with the BESIII detector, the radiative decay $ψ(3686)\to\gamma2(π^{+}π^{-})η$ is investigated to measure properties of S- and P-wave charmonium states. The branching fraction of the decay $η_{c}(1S) \to 2(π^{+}π^{-})η$, which is found to have a strong dependence on the interference pattern between $η_c(1S)$ and non-$η_c(1S)$ processes, is measur… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  46. arXiv:2406.08204  [pdf, other

    cs.CV

    Diffusion-Promoted HDR Video Reconstruction

    Authors: Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

    Abstract: High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Arxiv Preprint

  47. arXiv:2406.08184  [pdf, other

    cs.AI cs.HC

    MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents

    Authors: Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen

    Abstract: Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphic User Interfaces (GUIs) and their potential to autonomously manage daily tasks. Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents, due to the inexh… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  48. arXiv:2406.08128  [pdf, other

    cs.LG

    Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

    Authors: Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li

    Abstract: To mitigate the computational complexity in the self-attention mechanism on long sequences, linear attention utilizes computation tricks to achieve linear complexity, while state space models (SSMs) popularize a favorable practice of using non-data-dependent memory pattern, i.e., emphasize the near and neglect the distant, to processing sequences. Recent studies have shown the priorities by combin… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: ICML 2024 camera ready

  49. arXiv:2406.07870  [pdf, ps, other

    math.OC

    Event-Triggered Optimal Tracking Control for Strict-Feedback Nonlinear Systems With Non-Affine Nonlinear Faults

    Authors: Ling Wang, Xin Wang, Ziming Wang

    Abstract: This article studies the control ideas of the optimal backstep** technique, proposing an event-triggered optimal tracking control scheme for a class of strict-feedback nonlinear systems with non-affine and nonlinear faults. A simplified identifier-critic-actor framework is employed in the reinforcement learning algorithm to achieve optimal control. The identifier estimates the unknown dynamic fu… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  50. arXiv:2406.07806  [pdf, other

    astro-ph.HE astro-ph.SR

    Probing the Shock Breakout Signal of SN 2024ggi from the Transformation of Early Flash Spectroscopy

    Authors: Jujia Zhang, Luc Dessart, Xiaofeng Wang, Qian Zhai, Yi Yang, Li** Li, Han Lin, Giorgio Valerin, Yongzhi Cai, Zhen Guo, Lingzhi Wang, Zeyi Zhao, Zhenyu Wang, Shengyu Yan

    Abstract: We present early-time, hour-to-day cadence spectroscopy of the nearby type II supernova (SN II) 2024ggi, which was discovered at a phase when the SN shock just emerged from the red-supergiant (RSG) progenitor star. Over the first few days after the first light, SN 2024ggi exhibited prominent narrow emission lines formed through intense and persistent photoionization of the nearby circumstellar mat… ▽ More

    Submitted 29 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 10 pages and 5 figures in the main text (16 pages and 9 figures in total). Accepted for publication in ApJL