Showing 1–50 of 688 results for author: Yang, K

Searching in archive cs.
  1. MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge

    Authors: Yuning Chen, Kang Yang, Zhiyu An, Brady Holder, Luke Paloutzian, Khaled Bali, Wan Du

    Abstract: The rapid decline in groundwater around the world poses a significant challenge to sustainable agriculture. To address this issue, agricultural managed aquifer recharge (Ag-MAR) is proposed to recharge the aquifer by artificially flooding agricultural lands using surface water. Ag-MAR requires a carefully selected flooding schedule to avoid affecting the oxygen absorption of crop roots. However, c…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  2. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  3. arXiv:2407.00496  [pdf, other]

    cs.LG cs.AI

    A Two-stage Reinforcement Learning-based Approach for Multi-entity Task Allocation

    Authors: Aicheng Gong, Kai Yang, Jiafei Lyu, Xiu Li

    Abstract: Task allocation is a key combinatorial optimization problem, crucial for modern applications such as multi-robot cooperation and resource scheduling. Decision makers must allocate entities to tasks reasonably across different scenarios. However, traditional methods assume static attributes and numbers of tasks and entities, often relying on dynamic programming and heuristic algorithms for solution…

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2407.00020  [pdf, other]

    cs.CV cs.AI cs.CL cs.IT cs.LG

    Visual Language Model based Cross-modal Semantic Communication Systems

    Authors: Feibo Jiang, Chuanguo Tang, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan

    Abstract: Semantic Communication (SC) has emerged as a novel communication paradigm in recent years, successfully transcending the Shannon physical capacity limits through innovative semantic transmission concepts. Nevertheless, extant Image Semantic Communication (ISC) systems face several challenges in dynamic environments, including low semantic density, catastrophic forgetting, and uncertain Signal-to-N…

    Submitted 6 May, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures

  5. arXiv:2406.18585  [pdf, other]

    cs.CV cs.AI

    Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition

    Authors: Lin Zuo, Kunshan Yang, Xianlong Tian, Kunbin He, Yongqi Ding, Mengmeng Jing

    Abstract: Existing computer vision methods mainly focus on the recognition of rigid objects, whereas the recognition of flexible objects remains unexplored. Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences. In this paper, we claim that these problems primarily arise fro…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  6. arXiv:2406.14885  [pdf, other]

    cs.HC

    Ink and Algorithm: Exploring Temporal Dynamics in Human-AI Collaborative Writing

    Authors: Kaixun Yang, Yixin Cheng, Linxuan Zhao, Mladen Raković, Zachari Swiecki, Dragan Gašević, Guanliang Chen

    Abstract: The advent of Generative Artificial Intelligence (GAI) has revolutionized the field of writing, marking a shift towards human-AI collaborative writing in education. However, the dynamics of human-AI interaction in the collaborative writing process are not well understood, and thus it remains largely unknown how human learning can be effectively supported with such cutting-edge GAI technologies. In…

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.13149  [pdf, other]

    cs.CV

    High-Fidelity Facial Albedo Estimation via Texture Quantization

    Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

    Abstract: Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results. In this paper, we present a novel facial albedo recons…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.11301  [pdf, other]

    cs.AI cs.CL cs.LG

    Optimizing and Testing Instruction-Following: Analyzing the Impact of Fine-Grained Instruction Variants on instruction-tuned LLMs

    Authors: Jiuding Yang, Weidong Guo, Kaitong Yang, Xiangyang Li, Zhuwei Rao, Yu Xu, Di Niu

    Abstract: The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce an effective data augmentation techniqu…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11093  [pdf, other]

    cs.CL

    RAEmoLLM: Retrieval Augmented LLMs for Cross-Domain Misinformation Detection Using In-Context Learning based on Emotional Information

    Authors: Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine de Kock, Sophia Ananiadou, Eduard Hovy

    Abstract: Misinformation is prevalent in various fields such as education, politics, health, etc., causing significant harm to society. However, current methods for cross-domain misinformation detection rely on time and resources consuming fine-tuning and complex model structures. With the outstanding performance of LLMs, many studies have employed them for misinformation detection. Unfortunately, they focu…

    Submitted 16 June, 2024; originally announced June 2024.

  10. arXiv:2406.10809  [pdf, other]

    cs.CL cs.AI

    Post-hoc Utterance Refining Method by Entity Mining for Faithful Knowledge Grounded Conversations

    Authors: Yoonna Jang, Suhyune Son, Jeongwoo Lee, Junyoung Son, Yuna Hur, Jungwoo Lim, Hyeonseok Moon, Kisu Yang, Heuiseok Lim

    Abstract: Despite the striking advances in recent language generation performance, model-generated responses have suffered from the chronic problem of hallucinations that are either untrue or unfaithful to a given source. Especially in the task of knowledge grounded conversation, the models are required to generate informative responses, but hallucinated utterances lead to miscommunication. In particular, e…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at EMNLP 2023

  11. arXiv:2406.09841  [pdf, other]

    cs.LG q-bio.BM

    Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

    Authors: Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zikun Nie, Hao Zhou, Zaiqing Nie

    Abstract: Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular repr…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  12. arXiv:2406.09467  [pdf, other]

    cs.HC

    "I see it as a wellspring for my positive and upward journey in life.": Understanding Current Practices of Assistive Technology's Customized Modification in China

    Authors: Kexin Yang, Junyi Wu, Haokun Xin, Jiangtao Gong

    Abstract: Due to the significant differences in physical conditions and living environments of people with disabilities, standardized assistive technologies (ATs) often fail to meet their needs. Modified AT, especially DIY (Do It Yourself) ATs, are a popular solution in many high-income countries, but there is a lack of documentation for low- and middle-income areas, especially in China, where the culture o…

    Submitted 13 June, 2024; originally announced June 2024.

    MSC Class: H.5.2

    Journal ref: CSCW2024

  13. arXiv:2406.08907  [pdf, other]

    cs.CV cs.MM

    Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

    Authors: Yue Xu, Kaizhi Yang, Jiebo Luo, Xuejin Chen

    Abstract: 3D visual grounding is an emerging research area dedicated to making connections between the 3D physical world and natural language, which is crucial for achieving embodied intelligence. In this paper, we propose DASANet, a Dual Attribute-Spatial relation Alignment Network that separately models and aligns object attributes and spatial relation features between language and 3D vision modalities. W…

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.07862  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    Self-Distillation Learning Based on Temporal-Spatial Consistency for Spiking Neural Networks

    Authors: Lin Zuo, Yongqi Ding, Mengmeng Jing, Kunshan Yang, Yunqian Yu

    Abstract: Spiking neural networks (SNNs) have attracted considerable attention for their event-driven, low-power characteristics and high biological interpretability. Inspired by knowledge distillation (KD), recent research has improved the performance of the SNN model with a pre-trained teacher model. However, additional teacher models require significant computational resources, and it is tedious to manua…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 6 figures

    ACM Class: I.2.6; I.5.1

  15. arXiv:2406.07541  [pdf, other]

    cs.LG

    CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

    Authors: Zeyuan Liu, Kai Yang, Xiu Li

    Abstract: Distribution shift is a major obstacle in offline reinforcement learning, which necessitates minimizing the discrepancy between the learned policy and the behavior policy to avoid overestimating rare or unseen actions. Previous conservative offline RL algorithms struggle to generalize to unseen actions, despite their success in learning good in-distribution policy. In contrast, we propose to use t…

    Submitted 11 June, 2024; originally announced June 2024.

  16. arXiv:2406.06973  [pdf, other]

    cs.CV

    RWKV-CLIP: A Robust Vision-Language Representation Learner

    Authors: Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai, Jiankang Deng

    Abstract: Contrastive Language-Image Pre-training (CLIP) has significantly improved performance in various vision-language tasks by expanding the dataset with image-text pairs obtained from websites. This paper further explores CLIP from the perspectives of data and model architecture. To address the prevalence of noisy data and enhance the quality of large-scale image-text data crawled from the internet, w…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures

  17. arXiv:2406.04721  [pdf, other]

    cs.IT eess.SP

    End-to-End Design of Polar Coded Integrated Data and Energy Networking

    Authors: Jie Hu, Jingwen Cui, Kun Yang

    Abstract: In order to transmit data and transfer energy to the low-power Internet of Things (IoT) devices, integrated data and energy networking (IDEN) system may be harnessed. In this context, we propose a bitwise end-to-end design for polar coded IDEN systems, where the conventional encoding/decoding, modulation/demodulation, and energy harvesting (EH) modules are replaced by the neural networks (NNs). In…

    Submitted 7 June, 2024; originally announced June 2024.

  18. arXiv:2406.04240  [pdf, other]

    cs.LG cs.CL

    Hypernetworks for Personalizing ASR to Atypical Speech

    Authors: Max Müller-Eberstein, Dianna Yee, Karren Yang, Gautam Varma Mantena, Colin Lea

    Abstract: Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for -- the diagnosis of which requires expert knowledge that is not always available. Even given this knowledge, data scarci…

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  19. arXiv:2406.02310  [pdf, other]

    cs.LG

    Disentangled Representation via Variational AutoEncoder for Continuous Treatment Effect Estimation

    Authors: Ruijing Cui, Jianbin Sun, Bingyu He, Kewei Yang, Bingfeng Ge

    Abstract: Continuous treatment effect estimation holds significant practical importance across various decision-making and assessment domains, such as healthcare and the military. However, current methods for estimating dose-response curves hinge on balancing the entire representation by treating all covariates as confounding variables. Although various approaches disentangle covariates into different facto…

    Submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.00376  [pdf, other]

    cs.DS cs.DB

    Approaching 100% Confidence in Stream Summary through ReliableSketch

    Authors: Yuhan Wu, Hanbo Wu, Xilai Liu, Yikai Zhao, Tong Yang, Kaicheng Yang, Sha Wang, Lihua Miao, Gaogang Xie

    Abstract: To approximate sums of values in key-value data streams, sketches are widely used in databases and networking systems. They offer high-confidence approximations for any given key while ensuring low time and space overhead. While existing sketches are proficient in estimating individual keys, they struggle to maintain this high confidence across all keys collectively, an objective that is criticall…

    Submitted 1 June, 2024; originally announced June 2024.

  21. arXiv:2405.20550  [pdf]

    cs.LG stat.ML

    Uncertainty Quantification for Deep Learning

    Authors: Peter Jan van Leeuwen, J. Christine Chiu, C. Kevin Yang

    Abstract: A complete and statistically consistent uncertainty quantification for deep learning is provided, including the sources of uncertainty arising from (1) the new input data, (2) the training and testing data (3) the weight vectors of the neural network, and (4) the neural network because it is not a perfect predictor. Using Bayes Theorem and conditional probability densities, we demonstrate how each…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 25 pages 4 figures, submitted to Environmental data Science

    MSC Class: 62D99

    ACM Class: G.3

  22. arXiv:2405.17216  [pdf, other]

    cs.LG cs.AI cs.LO stat.ML

    Autoformalizing Euclidean Geometry

    Authors: Logan Murphy, Kaiyu Yang, Jialiang Sun, Zhaoyu Li, Anima Anandkumar, Xujie Si

    Abstract: Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean geometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs)…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024. The first two authors contributed equally

  23. arXiv:2405.15170  [pdf, other]

    cs.CV cs.RO

    Label-efficient Semantic Scene Completion with Scribble Annotations

    Authors: Song Wang, Jiawei Yu, Wentong Li, Hao Shi, Kailun Yang, Junbo Chen, Jianke Zhu

    Abstract: Semantic scene completion aims to infer the 3D geometric structures with semantic classes from camera or LiDAR, which provide essential occupancy information in autonomous driving. Prior endeavors concentrate on constructing the network or benchmark in a fully supervised manner. While the dense occupancy grids need point-wise semantic annotations, which incur expensive and tedious labeling costs.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI2024

  24. arXiv:2405.11238  [pdf, other]

    cs.LG cs.AI

    SimAD: A Simple Dissimilarity-based Approach for Time Series Anomaly Detection

    Authors: Zhijie Zhong, Zhiwen Yu, Xing Xi, Yue Xu, Jiahui Chen, Kaixiang Yang

    Abstract: Despite the prevalence of reconstruction-based deep learning methods, time series anomaly detection remains challenging. Existing approaches often struggle with limited temporal contexts, inadequate representation of normal patterns, and flawed evaluation metrics, hindering their effectiveness in identifying aberrant behavior. To address these issues, we introduce $\textbf{SimAD}$, a…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 18 pages, 12 figures, 7 tables, Under review

  25. arXiv:2405.11146  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Election Polls on Social Media: Prevalence, Biases, and Voter Fraud Beliefs

    Authors: Stephen Scarano, Vijayalakshmi Vasudevan, Mattia Samory, Kai-Cheng Yang, JungHwan Yang, Przemyslaw A. Grabowicz

    Abstract: Social media platforms allow users to create polls to gather public opinion on diverse topics. However, we know little about what such polls are used for and how reliable they are, especially in significant contexts like elections. Focusing on the 2020 presidential elections in the U.S., this study shows that outcomes of election polls on Twitter deviate from election results despite their prevale…

    Submitted 22 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 14 pages, 10 figures

  26. arXiv:2405.10098  [pdf, other]

    cs.NE

    When Large Language Model Meets Optimization

    Authors: Sen Huang, Kaixiang Yang, Sheng Qi, Rui Wang

    Abstract: Optimization algorithms and large language models (LLMs) enhance decision-making in dynamic environments by integrating artificial intelligence with traditional techniques. LLMs, with extensive domain knowledge, facilitate intelligent modeling and strategic decision-making in optimization, while optimization algorithms refine LLM architectures and output quality. This synergy offers novel approach…

    Submitted 16 May, 2024; originally announced May 2024.

  27. arXiv:2405.05518  [pdf, other]

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/lynn-yu/DTCLMapper

  28. arXiv:2405.02942  [pdf, other]

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  29. arXiv:2405.02372  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence

    Authors: Yancheng Huang, Kai Yang, Zelin Zhu, Leian Chen

    Abstract: The primary goal of online change detection (OCD) is to promptly identify changes in the data stream. OCD problem find a wide variety of applications in diverse areas, e.g., security detection in smart grids and intrusion detection in communication networks. Prior research usually assumes precise knowledge of the system parameters. Nevertheless, this presumption often proves unattainable in practi…

    Submitted 4 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML2024

  30. arXiv:2405.01872  [pdf, other]

    cs.CV

    Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition

    Authors: Yichun Tai, Kun Yang, Tao Peng, Zhenzhen Huang, Zhijiang Zhang

    Abstract: The task of steel surface defect recognition is an industrial problem with great industry values. The data insufficiency is the major challenge in training a robust defect recognition network. Existing methods have investigated to enlarge the dataset by generating samples with generative models. However, their generation quality is still limited by the insufficiency of defect image samples. To thi…

    Submitted 3 May, 2024; originally announced May 2024.

  31. arXiv:2405.01258  [pdf, other]

    cs.CV cs.RO eess.IV

    Towards Consistent Object Detection via LiDAR-Camera Synergy

    Authors: Kai Luo, Hao Wu, Kefu Yi, Kailun Yang, Wei Hao, Rongdong Hu

    Abstract: As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Integrating the two most common types of sensory data, images, and point clouds, can enhance detection accuracy. However, currently, no model exists that can simultaneously detect an object's position in both point clouds and images and ascertain their corresponding relatio…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: The source code will be made publicly available at https://github.com/xifen523/COD

  32. arXiv:2404.19201  [pdf, other]

    eess.IV cs.CV cs.RO physics.optics

    Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

    Authors: Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

    Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO

  33. arXiv:2404.18041  [pdf, other]

    quant-ph cs.LG math.OC

    Variational Optimization for Quantum Problems using Deep Generative Networks

    Authors: Lingxia Zhang, Xiaodie Lin, Peidong Wang, Kaiyan Yang, Xiao Zeng, Zhaohui Wei, Zizhu Wang

    Abstract: Optimization is one of the keystones of modern science and engineering. Its applications in quantum technology and machine learning helped nurture variational quantum algorithms and generative AI respectively. We propose a general approach to design variational optimization algorithms based on generative models: the Variational Generative Optimization Network (VGON). To demonstrate its broad appli…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 17 pages, 13 figures, comments welcome

  34. arXiv:2404.16456  [pdf, other]

    cs.CV

    Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

    Authors: Mingcheng Li, Dingkang Yang, Xiao Zhao, Shuaibing Wang, Yan Wang, Kun Yang, Mingyang Sun, Dongliang Kou, Ziyun Qian, Lihua Zhang

    Abstract: Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality completeness. However, in real-world applications, some practical factors cause uncertain modality missingness, which drastically degrades the model's performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) fra…

    Submitted 10 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  35. arXiv:2404.16346  [pdf, other]

    eess.IV cs.AI cs.CV

    Light-weight Retinal Layer Segmentation with Global Reasoning

    Authors: Xiang He, Weiye Song, Yiming Wang, Fabio Poiesi, Ji Yi, Manishi Desai, Quanqing Xu, Kongzheng Yang, Yi Wan

    Abstract: Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noises presented in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Instrumentation & Measurement

  36. arXiv:2404.16302  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions

    Authors: Haoyuan Li, Qi Hu, You Yao, Kailun Yang, Peng Chen

    Abstract: Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods severely degrade in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW

  37. arXiv:2404.15297  [pdf, ps, other]

    eess.SP cs.IT cs.LG

    Multi-stream Transmission for Directional Modulation Network via Distributed Multi-UAV-aided Multi-active-IRS

    Authors: Ke Yang, Rongen Dong, Wei Gao, Feng Shu, Weiping Shi, Yan Wang, Xuehui Wang, Jiangzhou Wang

    Abstract: Active intelligent reflecting surface (IRS) is a revolutionary technique for the future 6G networks. The conventional far-field single-IRS-aided directional modulation(DM) networks have only one (no direct path) or two (existing direct path) degrees of freedom (DoFs). This means that there are only one or two streams transmitted simultaneously from base station to user and will seriously limit its…

    Submitted 28 April, 2024; v1 submitted 26 March, 2024; originally announced April 2024.

  38. arXiv:2404.14132  [pdf, other]

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  39. arXiv:2404.14032  [pdf, other]

    cs.CV

    1st Place Solution to the 1st SkatingVerse Challenge

    Authors: Tao Sun, Yuanzi Fu, Kaicheng Yang, Jian Wu, Ziyong Feng

    Abstract: This paper presents the winning solution for the 1st SkatingVerse Challenge. We propose a method that involves several steps. To begin, we leverage the DINO framework to extract the Region of Interest (ROI) and perform precise cropping of the raw video footage. Subsequently, we employ three distinct models, namely Unmasked Teacher, UniformerV2, and InfoGCN, to capture different aspects of the data…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 3 pages, 1st SkatingVerse Challenge, 18th IEEE International Conference on Automatic Face and Gesture Recognition workshop

  40. arXiv:2404.13537  [pdf, other]

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  41. arXiv:2404.13238  [pdf, other]

    cs.LG cs.AI cs.CL

    Personalized Wireless Federated Learning for Large Language Models

    Authors: Feibo Jiang, Li Dong, Siwei Tu, Yubo Peng, Kezhi Wang, Kun Yang, Cunhua Pan, Dusit Niyato

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their deployment in wireless networks still face challenges, i.e., a lack of privacy and security protection mechanisms. Federated Learning (FL) has emerged as a promising approach to address these challenges. Yet, it suffers from issues including inefficient handling with big and heterogeneous data, resou…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures

  42. arXiv:2404.13039  [pdf, other]

    cs.CV cs.CL

    LaPA: Latent Prompt Assist Model For Medical Visual Question Answering

    Authors: Tiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai

    Abstract: Medical visual question answering (Med-VQA) aims to automate the prediction of correct answers for medical images and questions, thereby assisting physicians in reducing repetitive tasks and alleviating their workload. Existing approaches primarily focus on pre-training models using additional and comprehensive datasets, followed by fine-tuning to enhance performance in downstream tasks. However,…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, Accepted by CVPRW2024

  43. arXiv:2404.12794  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

    Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

    Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Ob…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/Terminal-K/MambaMOS

  44. arXiv:2404.12534  [pdf, other]

    cs.AI cs.LG cs.LO stat.ML

    Towards Large Language Models as Copilots for Theorem Proving in Lean

    Authors: Peiyang Song, Kaiyu Yang, Anima Anandkumar

    Abstract: Theorem proving is an important challenge for large language models (LLMs), as formal proofs can be checked rigorously by proof assistants such as Lean, leaving no room for hallucination. Existing LLM-based provers try to prove theorems in a fully autonomous mode without human intervention. In this mode, they struggle with novel and challenging theorems, for which human insights may be critical. I…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: All code open-sourced at https://github.com/lean-dojo/LeanCopilot

  45. arXiv:2404.10169  [pdf, ps, other]

    math.ST cs.IT

    Asymptotic mutual information in quadratic estimation problems over compact groups

    Authors: Kaylee Y. Yang, Timothy L. H. Wee, Zhou Fan

    Abstract: Motivated by applications to group synchronization and quadratic assignment on random data, we study a general problem of Bayesian inference of an unknown ``signal'' belonging to a high-dimensional compact group, given noisy pairwise observations of a featurization of this signal. We establish a quantitative comparison between the signal-observation mutual information in any such problem with that…

    Submitted 15 April, 2024; originally announced April 2024.

  46. arXiv:2404.09939  [pdf, other]

    cs.AI

    A Survey on Deep Learning for Theorem Proving

    Authors: Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si

    Abstract: Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in mathematical language to rigorous derivations in formal systems. In recent years, the advancement of deep learning, especially the emergence of large language models, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving. This paper presents a pioneerin…

    Submitted 15 April, 2024; originally announced April 2024.

  47. arXiv:2404.08296  [pdf, other]

    cs.RO

    High-Speed Interception Multicopter Control by Image-based Visual Servoing

    Authors: Kun Yang, Chenggang Bai, Zhikun She, Quan Quan

    Abstract: In recent years, reports of illegal drones threatening public safety have increased. For the invasion of fully autonomous drones, traditional methods such as radio frequency interference and GPS shielding may fail. This paper proposes a scheme that uses an autonomous multicopter with a strapdown camera to intercept a maneuvering intruder UAV. The interceptor multicopter can autonomously detect and…

    Submitted 12 April, 2024; originally announced April 2024.

  48. arXiv:2404.07960  [pdf, other]

    cs.AI cs.CY

    Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)

    Authors: Kaiqi Yang, Yucheng Chu, Taylor Darwin, Ahreum Han, Hang Li, Hongzhi Wen, Yasemin Copur-Gencturk, Jiliang Tang, Hui Liu

    Abstract: Teachers' mathematical content knowledge (CK) is vitally important and much needed in teacher professional development (PD) programs. Computer-aided asynchronous PD systems are the most recently proposed PD techniques, which aim to help teachers improve their PD equally with fewer concerns about costs and limitations of time or location. However, current automatic CK identification methods, which serve a…

    Submitted 21 March, 2024; originally announced April 2024.

  49. arXiv:2404.05966  [pdf, other]

    cs.CL cs.AI

    THOUGHTSCULPT: Reasoning with Intermediate Revision and Search

    Authors: Yizhou Chi, Kevin Yang, Dan Klein

    Abstract: We present THOUGHTSCULPT, a general reasoning and search method for tasks with outputs that can be decomposed into components. THOUGHTSCULPT explores a search tree of potential solutions using Monte Carlo Tree Search (MCTS), building solutions one action at a time and evaluating according to any domain-specific heuristic, which in practice is often simply an LLM evaluator. Critically, our action s…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Code and data available at https://github.com/cyzus/thoughtsculpt

  50. arXiv:2404.04927  [pdf, ps, other]

    cs.IT

    Holographic Integrated Data and Energy Transfer

    Authors: Qingxiao Huang, Jie Hu, Yizhe Zhao, Kun Yang

    Abstract: Thanks to the application of metamaterials, holographic multiple-input multiple-output (H-MIMO) is expected to achieve a higher spatial diversity gain by enabling the ability to generate any current distribution on the surface. With the aid of the electromagnetic (EM) manipulation capability of H-MIMO, an integrated data and energy transfer (IDET) system can fully exploit the EM channel to realize energ…

    Submitted 7 April, 2024; originally announced April 2024.