Showing 1–50 of 450 results for author: Jiao, Y

  1. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00412  [pdf, other]

    cs.RO cs.IT cs.MA cs.NI

    C-MASS: Combinatorial Mobility-Aware Sensor Scheduling for Collaborative Perception with Second-Order Topology Approximation

    Authors: Yukuan Jia, Yuxuan Sun, Ruiqing Mao, Zhaojun Nan, Sheng Zhou, Zhisheng Niu

    Abstract: Collaborative Perception (CP) has been a promising solution to address occlusions in the traffic environment by sharing sensor data among collaborative vehicles (CoV) via vehicle-to-everything (V2X) network. With limited wireless bandwidth, CP necessitates task-oriented and receiver-aware sensor scheduling to prioritize important and complementary sensor data. However, due to vehicular mobility, i…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  3. arXiv:2406.17797  [pdf, other]

    physics.chem-ph cs.AI cs.LG

    MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

    Authors: Shikun Feng, Jiaxin Zheng, Yinjun Jia, Yanwen Huang, Fengfeng Zhou, Wei-Ying Ma, Yanyan Lan

    Abstract: Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address th…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.14955  [pdf, other]

    cs.CL

    ICLEval: Evaluating In-Context Learning Ability of Large Language Models

    Authors: Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen

    Abstract: In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs. Evaluating the ICL ability of LLMs can enhance their utilization and deepen our understanding of how this ability is acquired at the training stage. However, existing evaluation frameworks primarily focus on language abilities and knowledge,…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.08961  [pdf, other]

    q-bio.BM cs.LG

    SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

    Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or t…

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.07111  [pdf, other]

    cs.CV

    NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images

    Authors: Yufei Han, Heng Guo, Koki Fukai, Hiroaki Santo, Boxin Shi, Fumio Okura, Zhanyu Ma, Yunpeng Jia

    Abstract: We present NeRSP, a Neural 3D reconstruction technique for Reflective surfaces with Sparse Polarized images. Reflective surface reconstruction is extremely challenging as specular reflections are view-dependent and thus violate the multiview consistency for multiview stereo. On the other hand, sparse image inputs, as a practical capture setting, commonly cause incomplete or distorted results due t…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 10 pages

  8. arXiv:2406.05746  [pdf]

    cs.AI cs.HC cs.LG

    Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance

    Authors: Zhan Zhang, Qin Zhang, Yang Jiao, Lin Lu, Lin Ma, Aihua Liu, Xiao Liu, Juan Zhao, Yajun Xue, Bing Wei, Mingxia Zhang, Ru Gao, Hong Zhao, Jie Lu, Fan Li, Yang Zhang, Yiming Wang, Lei Zhang, Fengwei Tian, Jie Hu, Xin Gou

    Abstract: AI-aided clinical diagnosis is desired in medical care. Existing deep learning models lack explainability and mainly focus on image analysis. The recently developed Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios, without problems of data collection, labeling, fitting, privacy, bias, generalization, high cost…

    Submitted 9 June, 2024; originally announced June 2024.

    Journal ref: Artificial Intelligence Review, (2024) 57:151

  9. arXiv:2406.03086  [pdf, other]

    cs.MA cs.IT cs.LG

    Task-Oriented Wireless Communications for Collaborative Perception in Intelligent Unmanned Systems

    Authors: Sheng Zhou, Yukuan Jia, Ruiqing Mao, Zhaojun Nan, Yuxuan Sun, Zhisheng Niu

    Abstract: Collaborative Perception (CP) has shown great potential to achieve more holistic and reliable environmental perception in intelligent unmanned systems (IUSs). However, implementing CP still faces key challenges due to the characteristics of the CP task and the dynamics of wireless channels. In this article, a task-oriented wireless communication framework is proposed to jointly optimize the commun…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Network Magazine

  10. arXiv:2406.02133  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    SimulTron: On-Device Simultaneous Speech to Speech Translation

    Authors: Alex Agranovich, Eliya Nachmani, Oleg Rybakov, Yifan Ding, Ye Jia, Nadav Bar, Heiga Zen, Michelle Tadmor Ramanovich

    Abstract: Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the st…

    Submitted 4 June, 2024; originally announced June 2024.

  11. arXiv:2405.17802  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Multi-level Interaction Modeling for Protein Mutational Effect Prediction

    Authors: Yuanle Mo, Xin Hong, Bowen Gao, Yinjun Jia, Yanyan Lan

    Abstract: Protein-protein interactions are central mediators in many biological processes. Accurately predicting the effects of mutations on interactions is crucial for guiding the modulation of these interactions, thereby playing a significant role in therapeutic development and drug discovery. Mutations generally affect interactions hierarchically across three levels: mutated residues exhibit different si…

    Submitted 27 May, 2024; originally announced May 2024.

  12. arXiv:2405.16474  [pdf, other]

    cs.LG

    Inaccurate Label Distribution Learning with Dependency Noise

    Authors: Zhiqiang Kou, Jing Wang, Yuheng Jia, Xin Geng

    Abstract: In this paper, we introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning, which arise from dependencies on instances and labels. We start by modeling the inaccurate label distribution matrix as a combination of the true label distribution and a noise matrix influenced by specific instance…

    Submitted 26 May, 2024; originally announced May 2024.

  13. arXiv:2405.13686  [pdf, other]

    cs.CV

    Embedding Generalized Semantic Knowledge into Few-Shot Remote Sensing Segmentation

    Authors: Yuyu Jia, Wei Huang, Junyu Gao, Qi Wang, Qiang Li

    Abstract: Few-shot segmentation (FSS) for remote sensing (RS) imagery leverages supporting information from limited annotated samples to achieve query segmentation of novel classes. Previous efforts are dedicated to mining segmentation-guiding visual cues from a constrained set of support samples. However, they still struggle to address the pronounced intra-class differences in RS images, as sparse visual c…

    Submitted 22 May, 2024; originally announced May 2024.

  14. arXiv:2405.12684  [pdf, other]

    stat.ML cs.LG

    Model Free Prediction with Uncertainty Assessment

    Authors: Yuling Jiao, Lican Kang, Jin Liu, Heng Peng, Heng Zuo

    Abstract: Deep nonparametric regression, characterized by the utilization of deep neural networks to learn target functions, has emerged as a focus of research attention in recent years. Despite considerable progress in understanding convergence rates, the absence of asymptotic properties hinders rigorous statistical inference. To address this gap, we propose a novel framework that transforms the deep estim…

    Submitted 16 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  15. arXiv:2405.12543  [pdf, other]

    cs.CV cs.AI

    Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text

    Authors: Yuyu Jia, Qing Zhou, Wei Huang, Junyu Gao, Qi Wang

    Abstract: Few-shot learning aims to generalize the recognizer from seen categories to an entirely novel scenario. With only a few support samples, several advanced methods initially introduce class names as prior knowledge for identifying novel classes. However, obstacles still impede achieving a comprehensive understanding of how to harness the mutual advantages of visual and textual knowledge. In this pap…

    Submitted 22 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  16. arXiv:2405.11457  [pdf, other]

    cs.RO cs.AI cs.LG

    Deep Dive into Model-free Reinforcement Learning for Biological and Robotic Systems: Theory and Practice

    Authors: Yusheng Jiao, Feng Ling, Sina Heydari, Nicolas Heess, Josh Merel, Eva Kanso

    Abstract: Animals and robots exist in a physical world and must coordinate their bodies to achieve behavioral objectives. With recent developments in deep reinforcement learning, it is now possible for scientists and engineers to obtain sensorimotor strategies (policies) for specific tasks using physically simulated bodies and environments. However, the utility of these methods goes beyond the constraints o…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 20 pages, 3 figures

  17. arXiv:2405.11451  [pdf, ps, other]

    math.NA cs.AI math.AP stat.ML

    Error Analysis of Three-Layer Neural Network Trained with PGD for Deep Ritz Method

    Authors: Yuling Jiao, Yanming Lai, Yang Wang

    Abstract: Machine learning is a rapidly advancing field with diverse applications across various domains. One prominent area of research is the utilization of deep learning techniques for solving partial differential equations (PDEs). In this work, we specifically focus on employing a three-layer tanh neural network within the framework of the deep Ritz method (DRM) to solve second-order elliptic equations wi…

    Submitted 19 May, 2024; originally announced May 2024.

    MSC Class: 65N12; 65N15; 68T07; 62G05; 35J25

  18. arXiv:2405.06093  [pdf, other]

    cs.LG cs.CL

    Selective Fine-tuning on LLM-labeled Data May Reduce Reliance on Human Annotation: A Case Study Using Schedule-of-Event Table Detection

    Authors: Bhawesh Kumar, Jonathan Amar, Eric Yang, Nan Li, Yugang Jia

    Abstract: Large Language Models (LLMs) have demonstrated their efficacy across a broad spectrum of tasks in healthcare applications. However, often LLMs need to be fine-tuned on task-specific expert annotated data to achieve optimal performance, which can be expensive and time consuming. In this study, we fine-tune PaLM-2 with parameter efficient fine-tuning (PEFT) using noisy labels obtained from gemini-pr…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 21 pages

  19. arXiv:2405.05512  [pdf, other]

    cs.LG cs.AI math.NA math.ST

    Characteristic Learning for Provable One Step Generation

    Authors: Zhao Ding, Chenguang Duan, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang, Pingwen Zhang

    Abstract: We propose the characteristic generator, a novel one-step generative model that combines the efficiency of sampling in Generative Adversarial Networks (GANs) with the stable performance of flow-based models. Our model is driven by characteristics, along which the probability density transport can be described by ordinary differential equations (ODEs). Specifically, we estimate the velocity field t…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  20. arXiv:2405.02688  [pdf, other]

    cs.LG

    Semi-supervised Symmetric Matrix Factorization with Low-Rank Tensor Representation

    Authors: Yuheng Jia, Jia-Nan Li, Wenhui Wu, Ran Wang

    Abstract: Semi-supervised symmetric non-negative matrix factorization (SNMF) utilizes the available supervisory information (usually in the form of pairwise constraints) to improve the clustering ability of SNMF. The previous methods introduce the pairwise constraints from the local perspective, i.e., they either directly refine the similarity matrix element-wisely or restrain the distance of the decomposed…

    Submitted 4 May, 2024; originally announced May 2024.

  21. arXiv:2405.00515  [pdf, other]

    cs.RO cs.CV

    GAD-Generative Learning for HD Map-Free Autonomous Driving

    Authors: Weijian Sun, Yanbo Jia, Qi Zeng, Zihao Liu, Jiang Liao, Yue Li, Xianfeng Li

    Abstract: Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic progra…

    Submitted 31 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  22. arXiv:2404.19527  [pdf, other]

    cs.CV

    Revealing the Two Sides of Data Augmentation: An Asymmetric Distillation-based Win-Win Solution for Open-Set Recognition

    Authors: Yunbing Jia, Xiaoyu Kong, Fan Tang, Yixing Gao, Weiming Dong, Yi Yang

    Abstract: In this paper, we reveal the two sides of data augmentation: enhancements in closed-set recognition correlate with a significant decrease in open-set recognition. Through empirical investigation, we find that multi-sample-based augmentations would contribute to reducing feature discrimination, thereby diminishing the open-set criteria. Although knowledge distillation could impair the feature via i…

    Submitted 28 April, 2024; originally announced April 2024.

  23. arXiv:2404.13309  [pdf, ps, other]

    stat.ML cs.LG

    Latent Schrödinger Bridge Diffusion Model for Generative Learning

    Authors: Yuling Jiao, Lican Kang, Huazhen Lin, Jin Liu, Heng Zuo

    Abstract: This paper aims to conduct a comprehensive theoretical analysis of current diffusion models. We introduce a novel generative learning methodology utilizing the Schrödinger bridge diffusion model in latent space as the framework for theoretical exploration in this domain. Our approach commences with the pre-training of an encoder-decoder architecture using data originating from a distribution tha…

    Submitted 20 April, 2024; originally announced April 2024.

  24. arXiv:2404.12966  [pdf, other]

    cs.CV cs.AI

    Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models

    Authors: Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang

    Abstract: Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes. Existing multimodal large language models (MLLMs) have exhibited impressive cognitive and reasoning capabilities, which have been examined across a wide range of Visual Question Answering (VQA) benchmarks. Nevertheless, how wi…

    Submitted 24 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  25. arXiv:2404.12598  [pdf, ps, other]

    cs.LG eess.SY q-fin.CP q-fin.PM

    Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

    Authors: Yanwei Jia

    Abstract: This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against the model uncertainty. Owing to the martingale perspective in Jia and Zhou (2023) the risk-…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 49 pages, 2 figures, 1 table

    MSC Class: 62L20; 68T05; 93E03; 93E20; 93E35

  26. arXiv:2404.10405  [pdf, other]

    cs.CV cs.AI cs.LG

    Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition

    Authors: Hao Feng, Yuanzhe Jia, Ruijia Xu, Mukesh Prasad, Ali Anaissi, Ali Braytee

    Abstract: Image recognition techniques heavily rely on abundant labeled data, particularly in medical contexts. Addressing the challenges associated with obtaining labeled data has led to the prominence of self-supervised learning and semi-supervised learning, especially in scenarios with limited annotated data. In this paper, we proposed an innovative approach by integrating self-supervised learning into s…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICCS 2024

  27. arXiv:2404.08584  [pdf, other]

    cs.CV

    Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation

    Authors: Abu Bakor Hayat Arnob, Xiangxue Wang, Yiping Jiao, Xiao Gan, Wenlong Ming, Jun Xu

    Abstract: Medical image processing usually requires a model trained with carefully crafted datasets due to unique image characteristics and domain-specific challenges, especially in pathology. Primitive detection and segmentation in digitized tissue samples are essential for objective and automated diagnosis and prognosis of cancer. SAM (Segment Anything Model) has recently been developed to segment general…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 2024 IEEE International Symposium on Biomedical Imaging

    ACM Class: I.4.6; I.2

  28. arXiv:2404.02538  [pdf, other]

    stat.ML cs.LG

    Convergence Analysis of Flow Matching in Latent Space with Transformers

    Authors: Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan

    Abstract: We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analy…

    Submitted 28 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  29. arXiv:2404.00551  [pdf, other]

    stat.ML cs.LG

    Convergence of Continuous Normalizing Flows for Learning Probability Distributions

    Authors: Yuan Gao, Jian Huang, Yuling Jiao, Shurong Zheng

    Abstract: Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear inter…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 60 pages, 3 tables, and 3 figures

    MSC Class: 62G05; 68T07

  30. arXiv:2403.16535  [pdf, other]

    cs.RO

    Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot

    Authors: Zifan Wang, Yufei Jia, Lu Shi, Haoyu Wang, Haizhou Zhao, Xueyang Li, Jinni Zhou, Jun Ma, Guyue Zhou

    Abstract: Incorporating a robotic manipulator into a wheel-legged robot enhances its agility and expands its potential for practical applications. However, the presence of potential instability and uncertainties presents additional challenges for control objectives. In this paper, we introduce an arm-constrained curriculum learning architecture to tackle the issues introduced by adding the manipulator. Firs…

    Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  31. arXiv:2403.16149  [pdf, other]

    cs.CR cs.AI cs.LG

    A Survey on Consumer IoT Traffic: Security and Privacy

    Authors: Yan Jia, Yuxin Song, Zihou Liu, Qingyin Tan, Fangming Wang, Yu Zhang, Zheli Liu

    Abstract: For the past few years, the Consumer Internet of Things (CIoT) has entered public lives. While CIoT has improved the convenience of people's daily lives, it has also brought new security and privacy concerns. In this survey, we try to figure out what researchers can learn about the security and privacy of CIoT by traffic analysis, a popular method in the security community. From the security and p…

    Submitted 24 March, 2024; originally announced March 2024.

  32. arXiv:2403.16056  [pdf, other]

    cs.CL cs.AI

    Qibo: A Large Language Model for Traditional Chinese Medicine

    Authors: Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo

    Abstract: Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident pre…

    Submitted 22 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  33. arXiv:2403.15962  [pdf]

    cs.LG cs.AI cs.CY

    Detection of Problem Gambling with Less Features Using Machine Learning Methods

    Authors: Yang Jiao, Gloria Wong-Padoongpatt, Mei Yang

    Abstract: Analytic features in gambling study are performed based on the amount of data monitoring on user daily actions. While performing the detection of problem gambling, existing datasets provide relatively rich analytic features for building machine learning based model. However, considering the complexity and cost of collecting the analytic features in real applications, conducting precise detection w…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 6 pages, 5 tables, 1 figure

  34. arXiv:2403.15800  [pdf, other]

    cs.CL

    MRC-based Nested Medical NER with Co-prediction and Adaptive Pre-training

    Authors: Xiaojing Du, Hanjie Zhao, Danyan Xing, Yuxiang Jia, Hongying Zan

    Abstract: In medical information extraction, medical Named Entity Recognition (NER) is indispensable, playing a crucial role in developing medical knowledge graphs, enhancing medical question-answering systems, and analyzing electronic medical records. The challenge in medical NER arises from the complex nested structures and sophisticated medical terminologies, distinguishing it from its counterparts in tr…

    Submitted 23 March, 2024; originally announced March 2024.

  35. arXiv:2403.15374  [pdf, other]

    cs.SE

    Enhancing Testing at Meta with Rich-State Simulated Populations

    Authors: Nadia Alshahwan, Arianna Blasi, Kinga Bojarczuk, Andrea Ciancone, Natalija Gucevska, Mark Harman, Simon Schellaert, Inna Harper, Yue Jia, Michał Królikowski, Will Lewis, Dragos Martac, Rubmary Rojas, Kate Ustiuzhanina

    Abstract: This paper reports the results of the deployment of Rich-State Simulated Populations at Meta for both automated and manual testing. We use simulated users (aka test users) to mimic user interactions and acquire state in much the same way that real user accounts acquire state. For automated testing, we present empirical results from deployment on the Facebook, Messenger, and Instagram apps for iOS…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: ICSE 2024

  36. arXiv:2403.15156  [pdf, other]

    cs.RO cs.CV eess.SY

    Infrastructure-Assisted Collaborative Perception in Automated Valet Parking: A Safety Perspective

    Authors: Yukuan Jia, Jiawen Zhang, Shimeng Lu, Baokang Fan, Ruiqing Mao, Sheng Zhou, Zhisheng Niu

    Abstract: Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructur…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 7 figures, 4 tables, accepted by IEEE VTC2024-Spring

  37. arXiv:2403.14487  [pdf, other]

    cs.CV

    DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

    Authors: Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang

    Abstract: Recently, how to achieve precise image editing has attracted increasing attention, especially given the remarkable success of text-to-image generation models. To unify various spatial-aware image editing abilities into one framework, we adopt the concept of layers from the design domain to manipulate objects flexibly with various operations. The key insight is to transform the spatial-aware image…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: technical report, 15 pages, webpage: https://design-edit.github.io/

  38. arXiv:2403.14414  [pdf, other]

    cs.RO

    Efficient Model Learning and Adaptive Tracking Control of Magnetic Micro-Robots for Non-Contact Manipulation

    Authors: Yongyi Jia, Shu Miao, Junjian Zhou, Niandong Jiao, Lianqing Liu, Xiang Li

    Abstract: Magnetic microrobots can be navigated by an external magnetic field to autonomously move within living organisms with complex and unstructured environments. Potential applications include drug delivery, diagnostics, and therapeutic interventions. Existing techniques commonly impart magnetic properties to the target object, or drive the robot to contact and then manipulate the object, both probably…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 7 pages, 6 figures, received by 2024 IEEE International Conference on Robotics and Automation

  39. arXiv:2403.13237  [pdf, ps, other]

    cs.CR math.OC

    Graph Attention Network-based Block Propagation with Optimal AoI and Reputation in Web 3.0

    Authors: Jiana Liao, Jinbo Wen, Jiawen Kang, Changyan Yi, Yang Zhang, Yutao Jiao, Dusit Niyato, Dong In Kim, Shengli Xie

    Abstract: Web 3.0 is recognized as a pioneering paradigm that empowers users to securely oversee data without reliance on a centralized authority. Blockchains, as a core technology to realize Web 3.0, can facilitate decentralized and transparent data management. Nevertheless, the evolution of blockchain-enabled Web 3.0 is still in its nascent phase, grappling with challenges such as ensuring efficiency and…

    Submitted 8 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  40. arXiv:2403.12807  [pdf, ps, other]

    cs.GT

    Freshness-aware Block Propagation Optimization in 6G-based Web 3.0: An Evolutionary Game Approach

    Authors: Jinbo Wen, Jiawen Kang, Zehui Xiong, Hongyang Du, Zhaohui Yang, Dusit Niyato, Meng Shen, Yutao Jiao, Yang Zhang

    Abstract: Driven by the aspiration to establish a decentralized digital economy, Web 3.0 is emerging as the fundamental technology for digital transformation. Incorporating the promising sixth-generation (6G) technology with large bandwidth and space-air-ground integrated coverage, 6G-based Web 3.0 holds great potential in empowering users with enhanced data control and facilitating secure peer-to-peer tran…

    Submitted 19 March, 2024; originally announced March 2024.

  41. arXiv:2403.10622  [pdf, other]

    eess.IV cs.CV

    NeuralOCT: Airway OCT Analysis via Neural Fields

    Authors: Yining Jiao, Amy Oldenburg, Yinghan Xu, Srikamal Soundararajan, Carlton Zdanski, Julia Kimbell, Marc Niethammer

    Abstract: Optical coherence tomography (OCT) is a popular modality in ophthalmology and is also used intravascularly. Our interest in this work is OCT in the context of airway abnormalities in infants and children where the high resolution of OCT and the fact that it is radiation-free is important. The goal of airway OCT is to provide accurate estimates of airway geometry (in 2D and 3D) to assess airway abn…

    Submitted 15 March, 2024; originally announced March 2024.

  42. arXiv:2403.07403  [pdf, other]

    cs.CV cs.AI

    From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

    Authors: Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

    Abstract: The precise recognition of food categories plays a pivotal role for intelligent health management, attracting significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that catalyze the prosperity of research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate fro…

    Submitted 12 March, 2024; originally announced March 2024.

  43. arXiv:2403.07304  [pdf, other

    cs.CV

    Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

    Authors: Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

    Abstract: The Large Multimodal Model (LMM) is a hot research topic in the computer vision area and has also demonstrated remarkable potential across multiple disciplinary fields. A recent trend is to further extend and enhance the perception capabilities of LMMs. The current methods follow the paradigm of adapting the visual task outputs to the format of the language model, which is the main component of an LMM.…

    Submitted 28 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Technical Report

  44. arXiv:2403.06400  [pdf, other

    cs.CV

    DivCon: Divide and Conquer for Progressive Text-to-Image Generation

    Authors: Yuhao Jia, Wenhan Tan

    Abstract: Diffusion-driven text-to-image (T2I) generation has achieved remarkable advancements. To further improve T2I models' capability in numerical and spatial reasoning, the layout is employed as an intermedium to bridge large language models and layout-based diffusion models. However, these methods still struggle with generating images from textual prompts with multiple objects and complicated spatial…

    Submitted 10 March, 2024; originally announced March 2024.

  45. arXiv:2403.02998  [pdf, other

    cs.CV

    Towards Calibrated Deep Clustering Network

    Authors: Yuheng Jia, Jianhong Cheng, Hui Liu, Junhui Hou

    Abstract: Deep clustering has exhibited remarkable performance; however, the over-confidence problem, i.e., the estimated confidence for a sample belonging to a particular cluster greatly exceeds its actual prediction accuracy, has been overlooked in prior research. To tackle this critical issue, we pioneer the development of a calibrated deep clustering framework. Specifically, we propose a novel dual-head…

    Submitted 2 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  46. arXiv:2403.01799  [pdf, other

    cs.CV

    Superpixel Graph Contrastive Clustering with Semantic-Invariant Augmentations for Hyperspectral Images

    Authors: Jianhan Qi, Yuheng Jia, Hui Liu, Junhui Hou

    Abstract: Hyperspectral image (HSI) clustering is an important but challenging task. The state-of-the-art (SOTA) methods usually rely on superpixels; however, they do not fully utilize the spatial and spectral information in the HSI 3-D structure, and their optimization targets are not clustering-oriented. In this work, we first use 3-D and 2-D hybrid convolutional neural networks to extract the high-order spa…

    Submitted 4 March, 2024; originally announced March 2024.

  47. arXiv:2403.01652  [pdf, other

    cs.NI

    Towards Memory-Efficient Traffic Policing in Time-Sensitive Networking

    Authors: Xuyan Jiang, Xiangrui Yang, Tongqing Zhou, Wenwen Fu, Wei Quan, Yihao Jiao, Yinhan Sun, Zhigang Sun

    Abstract: Time-Sensitive Networking (TSN) is an emerging real-time Ethernet technology that provides deterministic communication for time-critical traffic. At its core, TSN relies on Time-Aware Shaper (TAS) for pre-allocating frames in specific time intervals and Per-Stream Filtering and Policing (PSFP) for mitigating the fatal disturbance of unavoidable frame drift. However, as first identified in this wor…

    Submitted 3 March, 2024; originally announced March 2024.

  48. arXiv:2403.01063  [pdf, other

    cs.CL

    FaiMA: Feature-aware In-context Learning for Multi-domain Aspect-based Sentiment Analysis

    Authors: Songhua Yang, Xinke Jiang, Hanjie Zhao, Wenxuan Zeng, Hongde Liu, Yuxiang Jia

    Abstract: Multi-domain aspect-based sentiment analysis (ABSA) seeks to capture fine-grained sentiment across diverse domains. While existing research narrowly focuses on single-domain applications constrained by methodological limitations and data scarcity, the reality is that sentiment naturally traverses multiple domains. Although large language models (LLMs) offer a promising solution for ABSA, it is dif…

    Submitted 1 March, 2024; originally announced March 2024.

  49. arXiv:2402.19481  [pdf, other

    cs.CV

    DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

    Authors: Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

    Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method split…

    Submitted 15 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 Highlight. Code: https://github.com/mit-han-lab/distrifuser; Website: https://hanlab.mit.edu/projects/distrifusion; Blog: https://hanlab.mit.edu/blog/distrifusion

  50. arXiv:2402.16843  [pdf, other

    cs.CV cs.AI cs.CL cs.GR cs.LG

    Multi-LoRA Composition for Image Generation

    Authors: Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

    Abstract: Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images. Nonetheless, existing methods face challenges in effectively composing multiple LoRAs, especially as the number of LoRAs to be integrated grows, thus hindering the creation of complex imagery. In this paper, we stu…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Project Website: https://maszhongming.github.io/Multi-LoRA-Composition/