Showing 1–50 of 482 results for author: Jiang, S

  1. arXiv:2406.19263  [pdf, other]

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19249  [pdf, other]

    cs.LG

    NTFormer: A Composite Node Tokenized Graph Transformer for Node Classification

    Authors: Jinsong Chen, Siyu Jiang, Kun He

    Abstract: Recently, the emerging graph Transformers have made significant advancements for node classification on graphs. In most graph Transformers, a crucial step involves transforming the input graph into token sequences as the model input, enabling Transformer to effectively learn the node representations. However, we observe that existing methods only express partial graph information of nodes through… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.18227  [pdf, other]

    cs.CV cs.CL

    GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

    Authors: Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

    Abstract: There are substantial instructional videos on the Internet, which provide us tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024

  4. arXiv:2406.17484  [pdf, other]

    cs.CL

    MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

    Authors: Yusheng Liao, Shuyang Jiang, Yanfeng Wang, Yu Wang

    Abstract: Large language models (LLMs) have shown substantial progress in natural language understanding and generation, proving valuable especially in the medical field. Despite advancements, challenges persist due to the complexity and diversity inherent in medical tasks, which can be categorized as knowledge-intensive tasks and alignment-required tasks. Previous approaches either ignore the latter task o… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 19 pages, 6 figures

  5. arXiv:2406.17225  [pdf, other]

    eess.IV cs.CV

    Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

    Authors: Songhan Jiang, Zhengyu Gan, Linghan Cai, Yifeng Wang, Yongbing Zhang

    Abstract: Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tu… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16518  [pdf]

    cs.CV

    Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Convolutional neural networks (CNNs) and Transformers have shown advanced accuracy in crack detection under certain conditions. Yet, the fixed local attention can compromise the generalisation of CNNs, and the quadratic complexity of the global self-attention restricts the practical deployment of Transformers. Given the emergence of the new-generation architecture of Mamba, this paper proposes a V… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 23 pages, 9 figures

  7. arXiv:2406.16505  [pdf, other]

    q-fin.CP cs.AI

    $\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

    Authors: Feng Xu, Yan Yin, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Zongzhang Zhang

    Abstract: Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with the expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating a collection of formulaic alphas were mostly based on gen… ▽ More

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.11882  [pdf]

    cs.AI cs.LG

    Applications of Explainable artificial intelligence in Earth system science

    Authors: Feini Huang, Shijie Jiang, Lu Li, Yongkun Zhang, Ye Zhang, Ruqing Zhang, Qingliang Li, Danxi Li, Wei Shangguan, Yongjiu Dai

    Abstract: In recent years, artificial intelligence (AI) rapidly accelerated its influence and is expected to promote the development of Earth system science (ESS) if properly harnessed. In application of AI to ESS, a significant hurdle lies in the interpretability conundrum, an inherent problem of black-box nature arising from the complexity of AI algorithms. To address this, explainable AI (XAI) offers a s… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  9. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  10. arXiv:2406.10261  [pdf, other]

    cs.CL cs.AI

    FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination

    Authors: Pengfei Zhou, Weiqing Min, Chaoran Fu, Ying Jin, Mingyu Huang, Xiangyang Li, Shuhuan Mei, Shuqiang Jiang

    Abstract: Food is foundational to human life, serving not only as a source of nourishment but also as a cornerstone of cultural identity and social interaction. As the complexity of global dietary needs and preferences grows, food intelligence is needed to enable food perception and reasoning for various tasks, ranging from recipe generation and dietary recommendation to diet-disease correlation discovery a… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 19 figures

  11. arXiv:2406.09798  [pdf, other]

    cs.RO cs.CV

    Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation

    Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang

    Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in 3D environments following the natural language instruction. In this field, the agent is usually trained and evaluated in the navigation simulators, lacking effective approaches for sim-to-real transfer. The VLN agents with only a monocular camera exhibit extremely limited performance, while the mainstream VL… ▽ More

    Submitted 20 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024. The code is available at https://github.com/MrZihan/Sim2Real-VLN-3DFF

  12. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  13. arXiv:2406.07025  [pdf, other]

    cs.LG cs.AI q-bio.QM stat.ML

    Entropy-Reinforced Planning with Large Language Models for Drug Discovery

    Authors: Xuefeng Liu, Chih-chan Tien, Peng Ding, Songhao Jiang, Rick L. Stevens

    Abstract: The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused tok… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Published in ICML2024

  14. arXiv:2406.04876  [pdf, other]

    cs.CL

    HateDebias: On the Diversity and Variability of Hate Speech Debiasing

    Authors: Nankai Lin, Hongyan Wu, Zhengming Chen, Zijian Li, Lianxi Wang, Shengyi Jiang, Dong Zhou, Aimin Yang

    Abstract: Hate speech on social media is ubiquitous and urgently needs to be controlled. Without detecting and mitigating the biases brought by hate speech, different types of ethical problems arise. While a number of datasets have been proposed to address the problem of hate speech detection, these datasets seldom consider the diversity and variability of bias, making it far from real-world scenarios. To fill this gap, we p… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  15. arXiv:2405.20192  [pdf, other]

    cs.CL

    TAIA: Large Language Models are Out-of-Distribution Data Learners

    Authors: Shuyang Jiang, Yusheng Liao, Ya Zhang, Yu Wang, Yanfeng Wang

    Abstract: Fine-tuning on task-specific question-answer pairs is a predominant method for enhancing the performance of instruction-tuned large language models (LLMs) on downstream tasks. However, in certain specialized domains, such as healthcare or harmless content generation, it is nearly impossible to obtain a large volume of high-quality data that matches the downstream distribution. To improve the perfo… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 25 pages

  16. arXiv:2405.17846  [pdf, other]

    cs.RO cs.AI

    Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs

    Authors: Yong Qi, Gabriel Kyebambo, Siyuan Xie, Wei Shen, Shenghui Wang, Bitao Xie, Bin He, Zhipeng Wang, Shuo Jiang

    Abstract: Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent saf… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  17. arXiv:2405.16414  [pdf, other]

    cs.CV

    PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model

    Authors: Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Changbo Wang, Chenhui Li

    Abstract: Image steganography can hide information in a host image and obtain a stego image that is perceptually indistinguishable from the original one. This technique has tremendous potential in scenarios like copyright protection, information retrospection, etc. Some previous studies have proposed to enhance the robustness of the methods against image disturbances to increase their applicability. However… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 content pages

  18. arXiv:2405.15278  [pdf, other]

    cs.CV

    MindShot: Brain Decoding Framework Using Only One Image

    Authors: Shuai Jiang, Zhu Meng, Delong Liu, Haiwen Li, Fei Su, Zhicheng Zhao

    Abstract: Brain decoding, which aims at reconstructing visual stimuli from brain signals, primarily utilizing functional magnetic resonance imaging (fMRI), has recently made positive progress. However, it is impeded by significant challenges such as the difficulty of acquiring fMRI-image pairs and the variability of individuals, etc. Most methods have to adopt the per-subject-per-model paradigm, greatly lim… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  19. arXiv:2405.12541  [pdf, other]

    cs.AI

    DrHouse: An LLM-empowered Diagnostic Reasoning System through Harnessing Outcomes from Sensor Data and Expert Knowledge

    Authors: Bufang Yang, Siyang Jiang, Lilin Xu, Kaiwei Liu, Hai Li, Guoliang Xing, Hongkai Chen, Xiaofan Jiang, Zhenyu Yan

    Abstract: Large language models (LLMs) have the potential to transform digital healthcare, as evidenced by recent advances in LLM-based virtual doctors. However, current approaches rely on patient's subjective descriptions of symptoms, causing increased misdiagnosis. Recognizing the value of daily data from smart devices, we introduce a novel LLM-based multi-turn consultation virtual doctor system, DrHouse,… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  20. arXiv:2405.11273  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

    Authors: Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To ad… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures. Project Website: https://uni-moe.github.io/. Working in progress

  21. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  22. arXiv:2405.07309  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

    Authors: Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu

    Abstract: Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward des… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  23. arXiv:2405.05590  [pdf, other]

    cs.CR cs.AR cs.LG

    TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans

    Authors: Fangzhou Wang, Qijing Wang, Lilas Alrahis, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Ozgur Sinanoglu, Tsung-Yi Ho, Evangeline F. Y. Young, Johann Knechtel

    Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many security threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically protect the physical layouts of ICs against… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  24. arXiv:2405.04021  [pdf, other]

    cs.CR

    Robust and Reusable Fuzzy Extractors for Low-entropy Rate Randomness Sources

    Authors: Somnath Panja, Shaoquan Jiang, Reihaneh Safavi-Naini

    Abstract: Fuzzy extractors (FE) are cryptographic primitives that extract reliable cryptographic key from noisy real world random sources such as biometric sources. The FE generation algorithm takes a source sample, extracts a key and generates some helper data that will be used by the reproduction algorithm to recover the key. Reusability of FE guarantees that security holds when FE is used multiple times… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  25. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  26. arXiv:2404.11824  [pdf, other]

    cs.CV

    TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

    Authors: Tianyi Liang, Jiangqi Liu, Sicheng Song, Shiqi Jiang, Yifei Huang, Changbo Wang, Chenhui Li

    Abstract: Recent advancements in Text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generate layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces a dynamic adaptation of the blank region for text-friendly image generation, emphasizing text… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 7 pages, 7 figures

  27. arXiv:2404.10237  [pdf, other]

    cs.CV cs.CL

    Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models

    Authors: Songtao Jiang, Tuo Zheng, Yan Zhang, Yeying Jin, Li Yuan, Zuozhu Liu

    Abstract: Recent advancements in general-purpose or domain-specific multimodal large language models (LLMs) have witnessed remarkable progress for medical decision-making. However, they are designated for specific classification or generative tasks, and require model training or finetuning on large-scale datasets with sizeable parameters and tremendous computing, hindering their clinical utility across dive… ▽ More

    Submitted 26 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  28. arXiv:2404.09027  [pdf, other]

    cs.CL

    MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

    Authors: Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

    Abstract: Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field. Despite advancements, challenges persist due to the complexity and diversity inherent in medical tasks which often require multi-task learning capabilities. Previous approaches, although beneficial, fall short… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 15 pages, 3 figures

  29. arXiv:2404.06258  [pdf]

    cs.CV

    Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models

    Authors: Zhaohui Chen, Elyas Asadi Shamsabadi, Sheng Jiang, Luming Shen, Daniel Dias-da-Costa

    Abstract: Vision-based crack detection faces deployment challenges due to the size of robust models and edge device limitations. These can be addressed with lightweight models trained with knowledge distillation (KD). However, state-of-the-art (SOTA) KD methods compromise anti-noise robustness. This paper develops Robust Feature Knowledge Distillation (RFKD), a framework to improve robustness while retainin… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 24 pages, 13 figures

  30. arXiv:2404.06078  [pdf, other]

    cs.IR

    End-to-end training of Multimodal Model and ranking Model

    Authors: Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang Song, Na Mou, Shen Jiang, Han Li

    Abstract: Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existi… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  31. arXiv:2404.04514  [pdf, other]

    cs.CL

    Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models

    Authors: Songtao Jiang, Yan Zhang, Chenyi Zhou, Yeying Jin, Yang Feng, Jian Wu, Zuozhu Liu

    Abstract: Multimodal Large Language Models (MLLMs) such as GPT-4V and Gemini Pro face challenges in achieving human-level perception in Visual Question Answering (VQA), particularly in object-oriented perception tasks which demand fine-grained understanding of object identities, locations or attributes, as indicated by empirical findings. This is mainly due to their limited capability to effectively integra… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

  32. arXiv:2404.01943  [pdf, other]

    cs.CV cs.RO

    Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

    Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang

    Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. The code is available at https://github.com/MrZihan/HNR-VLN

  33. arXiv:2403.18339  [pdf, other]

    eess.IV cs.CV

    H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images

    Authors: Jinpeng Lu, Jingyun Chen, Linghan Cai, Songhan Jiang, Yongbing Zhang

    Abstract: Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effec… ▽ More

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures

  34. arXiv:2403.16463  [pdf, other]

    cs.CL

    Few-shot Named Entity Recognition via Superposition Concept Discrimination

    Authors: Jiawei Chen, Hongyu Lin, Xianpei Han, Yaojie Lu, Shanshan Jiang, Bin Dong, Le Sun

    Abstract: Few-shot NER aims to identify entities of target types with only limited number of illustrative instances. Unfortunately, few-shot NER is severely challenged by the intrinsic precise generalization problem, i.e., it is hard to accurately determine the desired target type due to the ambiguity stemming from information deficiency. In this paper, we propose Superposition Concept Discriminator (SuperC… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  35. arXiv:2403.16395  [pdf, other]

    cs.CV

    Multi-attention Associate Prediction Network for Visual Tracking

    Authors: Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Xilai Wei, Zhonghe Hu

    Abstract: Classification-regression prediction networks have realized impressive success in several modern deep trackers. However, there is an inherent difference between classification and regression tasks, so they have diverse even opposite demands for feature matching. Existed models always ignore the key issue and only employ a unified matching block in two task branches, decaying the decision quality.… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  36. arXiv:2403.15815  [pdf, other]

    cs.DC

    Resource-efficient Parallel Split Learning in Heterogeneous Edge Computing

    Authors: Mingjin Zhang, Jiannong Cao, Yuvraj Sahni, Xiangchun Chen, Shan Jiang

    Abstract: Edge AI has been recently proposed to facilitate the training and deployment of Deep Neural Network (DNN) models in proximity to the sources of data. To enable the training of large models on resource-constraint edge devices and protect data privacy, parallel split learning is becoming a practical and popular approach. However, current parallel split learning neglects the resource heterogeneity of… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted by International Conference on Computing, Networking and Communications (ICNC 2024)

  37. arXiv:2403.14690  [pdf]

    cs.CY cs.AI cs.CL cs.LG

    Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning

    Authors: Xiuqin Zhong, Shengyuan Yan, Gongqi Lin, Hongguang Fu, Liang Xu, Siwen Jiang, Lei Huang, Wei Fang

    Abstract: In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatical… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  38. arXiv:2403.13846  [pdf, other]

    cs.LG cs.AI

    A Clustering Method with Graph Maximum Decoding Information

    Authors: Xinrun Xu, Manying Lv, Zhanbiao Lian, Yurong Wu, ** Yan, Shan Jiang, Zhiming Ding

    Abstract: The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of rela… ▽ More

    Submitted 18 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 9 pages, 9 figures, IJCNN 2024

  39. arXiv:2403.13002  [pdf]

    cs.HC cs.AI cs.CL

    AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models

    Authors: Shuo Jiang, Jianxi Luo

    Abstract: Researchers and innovators have made enormous efforts in developing ideation methods, such as morphological analysis and design-by-analogy, to aid engineering design ideation for problem solving and innovation. Among these, the Theory of Inventive Problem Solving (TRIZ) stands out as one of the most well-known approaches, widely applied for systematic innovation. However, the complexity of TRIZ re… ▽ More

    Submitted 22 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Proceedings of the ASME 2024 International Design Engineering Technical Conferences and Computers and Information in Engineering Conferences

    ACM Class: I.2.7; I.2.1

  40. arXiv:2403.10299  [pdf, other]

    cs.AI

    A Multi-constraint and Multi-objective Allocation Model for Emergency Rescue in IoT Environment

    Authors: Xinrun Xu, Zhanbiao Lian, Yurong Wu, Manying Lv, Zhiming Ding, Jian Yan, Shang Jiang

    Abstract: Emergency relief operations are essential in disaster aftermaths, necessitating effective resource allocation to minimize negative impacts and maximize benefits. In prolonged crises or extensive disasters, a systematic, multi-cycle approach is key for timely and informed decision-making. Leveraging advancements in IoT and spatio-temporal data analytics, we've developed the Multi-Objective Shuffled… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, 5 figures, ISCAS 2024

  41. arXiv:2403.07578  [pdf, other]

    cs.CV

    AACP: Aesthetics assessment of children's paintings based on self-supervised learning

    Authors: Shiqi Jiang, Ning Li, Chen Shi, Li** Guo, Changbo Wang, Chenhui Li

    Abstract: The Aesthetics Assessment of Children's Paintings (AACP) is an important branch of the image aesthetics assessment (IAA), playing a significant role in children's education. This task presents unique challenges, such as limited available data and the requirement for evaluation metrics from multiple perspectives. However, previous approaches have relied on training large datasets and subsequently p… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: AAAI 2024

  42. arXiv:2403.06798  [pdf, other]

    eess.IV cs.CV cs.LG

    Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification

    Authors: Shuai Li, Xiaoguang Ma, Shancheng Jiang, Lu Meng

    Abstract: Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to im…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures, 2 tables

  43. arXiv:2403.04175  [pdf]

    physics.med-ph cs.AI

    Understanding the PULSAR Effect in Combined Radiotherapy and Immunotherapy through Attention Mechanisms with a Transformer Model

    Authors: Hao Peng, Casey Moore, Debabrata Saha, Steve Jiang, Robert Timmerman

    Abstract: PULSAR (personalized, ultra-fractionated stereotactic adaptive radiotherapy) is the adaptation of stereotactic ablative radiotherapy towards personalized cancer management. For the first time, we applied a transformer-based attention mechanism to investigate the underlying interactions between combined PULSAR and PD-L1 blockade immunotherapy based on a murine cancer model (Lewis Lung Carcinoma, LL…

    Submitted 6 March, 2024; originally announced March 2024.

  44. arXiv:2403.01820  [pdf, other]

    math.NA cs.LG

    Macroscopic auxiliary asymptotic preserving neural networks for the linear radiative transfer equations

    Authors: Hongyan Li, Song Jiang, Wenjun Sun, Liwei Xu, Guanyu Zhou

    Abstract: We develop a Macroscopic Auxiliary Asymptotic-Preserving Neural Network (MA-APNN) method to solve the time-dependent linear radiative transfer equations (LRTEs), which have a multi-scale nature and high dimensionality. To achieve this, we utilize the Physics-Informed Neural Networks (PINNs) framework and design a new adaptive exponentially weighted Asymptotic-Preserving (AP) loss function, which i…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 24 pages, 29 figures

  45. arXiv:2402.19434  [pdf, other]

    cs.IT eess.SP

    Digital Twin Aided Massive MIMO: CSI Compression and Feedback

    Authors: Shuaifeng Jiang, Ahmed Alkhateeb

    Abstract: Deep learning (DL) approaches have demonstrated high performance in compressing and reconstructing the channel state information (CSI) and reducing the CSI feedback overhead in massive MIMO systems. One key challenge, however, with the DL approaches is the demand for extensive training data. Collecting this real-world CSI data incurs significant overhead that hinders the DL approaches from scaling…

    Submitted 29 February, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted in ICC 2024. Dataset and code files will be available soon on the DeepMIMO website https://www.deepmimo.net/

  46. arXiv:2402.18675  [pdf, other]

    cs.RO

    Robot Body Schema Learning from Full-body Extero/Proprioception Sensors

    Authors: Shuo Jiang, Jinkun Zhang, Lawson Wong

    Abstract: For a robot, its body structure is an a-prior knowledge when it is designed. However, when such information is not available, can a robot recognize it by itself? In this paper, we aim to grant a robot such ability to learn its body structure from exteroception and proprioception data collected from on-body sensors. By a novel machine learning method, the robot can learn a binary Heterogeneous Depe…

    Submitted 28 February, 2024; originally announced February 2024.

  47. arXiv:2402.18258  [pdf, other]

    cs.CL

    A BiRGAT Model for Multi-intent Spoken Language Understanding with Hierarchical Semantic Frames

    Authors: Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu

    Abstract: Previous work on spoken language understanding (SLU) mainly focuses on single-intent settings, where each input utterance merely contains one user intent. This configuration significantly limits the surface form of user utterances and the capacity of output semantics. In this work, we first propose a Multi-Intent dataset which is collected from a realistic in-Vehicle dialogue System, called MIVS.…

    Submitted 28 February, 2024; originally announced February 2024.

  48. arXiv:2402.17472  [pdf, other]

    cs.LG cs.AI

    RAGFormer: Learning Semantic Attributes and Topological Structure for Fraud Detection

    Authors: Haolin Li, Shuyang Jiang, Lifeng Zhang, Siyuan Du, Guangnan Ye, Hongfeng Chai

    Abstract: Fraud detection remains a challenging task due to the complex and deceptive nature of fraudulent activities. Current approaches primarily concentrate on learning only one perspective of the graph: either the topological structure of the graph or the attributes of individual nodes. However, we conduct empirical studies to reveal that these two types of features, while nearly orthogonal, are each in…

    Submitted 18 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Preprint. Under review

  49. arXiv:2402.17394  [pdf, other]

    cs.NI

    A Survey of Network Protocol Fuzzing: Model, Techniques and Directions

    Authors: Shihao Jiang, Yu Zhang, Junqiang Li, Hongfang Yu, Long Luo, Gang Sun

    Abstract: As one of the most successful and effective software testing techniques in recent years, fuzz testing has uncovered numerous bugs and vulnerabilities in modern software, including network protocol software. In contrast to other fuzzing targets, network protocol software exhibits its distinct characteristics and challenges, introducing a plethora of research questions that need to be addressed in t…

    Submitted 27 February, 2024; originally announced February 2024.

  50. arXiv:2402.12317  [pdf, other]

    cs.CL cs.AI

    ARKS: Active Retrieval in Knowledge Soup for Code Generation

    Authors: Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu

    Abstract: Recently the retrieval-augmented generation (RAG) paradigm has raised much attention for its potential in incorporating external knowledge into large language models (LLMs) without further training. While widely explored in natural language applications, its utilization in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced str…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Retrieval-augmented code generation