Showing 1–50 of 181 results for author: Long, Y

Searching in archive cs.
  1. arXiv:2406.14962  [pdf, other]

    cs.CV

    Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning

    Authors: Suyi Li, Chenyi Jiang, Shidong Wang, Yang Long, Zheng Zhang, Haofeng Zhang

    Abstract: Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object, consequently decreasing the classification performance towards novel compositions. Previous remarkable works primarily addr…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.04882  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

    Authors: Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, Hao Dong

    Abstract: Enabling robots to navigate following diverse language instructions in unexplored environments is an attractive goal for human-robot interaction. However, this goal is challenging because different navigation tasks require different strategies. The scarcity of instruction navigation data hinders training an instruction navigation model with varied strategies. Therefore, previous methods are all co…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024

  3. arXiv:2405.18757  [pdf, other]

    cs.RO

    Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation

    Authors: Jiawei Fu, Yonghao Long, Kai Chen, Wang Wei, Qi Dou

    Abstract: Surgical robot task automation has been a promising research topic for improving surgical efficiency and quality. Learning-based methods have been recognized as an interesting paradigm and been increasingly investigated. However, existing approaches encounter difficulties in long-horizon goal-conditioned tasks due to the intricate compositional structure, which requires decision-making for a seque…

    Submitted 29 May, 2024; originally announced May 2024.

  4. Wearable-based behaviour interpolation for semi-supervised human activity recognition

    Authors: Haoran Duan, Shidong Wang, Varun Ojha, Shizheng Wang, Yawen Huang, Yang Long, Rajiv Ranjan, Yefeng Zheng

    Abstract: While traditional feature engineering for Human Activity Recognition (HAR) involves a trial-and-error process, deep learning has emerged as a preferred method for high-level representations of sensor-based human activities. However, most deep learning-based HAR requires a large amount of labelled data and extracting HAR features from unlabelled data for effective deep learning training remains chal…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.15914  [pdf, other]

    cs.CV

    ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

    Authors: Yumin Zhang, Xingyu Miao, Haoran Duan, Bo Wei, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: Text-to-3D content creation is a rapidly evolving research area. Given the scarcity of 3D data, current approaches often adapt pre-trained 2D diffusion models for 3D synthesis. Among these approaches, Score Distillation Sampling (SDS) has been widely adopted. However, the issue of over-smoothing poses a significant limitation on the high-fidelity generation of 3D models. To address this challenge,…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.11252  [pdf, other]

    cs.CV

    Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

    Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversi…

    Submitted 18 May, 2024; originally announced May 2024.

  7. arXiv:2405.08748  [pdf, other]

    cs.CV

    Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

    Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu, et al. (20 additional authors not shown)

    Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Mu…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dit.hunyuan.tencent.com/

  8. arXiv:2405.04652  [pdf, ps, other]

    cs.HC

    AffirmativeAI: Towards LGBTQ+ Friendly Audit Frameworks for Large Language Models

    Authors: Yinru Long, Zilin Ma, Yiyang Mei, Zhaoyuan Su

    Abstract: LGBTQ+ community face disproportionate mental health challenges, including higher rates of depression, anxiety, and suicidal ideation. Research has shown that LGBTQ+ people have been using large language model-based chatbots, such as ChatGPT, for their mental health needs. Despite the potential for immediate support and anonymity these chatbots offer, concerns regarding their capacity to provide e…

    Submitted 7 May, 2024; originally announced May 2024.

  9. arXiv:2405.00956  [pdf, other]

    cs.RO cs.CV cs.GR

    Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians

    Authors: Zhenya Yang, Kai Chen, Yonghao Long, Qi Dou

    Abstract: Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these environments with surgical scene involve a labor-intensive process where designers hand-craft tissues models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in the scalability and…

    Submitted 20 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  10. arXiv:2404.19449  [pdf, other]

    cs.IT

    AoI-aware Sensing Scheduling and Trajectory Optimization for Multi-UAV-assisted Wireless Backscatter Networks

    Authors: Yusi Long, Songhan Zhao, Shimin Gong, Bo Gu, Dusit Niyato, Xuemin Shen

    Abstract: This paper considers multiple unmanned aerial vehicles (UAVs) to assist sensing data transmissions from the ground users (GUs) to a remote base station (BS). Each UAV collects sensing data from the GUs and then forwards the sensing data to the remote BS. The GUs first backscatter their data to the UAVs and then all UAVs forward data to the BS by the nonorthogonal multiple access (NOMA) transmissio…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE TVT

  11. arXiv:2404.12291  [pdf]

    cs.CL cs.AI

    Augmenting emotion features in irony detection with Large language modeling

    Authors: Yucheng Lin, Yuhan Xia, Yunfei Long

    Abstract: This study introduces a novel method for irony detection, applying Large Language Models (LLMs) with prompt-based learning to facilitate emotion-centric text augmentation. Traditional irony detection techniques typically fall short due to their reliance on static linguistic features and predefined knowledge bases, often overlooking the nuanced emotional dimensions integral to irony. In contrast, o…

    Submitted 19 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages, 3 tables, 2 figures. Accepted by the 25th Chinese Lexical Semantics Workshop

  12. arXiv:2403.15905  [pdf, other]

    cs.LG cs.CV

    Towards Low-Energy Adaptive Personalization for Resource-Constrained Devices

    Authors: Yushan Huang, Josh Millar, Yuxuan Long, Yuchen Zhao, Hamed Haddadi

    Abstract: The personalization of machine learning (ML) models to address data drift is a significant challenge in the context of Internet of Things (IoT) applications. Presently, most approaches focus on fine-tuning either the full base model or its last few layers to adapt to new data, while often neglecting energy costs. However, various types of data drift exist, and fine-tuning the full base model or th…

    Submitted 29 March, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted to The 4th Workshop on Machine Learning and Systems (EuroMLSys '24)

  13. arXiv:2403.15574  [pdf, other]

    cs.AI

    SensoryT5: Infusing Sensorimotor Norms into T5 for Enhanced Fine-grained Emotion Classification

    Authors: Yuhan Xia, Qingqing Zhao, Yunfei Long, Ge Xu, Jia Wang

    Abstract: In traditional research approaches, sensory perception and emotion classification have traditionally been considered separate domains. Yet, the significant influence of sensory experiences on emotional responses is undeniable. The natural language processing (NLP) community has often missed the opportunity to merge sensory knowledge with emotion classification. To address this gap, we propose Sens…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted by CogALex 2024 conference

  14. From Explainable to Interpretable Deep Learning for Natural Language Processing in Healthcare: How Far from Reality?

    Authors: Guangming Huang, Yingya Li, Shoaib Jameel, Yunfei Long, Giorgos Papanastasiou

    Abstract: Deep learning (DL) has substantially enhanced natural language processing (NLP) in healthcare research. However, the increasing complexity of DL-based NLP necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review of explainable and interpretable DL in healthcare NLP. The term "eXplainable and Interpretabl…

    Submitted 9 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by Computational and Structural Biotechnology Journal

  15. arXiv:2403.09363  [pdf, other]

    cs.CV

    Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

    Authors: Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long

    Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both m…

    Submitted 14 March, 2024; originally announced March 2024.

  16. arXiv:2403.08857  [pdf, other]

    cs.CV

    DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

    Authors: Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

    Abstract: Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language M…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Project page: https://hunyuan-dialoggen.github.io/

  17. Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

    Authors: Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

    Abstract: Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment. Despite significant advances, conventional VLN agents are trained typically under disturbance-free environments and may easily fail in real-world scenarios, since they are unaware of how to deal with various possible disturbances, such as sudden obstacles or human in…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by TPAMI 2023

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, 2023)

  18. arXiv:2402.19350  [pdf, other]

    cs.CL

    Prompting Explicit and Implicit Knowledge for Multi-hop Question Answering Based on Human Reading Process

    Authors: Guangming Huang, Yunfei Long, Cunjin Luo, Jiaxing Shen, Xia Sun

    Abstract: Pre-trained language models (PLMs) leverage chains-of-thought (CoT) to simulate human reasoning and inference processes, achieving proficient performance in multi-hop QA. However, a gap persists between PLMs' reasoning abilities and those of humans when tackling complex problems. Psychological studies suggest a vital connection between explicit information in passages and human prior knowledge dur…

    Submitted 27 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted at COLING 2024

  19. arXiv:2402.18541  [pdf, ps, other]

    cs.DS

    Dynamic Deterministic Constant-Approximate Distance Oracles with $n^ε$ Worst-Case Update Time

    Authors: Bernhard Haeupler, Yaowei Long, Thatchaphol Saranurak

    Abstract: We present a new distance oracle in the fully dynamic setting: given a weighted undirected graph $G=(V,E)$ with $n$ vertices undergoing both edge insertions and deletions, and an arbitrary parameter $ε$ where $ε\in[1/\log^{c} n,1]$ and $c>0$ is a small constant, we can deterministically maintain a data structure with $n^ε$ worst-case update time that, given any pair of vertices $(u,v)$, returns a…

    Submitted 10 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 137 pages

  20. arXiv:2402.15078  [pdf, other]

    cs.SE

    LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models

    Authors: Zhijie Liu, Yutian Tang, Meiyun Li, Xin Jin, Yunfei Long, Liang Feng Zhang, Xiapu Luo

    Abstract: XML configurations are integral to the Android development framework, particularly in the realm of UI display. However, these configurations can introduce compatibility issues (bugs), resulting in divergent visual outcomes and system crashes across various Android API versions (levels). In this study, we systematically investigate LLM-based approaches for detecting and repairing configuration comp…

    Submitted 22 February, 2024; originally announced February 2024.

  21. arXiv:2402.10353  [pdf, other]

    cs.CL cs.LG

    Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models

    Authors: Kang He, Yinghan Long, Kaushik Roy

    Abstract: Prompt learning is susceptible to intrinsic bias present in pre-trained language models (LMs), resulting in sub-optimal performance of prompt-based zero/few-shot learning. In this work, we propose a null-input prompting method to calibrate intrinsic bias encoded in pre-trained LMs. Different from prior efforts that address intrinsic bias primarily for social fairness and often involve excessive co…

    Submitted 15 February, 2024; originally announced February 2024.

  22. arXiv:2402.09748  [pdf, other]

    cs.CL cs.AI cs.LG cs.PF

    Model Compression and Efficient Inference for Large Language Models: A Survey

    Authors: Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He

    Abstract: Transformer based large language models have achieved tremendous success. However, the significant memory and computational costs incurred during the inference process make it challenging to deploy large models on resource-constrained devices. In this paper, we investigate compression and efficient inference methods for large language models from an algorithmic perspective. Regarding taxonomy, sim…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 47 pages, review 380 papers. The work is ongoing

  23. Evaluating the Experience of LGBTQ+ People Using Large Language Model Based Chatbots for Mental Health Support

    Authors: Zilin Ma, Yiyang Mei, Yinru Long, Zhaoyuan Su, Krzysztof Z. Gajos

    Abstract: LGBTQ+ individuals are increasingly turning to chatbots powered by large language models (LLMs) to meet their mental health needs. However, little research has explored whether these chatbots can adequately and safely provide tailored support for this demographic. We interviewed 18 LGBTQ+ and 13 non-LGBTQ+ participants about their experiences with LLM-based chatbots for mental health needs. LGBTQ+…

    Submitted 14 February, 2024; originally announced February 2024.

  24. arXiv:2402.09150  [pdf, ps, other]

    cs.DS

    Better Decremental and Fully Dynamic Sensitivity Oracles for Subgraph Connectivity

    Authors: Yaowei Long, Yunfan Wang

    Abstract: We study the \emph{sensitivity oracles problem for subgraph connectivity} in the \emph{decremental} and \emph{fully dynamic} settings. In the fully dynamic setting, we preprocess an $n$-vertices $m$-edges undirected graph $G$ with $n_{\rm off}$ deactivated vertices initially and the others are activated. Then we receive a single update $D\subseteq V(G)$ of size $|D| = d \leq d_{\star}$, representi…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 30 pages

  25. arXiv:2402.02380  [pdf]

    cs.CL cs.AI cs.HC

    Evaluating Large Language Models in Analysing Classroom Dialogue

    Authors: Yun Long, Haifeng Luo, Yu Zhang

    Abstract: This study explores the application of Large Language Models (LLMs), specifically GPT-4, in the analysis of classroom dialogue, a crucial research task for both teaching diagnosis and quality improvement. Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLM to streamline and enhance t…

    Submitted 22 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  26. arXiv:2402.01950  [pdf, other]

    cs.CV

    ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Fan Wan, Yawen Huang, Yang Long, Yefeng Zheng

    Abstract: Most of the existing works on arbitrary 3D NeRF style transfer required retraining on each single style condition. This work aims to achieve zero-shot controlled stylization in 3D scenes utilizing text or visual input as conditioning factors. We introduce ConRF, a novel method of zero-shot stylization. Specifically, due to the ambiguity of CLIP features, we employ a conversion process that maps th…

    Submitted 6 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  27. arXiv:2402.01181  [pdf, other]

    cs.RO cs.GR

    Efficient Physically-based Simulation of Soft Bodies in Embodied Environment for Surgical Robot

    Authors: Zhenya Yang, Yonghao Long, Kai Chen, Wang Wei, Qi Dou

    Abstract: Surgical robot simulation platform plays a crucial role in enhancing training efficiency and advancing research on robot learning. Much effort have been made by scholars on developing open-sourced surgical robot simulators to facilitate research. We also developed SurRoL formerly, an open-source, da Vinci Research Kit (dVRK) compatible and interactive embodied environment for robot learning. Despi…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 8 pages

  28. arXiv:2401.04861  [pdf, other]

    cs.CV

    CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Yang Long, Yefeng Zheng

    Abstract: The goal of our work is to generate high-quality novel views from monocular videos of complex and dynamic scenes. Prior methods, such as DynamicNeRF, have shown impressive performance by leveraging time-varying dynamic radiation fields. However, these methods have limitations when it comes to accurately modeling the motion of complex objects, which can lead to inaccurate and blurry renderings of d…

    Submitted 26 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by Pattern Recognition

  29. arXiv:2312.16217  [pdf, other]

    cs.CV cs.RO

    ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

    Authors: Xiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong

    Abstract: Robot manipulation relies on accurately predicting contact points and end-effector directions to ensure successful operation. However, learning-based robot manipulation, trained on a limited category within a simulator, often struggles to achieve generalizability, especially when confronted with extensive categories. Therefore, we introduce an innovative approach for robot manipulation that levera…

    Submitted 24 December, 2023; originally announced December 2023.

  30. arXiv:2311.12071  [pdf, other]

    eess.IV cs.CV cs.LG

    Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning

    Authors: Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar

    Abstract: Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent application of deep learning methods for image reconstruction provides a successful data-driven approach to addressing the challenges when reconstructing images with undersampled measurements or various types of noise. In this work, we propose a hybrid supervised-unsupervi…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: submitted to IEEE Transactions on Medical Imaging

  31. arXiv:2311.09805  [pdf, other]

    cs.CL

    DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data

    Authors: Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan

    Abstract: Recent LLMs have demonstrated remarkable performance in solving exam-like math word problems. However, the degree to which these numerical reasoning skills are effective in real-world scenarios, particularly in expert domains, is still largely unexplored. This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning and problem-solving capa…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: work in progress

  32. arXiv:2311.09797  [pdf, other]

    cs.CL

    KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains

    Authors: Yilun Zhao, Hongjun Liu, Yitao Long, Rui Zhang, Chen Zhao, Arman Cohan

    Abstract: We introduce KnowledgeMath, a novel benchmark designed to evaluate LLMs' capabilities in applying financial knowledge to solve complex math word problems. Compared to prior works, this study features three core advancements. First, KnowledgeMath includes 1,259 problems with a hybrid of textual and tabular content and require college-level knowledge in the finance domain for effective resolution. S…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: work in progress

  33. arXiv:2311.08829  [pdf, other]

    cs.SD eess.AS

    Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection

    Authors: Yifan Zhou, Dongxing Xu, Haoran Wei, Yanhua Long

    Abstract: In industry, machine anomalous sound detection (ASD) is in great demand. However, collecting enough abnormal samples is difficult due to the high cost, which boosts the rapid development of unsupervised ASD algorithms. Autoencoder (AE) based methods have been widely used for unsupervised ASD, but suffer from problems including 'shortcut', poor anti-noise ability and sub-optimal quality of features…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Submitted to the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  34. arXiv:2309.13833  [pdf, other]

    cs.CV cs.AI

    Dual Feature Augmentation Network for Generalized Zero-shot Learning

    Authors: Lei Xiang, Yuan Zhou, Haoran Duan, Yang Long

    Abstract: Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes. Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image. However, these methods often ignore the complex entanglement among different attributes' visual features in the embedding space. Additionally, these methods empl…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted to BMVC2023

  35. arXiv:2309.11382  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions

    Authors: Yuxing Long, Xiaoqi Li, Wenzhe Cai, Hao Dong

    Abstract: Visual language navigation (VLN) is an embodied task demanding a wide range of skills encompassing understanding, perception, and planning. For such a multifaceted challenge, previous VLN methods totally rely on one model's own thinking to make predictions within one round. However, existing models, even the most advanced large language model GPT4, still struggle with dealing with multiple tasks b…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Submitted to ICRA 2024

  36. arXiv:2309.10309  [pdf, other]

    cs.RO

    Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill

    Authors: Wenzhe Cai, Siyuan Huang, Guangran Cheng, Yuxing Long, Peng Gao, Changyin Sun, Hao Dong

    Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures

  37. arXiv:2309.09292  [pdf, other]

    cs.DC cs.PL

    An Auto-Parallelizer for Distributed Computing in Haskell

    Authors: Yuxi Long, Shiyou Wu, Yingjie Xu

    Abstract: One of the main challenges in distributed computing is building interfaces and APIs that allow programmers with limited background in distributed systems to write scalable, performant, and fault-tolerant applications on large clusters. In this demonstration, we designed and implemented a Haskell auto-parallelizer with a simple yet powerful interface by taking advantage of the default purity of Has…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 2 pages excluding title page and reference page. 2 figures. This work was submitted to the 28th ACM SIGPLAN International Conference on Functional Programming, Haskell Symposium. This work was accepted for oral presentation and was presented on Sep 8, 2023

  38. arXiv:2309.07387  [pdf, other]

    cs.CL cs.CV

    VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue

    Authors: Yunshui Li, Binyuan Hui, Zhaochao Yin, Wanwei He, Run Luo, Yuxing Long, Min Yang, Fei Huang, Yongbin Li

    Abstract: Visually-grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework poses a challenge in assessing the development of this field. To this end, we propose \textbf{VDialogUE}, a \textbf{V}isually-grounded \textbf{Dialog}ue benchmark for…

    Submitted 13 September, 2023; originally announced September 2023.

  39. arXiv:2309.02046  [pdf, other]

    cs.IT math.OC

    A Fast and Provable Algorithm for Sparse Phase Retrieval

    Authors: Jian-Feng Cai, Yu Long, Ruixue Wen, Jiaxi Ying

    Abstract: We study the sparse phase retrieval problem, which seeks to recover a sparse signal from a limited set of magnitude-only measurements. In contrast to prevalent sparse phase retrieval algorithms that primarily use first-order methods, we propose an innovative second-order algorithm that employs a Newton-type method with hard thresholding. This algorithm overcomes the linear convergence limitations…

    Submitted 19 March, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

  40. arXiv:2309.00957  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries

    Authors: Jiaqi Liu, Yonghao Long, Kai Chen, Cheuk Hei Leung, Zerui Wang, Qi Dou

    Abstract: Accurate segmentation of surgical instrument tip is an important task for enabling downstream applications in robotic surgery, such as surgical skill assessment, tool-tissue interaction and deformation modeling, as well as surgical autonomy. However, this task is very challenging due to the small sizes of surgical instrument tips, and significant variance of surgical scenes across different proced…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: Accepted to IROS 2023

  41. arXiv:2308.12526  [pdf, other]

    eess.AS cs.LG cs.SD

    UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023

    Authors: Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu

    Abstract: This report describes the UNISOUND submission for Track1 and Track2 of VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev. Large-scale ResNet and RepVGG architectures are developed for the challenge. We propose a consistency-aware score calibration method, which leverages the stability of audio voice…

    Submitted 23 August, 2023; originally announced August 2023.

  42. arXiv:2308.09977  [pdf, other]

    cs.CV

    Whether you can locate or not? Interactive Referring Expression Generation

    Authors: Fulong Ye, Yuxing Long, Fangxiang Feng, Xiaojie Wang

    Abstract: Referring Expression Generation (REG) aims to generate unambiguous Referring Expressions (REs) for objects in a visual scene, with a dual task of Referring Expression Comprehension (REC) to locate the referred object. Existing methods construct REG models independently by using only the REs as ground truth for model training, without considering the potential interaction between REG and REC models…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: 10 pages, 7 figures

  43. DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume

    Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Xinxing Xu, Yang Long, Yefeng Zheng

    Abstract: Self-supervised monocular depth estimation methods typically rely on the reprojection error to capture geometric relationships between successive frames in static environments. However, this assumption does not hold in dynamic objects in scenarios, leading to errors during the view synthesis stage, such as feature mismatch and occlusion, which can significantly reduce the accuracy of the generated…

    Submitted 14 August, 2023; originally announced August 2023.

  44. arXiv:2307.16503  [pdf, other]

    cs.RO cs.AI cs.LG

    Value-Informed Skill Chaining for Policy Learning of Long-Horizon Tasks with Surgical Robot

    Authors: Tao Huang, Kai Chen, Wang Wei, Jianan Li, Yonghao Long, Qi Dou

    Abstract: Reinforcement learning is still struggling with solving long-horizon surgical robot tasks which involve multiple steps over an extended duration of time due to the policy exploration challenge. Recent methods try to tackle this problem by skill chaining, in which the long-horizon task is decomposed into multiple subtasks for easing the exploration burden and subtask policies are temporally connect…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: Accepted to IROS 2023

  45. arXiv:2307.02106  [pdf, other]

    cs.CR cs.DB cs.LG

    SoK: Privacy-Preserving Data Synthesis

    Authors: Yuzheng Hu, Fan Wu, Qinbin Li, Yunhui Long, Gonzalo Munilla Garrido, Chang Ge, Bolin Ding, David Forsyth, Bo Li, Dawn Song

    Abstract: As the prevalence of data analysis grows, safeguarding data privacy has become a paramount concern. Consequently, there has been an upsurge in the development of mechanisms aimed at privacy-preserving data analyses. However, these approaches are task-specific; designing algorithms for new tasks is a cumbersome process. As an alternative, one can create synthetic data that is (ideally) devoid of pr…

    Submitted 5 August, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: Accepted at IEEE S&P (Oakland) 2024

  46. arXiv:2306.11309  [pdf, other]

    cs.SD cs.CL eess.AS eess.SP

    Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition

    Authors: Xuefei Wang, Yanhua Long, Yijie Li, Haoran Wei

    Abstract: Low-resource accented speech recognition is one of the important challenges faced by current ASR technology in practical applications. In this study, we propose a Conformer-based architecture, called Aformer, to leverage both the acoustic information from large non-accented and limited accented training data. Specifically, a general encoder and an accent encoder are designed in the Aformer to extr…

    Submitted 20 June, 2023; originally announced June 2023.

  47. arXiv:2305.18212  [pdf, other]

    cs.IR cs.AI cs.CL cs.CV cs.LG cs.MM

    Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark

    Authors: Yuxing Long, Binyuan Hui, Caixia Yuan, Fei Huang, Yongbin Li, Xiaojie Wang

    Abstract: Existing multimodal task-oriented dialog data fails to demonstrate the diverse expressions of user subjective preferences and recommendation acts in the real-life shopping scenario. This paper introduces a new dataset SURE (Multimodal Recommendation Dialog with SUbjective PREference), which contains 12K shopping dialogs in complex store scenes. The data is built in two phases with human annotation…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  48. arXiv:2305.16340  [pdf, other]

    cs.CL cs.AI cs.LG

    Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

    Authors: Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy

    Abstract: Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented r…

    Submitted 22 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Findings

  49. arXiv:2305.10055  [pdf, other]

    cs.IT eess.SP

    Optimized Joint Beamforming for Wireless Powered Over-the-Air Computation

    Authors: Siyao Zhang, Xinmin Li, Yin Long, Jie Xu, Shuguang Cui

    Abstract: This correspondence studies the wireless powered over-the-air computation (AirComp) for achieving sustainable wireless data aggregation (WDA) by integrating AirComp and wireless power transfer (WPT) into a joint design. In particular, we consider that a multi-antenna hybrid access point (HAP) employs the transmit energy beamforming to charge multiple single-antenna low-power wireless devices (WDs)…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: 3 figures

  50. arXiv:2304.14082  [pdf, other]

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: JaxPruner is hosted at http://github.com/google-research/jaxpruner