Showing 1–50 of 610 results for author: Ma, K

  1. arXiv:2407.02315  [pdf, other]

    cs.CV cs.AI

    VFIMamba: Video Frame Interpolation with State Space Models

    Authors: Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang

    Abstract: Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering b…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01964  [pdf, other]

    cs.CL

    Enabling Discriminative Reasoning in LLMs for Legal Judgment Prediction

    Authors: Chenlong Deng, Kelong Mao, Yuyao Zhang, Zhicheng Dou

    Abstract: Legal judgment prediction is essential for enhancing judicial efficiency. In this work, we identify that existing large language models (LLMs) underperform in this domain due to challenges in understanding case complexities and distinguishing between similar charges. To adapt LLMs for effective legal judgment prediction, we introduce the Ask-Discriminate-Predict (ADAPT) reasoning framework inspire…

    Submitted 2 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  3. Sequential Manipulation Against Rank Aggregation: Theory and Algorithm

    Authors: Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, Qingming Huang

    Abstract: Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc. Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fu…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE TPAMI URL: https://ieeexplore.ieee.org/document/10564181

  4. arXiv:2407.00565  [pdf, other]

    cs.DC cs.NI

    Joint Task Allocation and Scheduling for Multi-Hop Distributed Computing

    Authors: Ke Ma, Junfei Xie

    Abstract: The rise of the Internet of Things and edge computing has shifted computing resources closer to end-users, benefiting numerous delay-sensitive, computation-intensive applications. To speed up computation, distributed computing is a promising technique that allows parallel execution of tasks across multiple compute nodes. However, current research predominantly revolves around the master-worker par…

    Submitted 29 June, 2024; originally announced July 2024.

  5. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with $12$ billi…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2406.19760  [pdf, other]

    cs.IR cs.CL

    Learning Interpretable Legal Case Retrieval via Knowledge-Guided Case Reformulation

    Authors: Chenlong Deng, Kelong Mao, Zhicheng Dou

    Abstract: Legal case retrieval for sourcing similar cases is critical in upholding judicial fairness. Different from general web search, legal case retrieval involves processing lengthy, complex, and highly specialized legal documents. Existing methods in this domain often overlook the incorporation of legal expert knowledge, which is crucial for accurately understanding and modeling legal cases, leading to…

    Submitted 28 June, 2024; originally announced June 2024.

  7. arXiv:2406.14515  [pdf, other]

    cs.CV cs.MM

    MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

    Authors: Xinyu Fang, Kangrui Mao, Haodong Duan, Xiangyu Zhao, Yining Li, Dahua Lin, Kai Chen

    Abstract: The advent of large vision-language models (LVLMs) has spurred research into their applications in multi-modal contexts, particularly in video understanding. Traditional VideoQA benchmarks, despite providing quantitative metrics, often fail to encompass the full spectrum of video content and inadequately assess models' temporal comprehension. To address these limitations, we introduce MMBench-Vide…

    Submitted 20 June, 2024; originally announced June 2024.

  8. arXiv:2406.11247  [pdf, other]

    cs.CV

    STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

    Authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to begin our exploration within the Minecraft environment. Our STEVE Series agents can complete basic tasks in a virtual environment and more challengin…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Embodied AI Workshop

  9. arXiv:2406.09688  [pdf, other]

    cs.CL

    FreeCtrl: Constructing Control Centers with Feedforward Layers for Learning-Free Controllable Text Generation

    Authors: Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Kezhi Mao

    Abstract: Controllable text generation (CTG) seeks to craft texts adhering to specific attributes, traditionally employing learning-based techniques such as training, fine-tuning, or prefix-tuning with attribute-specific datasets. These approaches, while effective, demand extensive computational and data resources. In contrast, some proposed learning-free alternatives circumvent learning but often yield inf…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  10. arXiv:2406.08187  [pdf, other]

    cs.RO

    Learning-based Traversability Costmap for Autonomous Off-road Navigation

    Authors: Qiumin Zhu, Zhen Sun, Songpengcheng Xia, Guoqing Liu, Kehui Ma, Ling Pei, Zheng Gong

    Abstract: Traversability estimation in off-road terrains is an essential procedure for autonomous navigation. However, creating reliable labels for complex interactions between the robot and the surface is still a challenging problem in learning-based costmap generation. To address this, we propose a method that predicts traversability costmaps by leveraging both visual and geometric information of the envi…

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.05013  [pdf, other]

    cs.IR cs.CL

    CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search

    Authors: Fengran Mo, Abbas Ghaddar, Kelong Mao, Mehdi Rezagholizadeh, Boxing Chen, Qun Liu, Jian-Yun Nie

    Abstract: In this paper, we study how open-source large language models (LLMs) can be effectively deployed for improving query rewriting in conversational search, especially for ambiguous queries. We introduce CHIQ, a two-step method that leverages the capabilities of LLMs to resolve ambiguities in the conversation history before query rewriting. This approach contrasts with prior studies that predominantly…

    Submitted 7 June, 2024; originally announced June 2024.

  12. arXiv:2406.04548  [pdf, other]

    cs.LG cs.IR cs.SI

    GNNAnatomy: Systematic Generation and Evaluation of Multi-Level Explanations for Graph Neural Networks

    Authors: Hsiao-Ying Lu, Yiran Li, Ujwal Pratap Krishna Kaluvakolanu Thyagarajan, Kwan-Liu Ma

    Abstract: Graph Neural Networks (GNNs) have proven highly effective in various machine learning (ML) tasks involving graphs, such as node/graph classification and link prediction. However, explaining the decisions made by GNNs poses challenges because of the aggregated relational information based on graph structure, leading to complex data transformations. Existing methods for explaining GNNs often face li…

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.00009  [pdf, other]

    cs.RO

    ULTra-AV: A Unified Longitudinal Trajectory Dataset for Automated Vehicle

    Authors: Hang Zhou, Ke Ma, Shixiao Liang, Xiaopeng Li, Xiaobo Qu

    Abstract: Automated Vehicles (AVs) promise significant advances in transportation. Critical to these improvements is understanding AVs' longitudinal behavior, relying heavily on real-world trajectory data. Existing open-source trajectory datasets of AV, however, often fall short in refinement, reliability, and completeness, hindering effective performance metrics analysis and model development. This study a…

    Submitted 16 May, 2024; originally announced June 2024.

    Comments: NA

  14. arXiv:2405.20890  [pdf, other]

    hep-ph

    Constraining Gluonic Contact Interaction of a Neutrino-philic Dark Fermion at Hadron Colliders and Direct Detection Experiments

    Authors: Kai Ma, Lin-Yun He

    Abstract: Weakly interacting fermion with the Standard Model particles is a promising candidate of the genuine dark matter. In this paper, we study signatures of the gluonic interactions of a dark fermion and a neutrino at hadron colliders and direct detection experiments. The lowest order interactions are described by contact operators in dimension 7. At hadron colliders, the mono-jet production is the mos…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 32 pages, 7 captioned figures; 1 figure and 3 tables in the Appendix

  15. arXiv:2405.20612  [pdf, other]

    cs.CL cs.AI

    UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation

    Authors: Hanzhang Zhou, Zijian Feng, Zixiao Zhu, Junlang Qian, Kezhi Mao

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in various tasks using the in-context learning (ICL) paradigm. However, their effectiveness is often compromised by inherent bias, leading to prompt brittleness, i.e., sensitivity to design settings such as example selection, order, and prompt formatting. Previous studies have addressed LLM bias through external adjustment of m…

    Submitted 30 May, 2024; originally announced May 2024.

  16. arXiv:2405.20343  [pdf, other]

    cs.CV cs.GR cs.LG

    Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

    Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

    Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from…

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://wukailu.github.io/Unique3D

    ACM Class: I.2.10

  17. arXiv:2405.19885  [pdf, other]

    cs.LG cs.RO

    Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

    Authors: Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu

    Abstract: Transformer has shown promise in reinforcement learning to model time-varying features for obtaining generalized low-level robot policies on diverse robotics datasets in embodied learning. However, it still suffers from the issues of low data efficiency and high inference latency. In this paper, we propose to investigate the task from a new perspective of the frequency domain. We first observe tha…

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.19327  [pdf, other]

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl…

    Submitted 2 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  19. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  20. arXiv:2405.16886  [pdf, other]

    cs.CV

    Hawk: Learning to Understand Open-World Video Anomalies

    Authors: Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen

    Abstract: Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In t…

    Submitted 27 May, 2024; originally announced May 2024.

  21. arXiv:2405.16878  [pdf, other]

    hep-ph hep-ex

    Complementary Search of Fermionic Absorption Operators at Hadron Collider and Direct Detection Experiments

    Authors: Kai Ma, Shao-Feng Ge, Lin-Yun He, Ning Zhou

    Abstract: Instead of the energy recoil signal at direct detection experiments, dark matter appears always as missing energy at high energy colliders. For a fermionic dark matter coupled with quarks and neutrino via absorption operators, its production is always accompanied by an invisible neutrino. We study in detail the mono-X (photon, jet, and $Z$) productions at the Large Hadron Collider (LHC). To make e…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 46 pages, 20 captioned figures, 4 tables. The main results of this paper have been reported at the workshop "Roadmap of Dark Matter models for Run 3"

  22. arXiv:2405.15318  [pdf, other]

    cs.CL cs.AI

    Are Long-LLMs A Necessity For Long-Context Tasks?

    Authors: Hongjin Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Yujia Zhou, Xu Chen, Zhicheng Dou

    Abstract: The learning and deployment of long-LLMs remains a challenging problem despite recent progress. In this work, we argue that the long-LLMs are not a necessity to solve long-context tasks, as common long-context tasks are short-context solvable, i.e. they can be solved by purely working with oracle short-contexts within the long-context tasks' inputs. On top of this argument, we propose a framewor…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 18 pages

  23. arXiv:2405.13113  [pdf, other]

    astro-ph.GA astro-ph.CO astro-ph.IM

    MAMMOTH-Subaru. II. Diverse Populations of Circumgalactic Ly$α$ Nebulae at Cosmic Noon

    Authors: Mingyu Li, Haibin Zhang, Zheng Cai, Yongming Liang, Nobunari Kashikawa, Ke Ma, Xiaohui Fan, J. Xavier Prochaska, Bjorn H. C. Emonts, Xin Wang, Yunjing Wu, Shiwu Zhang, Qiong Li, Sean D. Johnson, Minghao Yue, Fabrizio Arrigoni Battaia, Sebastiano Cantalupo, Joseph F. Hennawi, Satoshi Kikuta, Yuanhang Ning, Masami Ouchi, Rhythm Shimakawa, Ben Wang, Weichen Wang, Zheng Zheng, et al. (1 additional author not shown)

    Abstract: Circumgalactic Lyman-alpha (Ly$α$) nebulae are gaseous halos around galaxies exhibiting luminous extended Ly$α$ emission. This work investigates Ly$α$ nebulae from deep imaging of $\sim12~\mathrm{deg}^2$ sky, targeted by the MAMMOTH-Subaru survey. Utilizing the wide-field capability of Hyper Suprime-Cam (HSC), we present one of the largest blind Ly$α$ nebula selections, including QSO nebulae, Ly…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 26 pages, 10 figures, 3 tables, submitted to ApJS, comments welcome

  24. arXiv:2405.12569  [pdf, other]

    eess.SP

    TypeII-CsiNet: CSI Feedback with TypeII Codebook

    Authors: Yiliang Sang, Ke Ma, Yang Ming, Jin Lian, Zhaocheng Wang

    Abstract: The latest TypeII codebook selects partial strongest angular-delay ports for the feedback of downlink channel state information (CSI), whereas its performance is limited due to the deficiency of utilizing the correlations among the port coefficients. To tackle this issue, we propose a tailored autoencoder named TypeII-CsiNet to effectively integrate the TypeII codebook with deep learning, wherein…

    Submitted 21 May, 2024; originally announced May 2024.

  25. arXiv:2405.11891  [pdf, ps, other]

    cs.CL cs.AI

    Unveiling and Manipulating Prompt Influence in Large Language Models

    Authors: Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Junlang Qian, Kezhi Mao

    Abstract: Prompts play a crucial role in guiding the responses of Large Language Models (LLMs). However, the intricate role of individual tokens in prompts, known as input saliency, in shaping the responses remains largely underexplored. Existing saliency methods either misalign with LLM generation objectives or rely heavily on linearity assumptions, leading to potential inaccuracies. To address this, we pr…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: ICLR 2024

  26. arXiv:2405.11672  [pdf]

    cs.LG

    Interpretable Machine Learning Enhances Disease Prognosis: Applications on COVID-19 and Onward

    Authors: Jinzhi Shen, Ke Ma

    Abstract: In response to the COVID-19 pandemic, the integration of interpretable machine learning techniques has garnered significant attention, offering transparent and understandable insights crucial for informed clinical decision making. This literature review delves into the applications of interpretable machine learning in predicting the prognosis of respiratory diseases, particularly focusing on COVID…

    Submitted 20 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

  27. arXiv:2405.10988  [pdf, other]

    cs.LG cs.AI

    Flow Score Distillation for Diverse Text-to-3D Generation

    Authors: Runjie Yan, Kailu Wu, Kaisheng Ma

    Abstract: Recent advancements in Text-to-3D generation have yielded remarkable progress, particularly through methods that rely on Score Distillation Sampling (SDS). While SDS exhibits the capability to create impressive 3D assets, it is hindered by its inherent maximum-likelihood-seeking essence, resulting in limited diversity in generation outcomes. In this paper, we discover that the Denoise Diffusion Im…

    Submitted 16 May, 2024; originally announced May 2024.

  28. arXiv:2405.08487  [pdf, other]

    cs.CV cs.CR

    Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method

    Authors: Mian Zou, Baosheng Yu, Yibing Zhan, Siwei Lyu, Kede Ma

    Abstract: In recent years, deep learning has greatly streamlined the process of generating realistic fake face images. Aware of the dangers, researchers have developed various tools to spot these counterfeits. Yet none asked the fundamental question: What digital manipulations make a real photographic face image fake, while others do not? In this paper, we put face forgery in a semantic context and define t…

    Submitted 14 May, 2024; originally announced May 2024.

  29. arXiv:2405.06600  [pdf, other]

    cs.CV

    Multi-Object Tracking in the Dark

    Authors: Xinzhe Wang, Kang Ma, Qiankun Liu, Yunhao Zou, Ying Fu

    Abstract: Low-light scenes are prevalent in real-world applications (e.g. autonomous driving and surveillance at night). Recently, multi-object tracking in various practical use cases has received much attention, but multi-object tracking in dark scenes is rarely considered. In this paper, we focus on multi-object tracking in dark scenes. To address the lack of datasets, we first build a Low-light Multi-Ob…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  30. arXiv:2405.03234  [pdf, other]

    cs.HC cs.LG

    A Reliable Framework for Human-in-the-Loop Anomaly Detection in Time Series

    Authors: Ziquan Deng, Xiwei Xuan, Kwan-Liu Ma, Zhaodan Kong

    Abstract: Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performed models may exhibit potential issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights to detect such issues…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: The manuscript is currently under review

  31. arXiv:2405.02345  [pdf, other]

    cs.HC cs.AI

    Exploring the Capabilities of Large Language Models for Generating Diverse Design Solutions

    Authors: Kevin Ma, Daniele Grandi, Christopher McComb, Kosa Goucher-Lambert

    Abstract: Access to large amounts of diverse design solutions can support designers during the early stage of the design process. In this paper, we explore the efficacy of large language models (LLM) in producing diverse design solutions, investigating the level of impact that parameter tuning and various prompt engineering techniques can have on the diversity of LLM-generated design solutions. Specifically…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: preprint of journal paper

  32. arXiv:2404.18402  [pdf, other]

    quant-ph

    Entanglement enhancement of two giant atoms with multiple connection points in bidirectional-chiral quantum waveguide-QED system

    Authors: Jie Liu, Yue Cai, Kang-Jie Ma, Lei Tan, Wu-Ming Liu

    Abstract: We study the entanglement generation of two giant atoms within a one-dimensional bidirectional-chiral waveguide quantum electrodynamics (QED) system, where the initial state of the two giant atoms are $|e_a,g_b\rangle$. Here, each giant atom is coupled to the waveguide through three connection points, with the configurations divided into five types based on the arrangement of coupling points betw…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 10 pages, 8 figures

  33. arXiv:2404.16068  [pdf, other]

    cs.AI cs.CL cs.LG

    SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

    Authors: Yifan Jiang, Filip Ilievski, Kaixin Ma

    Abstract: While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot se…

    Submitted 22 April, 2024; originally announced April 2024.

  34. arXiv:2404.13591  [pdf, other]

    cs.CV cs.LG

    MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

    Authors: Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara

    Abstract: While multi-modal large language models (MLLMs) have shown significant progress on many popular visual reasoning benchmarks, whether they possess abstract visual reasoning abilities remains an open question. Similar to the Sudoku puzzles, abstract visual reasoning (AVR) problems require finding high-level patterns (e.g., repetition constraints) that control the input shapes (e.g., digits) in a spe…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  35. arXiv:2404.13556  [pdf, other]

    cs.IR cs.CL

    ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval

    Authors: Kelong Mao, Chenlong Deng, Haonan Chen, Fengran Mo, Zheng Liu, Tetsuya Sakai, Zhicheng Dou

    Abstract: Conversational search requires accurate interpretation of user intent from complex multi-turn contexts. This paper presents ChatRetriever, which inherits the strong generalization capability of large language models to robustly represent complex conversational sessions for dense retrieval. To achieve this, we propose a simple and effective dual-learning approach that adapts LLM for retrieval via c…

    Submitted 21 April, 2024; originally announced April 2024.

  36. arXiv:2404.12347  [pdf, other]

    cs.CV cs.GR

    AniClipart: Clipart Animation with Text-to-Video Priors

    Authors: Ronghuan Wu, Wanchao Su, Kede Ma, Jing Liao

    Abstract: Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating visual content. Traditional workflows to convert static clipart images into motion sequences are laborious and time-consuming, involving numerous intricate steps like rigging, key animation and in-betweening. Recent advancements in text-to-video generation hold great potential in resolving this problem. Nev…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project Page: https://aniclipart.github.io/

  37. arXiv:2404.08008  [pdf, other]

    cs.LG cs.CL cs.HC

    Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition

    Authors: Kehua Feng, Keyan Ding, Kede Ma, Zhihua Wang, Qiang Zhang, Huajun Chen

    Abstract: The past years have witnessed a proliferation of large language models (LLMs). Yet, automated and unbiased evaluation of LLMs is challenging due to the inaccuracy of standard metrics in reflecting human preferences and the inefficiency in sampling informative and diverse test examples. While human evaluation remains the gold standard, it is expensive and time-consuming, especially when dealing wit…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 32 pages, 6 figures

  38. arXiv:2404.06419  [pdf, other]

    hep-ph

    Exploring Four Fermion Contact Couplings of a Dark Fermion and an Electron at Hadron Colliders and Direct Detection Experiments

    Authors: Kai Ma

    Abstract: Both the collider searches and direct detections are promising approaches to probe a fermionic dark matter. In this paper we study signatures of the four fermion contact operators involving a dark fermion, an electron and a quark pair. We show that the mono-electron production channel at hadron collider can provide strong constraints. Associated productions of a charged electron with a photon/jet…

    Submitted 23 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 32 pages, 15 captioned figures and 2 tables; v2: invisibility of the dark fermion at hadron collider is discussed, and the title is improved

  39. arXiv:2404.04619  [pdf, other]

    cs.AI cs.CV

    Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

    Authors: Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

    Abstract: With the power of large language models (LLMs), open-ended embodied agents can flexibly understand human instructions, generate interpretable guidance strategies, and output executable actions. Nowadays, Multi-modal Language Models (MLMs) integrate multi-modal signals into LLMs, further bringing richer perception to entity agents and allowing embodied agents to perceive world-understanding tasks m…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.08282

  40. arXiv:2404.03543  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

    Authors: Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi LI, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu

    Abstract: Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks, including debugging, translating, polishing, and requirement switching. Unlike existing benchmarks focusing solely on code generation, CodeEditorBench empha…

    Submitted 6 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  41. arXiv:2404.01672  [pdf, other]

    cs.IT eess.SP

    The Meta Distribution of the SIR in Joint Communication and Sensing Networks

    Authors: Kun Ma, Chenyuan Feng, Giovanni Geraci, Howard H. Yang

    Abstract: In this paper, we introduce a novel mathematical framework for assessing the performance of joint communication and sensing (JCAS) in wireless networks, employing stochastic geometry as an analytical tool. We focus on deriving the meta distribution of the signal-to-interference ratio (SIR) for JCAS networks. This approach enables a fine-grained quantification of individual user or radar performanc…

    Submitted 2 April, 2024; originally announced April 2024.

  42. arXiv:2404.00417  [pdf, other]

    cs.LG cs.AI cs.CV

    Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation

    Authors: HongWei Yan, Liyuan Wang, Kaisheng Ma, Yi Zhong

    Abstract: To accommodate real-world dynamics, artificial intelligence systems need to cope with sequentially arriving content in an online manner. Beyond regular Continual Learning (CL) attempting to address catastrophic forgetting with offline training of each task, Online Continual Learning (OCL) is a more challenging yet realistic setting that performs CL in a one-pass data stream. Current OCL methods pr…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  43. arXiv:2404.00252  [pdf, other]

    eess.IV cs.CV

    Learned Scanpaths Aid Blind Panoramic Video Quality Assessment

    Authors: Kanglong Fan, Wen Wen, Mu Li, Yifan Peng, Kede Ma

    Abstract: Panoramic videos have the advantage of providing an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to various and uncertain user viewing behaviors, which poses significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through…

    Submitted 15 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  44. arXiv:2403.19417  [pdf, other]

    cs.CV

    OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

    Authors: Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu

    Abstract: We present OAKINK2, a dataset of bimanual object manipulation tasks for complex daily activities. In pursuit of constructing the complex tasks into a structured representation, OAKINK2 introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task. OAKINK2 features an object-centric perspective for decoding the complex tasks, treating them…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear in CVPR 2024. 26 pages

  45. The Galactic latitude dependency of Faraday complexity in the S-PASS/ATCA RM catalogue

    Authors: S. Ranchod, S. A. Mao, R. Deane, S. S. Sridhar, A. Damas-Segovia, J. D. Livingston, Y. K. Ma

    Abstract: The S-band Polarisation All Sky Survey (SPASS/ATCA) rotation measure (RM) catalogue is the largest broadband RM catalogue to date, increasing the RM density in the sparse southern sky. Through analysis of this catalogue, we report a latitude dependency of the Faraday complexity of polarised sources in this catalogue within 10$^\circ$ of the Galactic plane towards the inner Galaxy. In this study, w…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 16 pages, 16 figures

    Journal ref: A&A 686, A104 (2024)

  46. arXiv:2403.12504  [pdf, other]

    cs.RO

    TON-VIO: Online Time Offset Modeling Networks for Robust Temporal Alignment in High Dynamic Motion VIO

    Authors: Chaoran Xiong, Guoqing Liu, Qi Wu, Songpengcheng Xia, Tong Hua, Kehui Ma, Zhen Sun, Yan Xiang, Ling Pei

    Abstract: Temporal misalignment (time offset) between sensors is common in low-cost visual-inertial odometry (VIO) systems. Such temporal misalignment introduces inconsistent constraints for state estimation, leading to a significant positioning drift, especially in high dynamic motion scenarios. In this article, we focus on online temporal calibration to reduce the positioning drift caused by the time offse…

    Submitted 19 March, 2024; originally announced March 2024.

  47. arXiv:2403.12369  [pdf, other]

    eess.SP

    Block-Dominant Compressed Sensing for Near-Field Communications: Fundamentals, Solutions and Future Directions

    Authors: Liyang Lu, Ke Ma, Zhaocheng Wang

    Abstract: Near-field (NF) communications draw much attention in the context of extremely large-scale antenna arrays (ELAA). Owing to a large number of antennas and high carrier frequency, the NF coverage distance is quite substantial, where the electromagnetic radiation propagates by spherical waves, in contrast to the conventional planar waves of the far-field. Motivated by these facts, the block-dominant…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE for possible publication

  48. arXiv:2403.12105  [pdf, ps, other]

    quant-ph physics.optics

    2-D isotropic negative refractive index in a N-type four-level atomic system

    Authors: Shun-Cai Zhao, Qi-Xuan Wu, Kun Ma

    Abstract: 2-D (two-dimensional) isotropic negative refractive index (NRI) is explicitly realized via the orthogonal signal and coupling standing-wave fields coupling the N-type four-level atomic system. Under some key parameters of the dense vapor media, the atomic system exhibits isotropic NRI with simultaneous negative permittivity and permeability (i.e., left-handedness) in the 2-D x-y plane. Compared with…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures

    Journal ref: Open Phys. 2015; 13:349-354

  49. arXiv:2403.11335  [pdf, other]

    cs.IR cs.CL

    ConvSDG: Session Data Generation for Conversational Search

    Authors: Fengran Mo, Bole Yi, Kelong Mao, Chen Qu, Kaiyu Huang, Jian-Yun Nie

    Abstract: Conversational search provides a more convenient interface for users to search by allowing multi-turn interaction with the search engine. However, the effectiveness of the conversational dense retrieval methods is limited by the scarcity of training data required for their fine-tuning. Thus, generating more training conversational sessions with relevant labels could potentially improve search perf…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW 2024 Workshop

  50. arXiv:2403.10854  [pdf, other]

    cs.CV

    A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

    Authors: Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

    Abstract: While Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored. In this paper, we conduct a comprehensive and systematic study of prompting MLLMs for IQA. Specifically, we first investiga…

    Submitted 16 March, 2024; originally announced March 2024.