Showing 1–50 of 234 results for author: Meng, Z

  1. arXiv:2406.17095  [pdf, other]

    cs.CL

    Attention Instruction: Amplifying Attention in the Middle via Prompting

    Authors: Meiru Zhang, Zaiqiao Meng, Nigel Collier

    Abstract: The context window of large language models has been extended to 128k tokens or more. However, language models still suffer from position bias and have difficulty in accessing and using the middle part of the context due to the lack of attention. We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting. We augment the original…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.16021  [pdf, other]

    cs.CL cs.AI

    Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm

    Authors: Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

    Abstract: Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document ev…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ACL2024(Findings)

  3. arXiv:2406.15990  [pdf, other]

    cs.CL cs.AI

    Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information

    Authors: Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji

    Abstract: Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representation by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to their underwhelming performan…

    Submitted 22 June, 2024; originally announced June 2024.

    Report number: https://aclanthology.org/2024.lrec-main.523/

    Journal ref: LREC|COLING,Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation,2024,5907-5921

  4. arXiv:2406.15741  [pdf, other]

    cs.CL cs.AI cs.LG

    Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level

    Authors: Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu

    Abstract: General-purpose Large Language Models (LLMs) like GPT-4 have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content. On the other hand, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data. Despite the superior performance, these methods either demand an unprecedent…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/fzp0424/Ladder

  5. arXiv:2406.14701  [pdf, other]

    cs.AI cs.CL cs.SD eess.AS

    Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions

    Authors: Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Neeraj Gaur, Zhong Meng

    Abstract: In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning. This is a simple approach and does not increase the model complexity or…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.14056  [pdf, other]

    cs.CV

    VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

    Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

    Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in ha…

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages

    MSC Class: 68-04 68-04 ACM Class: I.2.7; I.2.10

  7. arXiv:2406.11460  [pdf, other]

    cs.CL cs.AI

    TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation

    Authors: Jinyuan Fang, Zaiqiao Meng, Craig Macdonald

    Abstract: Retrieval-augmented generation (RAG) offers an effective approach for addressing question answering (QA) tasks. However, the imperfections of the retrievers in RAG models often result in the retrieval of irrelevant information, which could introduce noise and degrade the performance, especially when handling multi-hop questions that require multiple steps of reasoning. To enhance the multi-hop re…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.08855  [pdf, other]

    cs.RO

    Trajectory Planning for Autonomous Driving in Unstructured Scenarios Based on Graph Neural Network and Numerical Optimization

    Authors: Sumin Zhang, Kuo Li, Rui He, Zhiwei Meng, Yupeng Chang, Xiaosong **, Ri Bai

    Abstract: In unstructured environments, obstacles are diverse and lack lane markings, making trajectory planning for intelligent vehicles a challenging task. Traditional trajectory planning methods typically involve multiple stages, including path planning, speed planning, and trajectory optimization. These methods require the manual design of numerous parameters for each stage, resulting in significant wor…

    Submitted 13 June, 2024; originally announced June 2024.

  9. arXiv:2406.07418  [pdf, other]

    cs.AI cs.LG q-bio.GN

    Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization

    Authors: Weiliang Zhang, Zhen Meng, Dongjie Wang, Min Wu, Kunpeng Liu, Yuanchun Zhou, Meng Xiao

    Abstract: Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Those methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine l…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 25 pages

  10. arXiv:2406.07063  [pdf, other]

    physics.ao-ph cs.AI physics.flu-dyn

    Reconstructing the Tropical Pacific Upper Ocean using Online Data Assimilation with a Deep Learning model

    Authors: Zilu Meng, Gregory J. Hakim

    Abstract: A deep learning (DL) model, based on a transformer architecture, is trained on a climate-model dataset and compared with a standard linear inverse model (LIM) in the tropical Pacific. We show that the DL model produces more accurate forecasts compared to the LIM when tested on a reanalysis dataset. We then assess the ability of an ensemble Kalman filter to reconstruct the monthly-averaged upper oc…

    Submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.05543  [pdf, other]

    cs.CV cs.AI

    VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification

    Authors: Jianmeng Liu, Yichen Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang

    Abstract: Recent conditional 3D completion works have mainly relied on CLIP or BERT to encode textual information, which cannot support complex instruction. Meanwhile, large language models (LLMs) have shown great potential in multi-modal understanding and generation tasks. Inspired by the recent advancements of LLM, we present Volume Patch LLM (VP-LLM), which leverages LLMs to perform conditional 3D comple…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 27 pages, 16 figures

  12. arXiv:2406.02921  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Text Injection for Neural Contextual Biasing

    Authors: Zhong Meng, Zelin Wu, Rohit Prabhavalkar, Cal Peyser, Weiran Wang, Nanxin Chen, Tara N. Sainath, Bhuvana Ramabhadran

    Abstract: Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance contextual ASR. CTI leverages not only the paired speech-text data, but also a much larger corpus of unpaired text to optimize the ASR model and it…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure

    Journal ref: Interspeech 2024, Kos Island, Greece

  13. arXiv:2406.02004  [pdf, ps, other]

    cs.CR cs.CL cs.SD eess.AS

    Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

    Authors: Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan

    Abstract: Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across train…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech'24

  14. arXiv:2406.01651  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction

    Authors: Zhaohan Meng, Zaiqiao Meng, Iadh Ounis

    Abstract: Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to capture the fine-grained interactions between drugs and protein, i.e. the binding of specific drug atoms (or substructures) and key amino acids of proteins…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures

  15. arXiv:2405.20764  [pdf, other]

    cs.CV

    CoMoFusion: Fast and High-quality Fusion of Infrared and Visible Image with Consistency Model

    Authors: Zhiming Meng, Hui Li, Zeyang Zhang, Zhongwei Shen, Yunlong Yu, Xiaoning Song, Xiaojun Wu

    Abstract: Generative models are widely utilized to model the distribution of fused images in the field of infrared and visible image fusion. However, current generative models based fusion methods often suffer from unstable training and slow inference speed. To tackle this problem, a novel fusion method based on consistency model is proposed, termed as CoMoFusion, which can generate the high-quality images…

    Submitted 11 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  16. arXiv:2405.16059  [pdf, other]

    cs.SI

    Interpretable Transformer Hawkes Processes: Unveiling Complex Interactions in Social Networks

    Authors: Zizhuo Meng, Ke Wan, Yadong Huang, Zhidong Li, Yang Wang, Feng Zhou

    Abstract: Social networks represent complex ecosystems where the interactions between users or groups play a pivotal role in information dissemination, opinion formation, and social interactions. Effectively harnessing event sequence data within social networks to unearth interactions among users or groups has persistently posed a challenging frontier within the realm of point processes. Current deep point…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  17. arXiv:2405.15278  [pdf, other]

    cs.CV

    MindShot: Brain Decoding Framework Using Only One Image

    Authors: Shuai Jiang, Zhu Meng, Delong Liu, Haiwen Li, Fei Su, Zhicheng Zhao

    Abstract: Brain decoding, which aims at reconstructing visual stimuli from brain signals, primarily utilizing functional magnetic resonance imaging (fMRI), has recently made positive progress. However, it is impeded by significant challenges such as the difficulty of acquiring fMRI-image pairs and the variability of individuals, etc. Most methods have to adopt the per-subject-per-model paradigm, greatly lim…

    Submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.02764  [pdf, other]

    cs.CL cs.LG

    Assessing Adversarial Robustness of Large Language Models: An Empirical Study

    Authors: Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversar…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures, 10 tables

  19. arXiv:2404.18816  [pdf, other]

    cs.CR cs.SE

    AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering

    Authors: Wenxiang Zhao, Juntao Wu, Zhaoyi Meng

    Abstract: Due to the vast array of Android applications, their multifarious functions and intricate behavioral semantics, attackers can adopt various tactics to conceal their genuine attack intentions within legitimate functions. However, numerous feature engineering based methods suffer from a limitation in mining behavioral semantic information, thus impeding the accuracy and efficiency of Android malware…

    Submitted 29 April, 2024; originally announced April 2024.

  20. arXiv:2404.15516  [pdf, other]

    cs.CV cs.AI

    Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

    Authors: Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim

    Abstract: Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the o…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 15 pages

  21. arXiv:2404.10180  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE eess.AS

    Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

    Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

    Abstract: Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for full end-to-end cotraining of the recognizer and biasing system and requires no separate inference-time components. Such biasers typically consist of…

    Submitted 23 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track

    Journal ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

  22. arXiv:2404.04998  [pdf, other]

    cs.CV cs.AI cs.IR

    Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval

    Authors: Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shu-Tao Xia

    Abstract: Deep quantization methods have shown high efficiency on large-scale image retrieval. However, current models heavily rely on ground-truth information, hindering the application of quantization in label-hungry scenarios. A more realistic demand is to learn from inexhaustible uploaded images that are associated with informal tags provided by amateur users. Though such sketchy tags do not obviously r…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: In proceedings of AAAI 2021. Code and data are available

  23. arXiv:2404.00729  [pdf, other]

    eess.SY cs.LG

    Nonparametric End-to-End Probabilistic Forecasting of Distributed Generation Outputs Considering Missing Data Imputation

    Authors: Minghui Chen, Zichao Meng, Yan** Liu, Longbo Luo, Ye Guo, Kang Wang

    Abstract: In this paper, we introduce a nonparametric end-to-end method for probabilistic forecasting of distributed renewable generation outputs while including missing data imputation. Firstly, we employ a nonparametric probabilistic forecast model utilizing the long short-term memory (LSTM) network to model the probability distributions of distributed renewable generations' outputs. Secondly, we design a…

    Submitted 31 March, 2024; originally announced April 2024.

  24. arXiv:2404.00230  [pdf, other]

    cs.CV

    Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space

    Authors: Zheling Meng, Bo Peng, Jing Dong

    Abstract: Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of watermark robustness and image quality. The reason for this dilemma is that watermark detection is performed in pixel space, implying an intrinsic link between image quality and watermark robustness. In this paper, we highlight that an effective solu…

    Submitted 29 March, 2024; originally announced April 2024.

  25. arXiv:2403.16056  [pdf, other]

    cs.CL cs.AI

    Qibo: A Large Language Model for Traditional Chinese Medicine

    Authors: Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo

    Abstract: Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident pre…

    Submitted 22 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  26. arXiv:2403.16034  [pdf, other]

    cs.CV

    V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c…

    Submitted 24 March, 2024; originally announced March 2024.

  27. arXiv:2403.11172  [pdf, other]

    cs.CV

    Artifact Feature Purification for Cross-domain Detection of AI-generated Images

    Authors: Zheling Meng, Bo Peng, Jing Dong, Tieniu Tan

    Abstract: In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, brings potential security risks to our society. Existing generated image detection methods suffer from performance drop when faced with out-of-domain generators and image scenes. To relieve this problem, we propose Artifact Purification Network (APN) to facilitate the artifact extraction fr…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: This work is under consideration at Computer Vision and Image Understanding

  28. arXiv:2403.10353  [pdf, other]

    cs.CV

    SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras

    Authors: Yingqi Tang, Zhaotie Meng, Guoliang Chen, Erkang Cheng

    Abstract: The field of autonomous driving has attracted considerable interest in approaches that directly infer 3D objects in the Bird's Eye View (BEV) from multiple cameras. Some attempts have also explored utilizing 2D detectors from single images to enhance the performance of 3D detection. However, these approaches rely on a two-stage process with separate detectors, where the 2D detection results are ut…

    Submitted 15 March, 2024; originally announced March 2024.

  29. arXiv:2403.05422  [pdf, other]

    cs.CV

    EVD4UAV: An Altitude-Sensitive Benchmark to Evade Vehicle Detection in UAV

    Authors: Huiming Sun, Jiacheng Guo, Zibo Meng, Tianyun Zhang, Jianwu Fang, Yuewei Lin, Hongkai Yu

    Abstract: Vehicle detection in Unmanned Aerial Vehicle (UAV) captured images has wide applications in aerial photography and remote sensing. There are many public benchmark datasets proposed for the vehicle detection and tracking in UAV images. Recent studies show that adding an adversarial patch on objects can fool the well-trained deep neural networks based object detectors, posing security concerns to th…

    Submitted 8 March, 2024; originally announced March 2024.

  30. arXiv:2403.05018  [pdf, other]

    cs.CV

    InstructGIE: Towards Generalizable Image Editing

    Authors: Zichong Meng, Changdi Yang, Jun Liu, Hao Tang, Pu Zhao, Yanzhi Wang

    Abstract: Recent advances in image editing have been driven by the development of denoising diffusion models, marking a significant leap forward in this field. Despite these advances, the generalization capabilities of recent image editing approaches remain constrained. In response to this challenge, our study introduces a novel image editing framework with enhanced generalization robustness by boosting in-…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Preprint

  31. arXiv:2403.05016  [pdf, other]

    cs.CV

    DiffClass: Diffusion-Based Class Incremental Learning

    Authors: Zichong Meng, Jie Zhang, Changdi Yang, Zheng Zhan, Pu Zhao, Yanzhi WAng

    Abstract: Class Incremental Learning (CIL) is challenging due to catastrophic forgetting. On top of that, Exemplar-free Class Incremental Learning is even more challenging due to forbidden access to previous task data. Recent exemplar-free CIL methods attempt to mitigate catastrophic forgetting by synthesizing previous task data. However, they fail to overcome the catastrophic forgetting due to the inabilit…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Preprint

  32. arXiv:2403.01924  [pdf, other]

    cs.CL cs.AI

    To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering

    Authors: Giacomo Frisoni, Alessio Cocchieri, Alex Presepi, Gianluca Moro, Zaiqiao Meng

    Abstract: Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model parameters, counteracting architectural scaling and allowing for training on common low-resource hardware. The retrieve-then-read paradigm has become ubiquitous, with model predictions grounded on relevant knowledge pieces from external repositorie…

    Submitted 13 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: ACL 2024 (camera-ready paper)

  33. arXiv:2402.17184  [pdf, other]

    cs.CL cs.SD eess.AS

    Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

    Authors: Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

    Abstract: The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires computationally efficient strategies for decoding. In the present work, we study one such strategy: applying multiple frame reduction layers in the enc…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  34. arXiv:2402.16424  [pdf, other]

    cs.CV

    COMAE: COMprehensive Attribute Exploration for Zero-shot Hashing

    Authors: Yihang Zhou, Qingqing Long, Yuchen Yan, Xiao Luo, Zeyu Dong, Xuezhi Wang, Zhen Meng, Pengfei Wang, Yuanchun Zhou

    Abstract: Zero-shot hashing (ZSH) has shown excellent success owing to its efficiency and generalization in large-scale retrieval scenarios. While considerable success has been achieved, there still exist urgent limitations. Existing works ignore the locality relationships of representations and attributes, which have effective transferability between seeable classes and unseeable classes. Also, the continu…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 13 pages, 7 figures

  35. arXiv:2402.14551  [pdf, other]

    cs.CV cs.AI cs.LG

    CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion

    Authors: Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon-Camarasa, Zaiqiao Meng, Richard Mccreadie

    Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss (CE). However, it has been demonstrated that CE can compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by enhancing…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.14893

  36. arXiv:2402.07610  [pdf, other]

    cs.CL cs.AI

    Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping

    Authors: Haoyu Wang, Guozheng Ma, Ziqiao Meng, Zeyu Qin, Li Shen, Zhong Zhang, Bingzhe Wu, Liu Liu, Yatao Bian, Tingyang Xu, Xueqian Wang, Peilin Zhao

    Abstract: Self-alignment is an effective way to reduce the cost of human annotation while ensuring promising model capability. However, most current methods complete the data collection and training steps in a single round, which may overlook the continuously improving ability of self-aligned models. This gives rise to a key query: What if we do multi-time bootstrapping self-alignment? Does this strategy en…

    Submitted 27 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  37. arXiv:2402.05390  [pdf, other]

    cs.NI eess.SP

    Integrated Sensing and Communication Driven Digital Twin for Intelligent Machine Network

    Authors: Zhiqing Wei, Yucong Du, Qixun Zhang, Wangjun Jiang, Yanpeng Cui, Zeyang Meng, Huici Wu, Zhiyong Feng

    Abstract: Intelligent machines (IMs), including industrial machines, unmanned aerial vehicles (UAVs), and unmanned vehicles, etc., could perform effective cooperation in complex environment when they form IM network. The efficient environment sensing and communication are crucial for IM network, enabling the real-time and stable control of IMs. With the emergence of integrated sensing and communication (ISA…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures, 1 Table

    ACM Class: C.2.1

  38. arXiv:2401.12520  [pdf, other]

    cs.CL cs.IR cs.LG

    Key Information Retrieval to Classify the Unstructured Data Content of Preferential Trade Agreements

    Authors: Jiahui Zhao, Ziyi Meng, Stepan Gordeev, Zijie Pan, Dongjin Song, Sandro Steinbach, Caiwen Ding

    Abstract: With the rapid proliferation of textual data, predicting long texts has emerged as a significant challenge in the domain of natural language processing. Traditional text prediction methods encounter substantial difficulties when grappling with long texts, primarily due to the presence of redundant and irrelevant information, which impedes the model's capacity to capture pivotal insights from the t…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: AI4TS Workshop@AAAI 2024 accepted publication

  39. arXiv:2401.05190  [pdf, other]

    cs.CL

    DCR: Divide-and-Conquer Reasoning for Multi-choice Question Answering with LLMs

    Authors: Zijie Meng, Yan Zhang, Zhaopeng Feng, Zuozhu Liu

    Abstract: Large language models (LLMs) have shown impressive performance in reasoning benchmarks with the emergence of Chain-of-Thought (CoT), particularly in multi-choice question (MCQ). However, current works equally resolve questions regardless of the problem-solving difficulty, leading to an excessive focus on simple items while insufficient attention on intricate ones. To address this challenge, we pro…

    Submitted 2 April, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

    Comments: Technique Report

  40. arXiv:2312.16498  [pdf, other]

    cs.CV

    A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss

    Authors: Xiao Fang, Xin Gao, Baofeng Li, Feng Zhai, Yu Qin, Zhihang Meng, Jiansheng Lu, Chun Xiao

    Abstract: Low-light image enhancement aims to improve the perception of images collected in dim environments and provide high-quality data support for image recognition tasks. When dealing with photos captured under non-uniform illumination, existing methods cannot adaptively extract the differentiated luminance information, which will easily cause over-exposure and under-exposure. From the perspective of u…

    Submitted 27 December, 2023; originally announced December 2023.

  41. arXiv:2312.02568  [pdf, other]

    cs.CV

    Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent

    Authors: Jianmeng Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang

    Abstract: This paper explores promptable NeRF generation (e.g., text prompt or single image prompt) for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes, thus undoing complex intermediate steps while providing full 3D generation with conditional control. Unlike previous diffusion-CLIP-based pipelines that involve tedious per-prompt optimizations, Prompt2NeRF-PIL is cap…

    Submitted 5 December, 2023; originally announced December 2023.

  42. arXiv:2312.00104  [pdf]

    cs.MM

    A Metadata Generation System with Semantic Understanding for Video Retrieval in Film Production

    Authors: Feilin Han, Zhaoxu Meng

    Abstract: In film production, metadata plays an important role in original raw video indexing and classification within the industrial post-production software. Inspired by deep visual-semantic methods, we propose an automated image information extraction process to extend the diversity of metadata entities for massive large-scale raw video searching and retrieval. In this paper, we introduce the proposed s…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: Accepted by 2022 IEEE International Conference on Virtual Reality and Visualization (ICVRV), received Best Paper Award

  43. arXiv:2311.10118  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Now and Future of Artificial Intelligence-based Signet Ring Cell Diagnosis: A Survey

    Authors: Zhu Meng, Junhao Dong, Limei Guo, Fei Su, Guangxi Wang, Zhicheng Zhao

    Abstract: Since signet ring cells (SRCs) are associated with high peripheral metastasis rate and dismal survival, they play an important role in determining surgical approaches and prognosis, while they are easily missed by even experienced pathologists. Although automatic diagnosis SRCs based on deep learning has received increasing attention to assist pathologists in improving the diagnostic efficiency an…

    Submitted 16 November, 2023; originally announced November 2023.

  44. arXiv:2310.18030  [pdf, other]

    cs.NI

    Confucius: Achieving Consistent Low Latency with Practical Queue Management for Real-Time Communications

    Authors: Zili Meng, Nirav Atre, Mingwei Xu, Justine Sherry, Maria Apostolaki

    Abstract: Real-time communication applications require consistently low latency, which is often disrupted by latency spikes caused by competing flows, especially Web traffic. We identify the root cause of disruptions in such cases as the mismatch between the abrupt bandwidth allocation adjustment of queue scheduling and gradual congestion window adjustment of congestion control. For example, when a sudden b…

    Submitted 7 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

  45. arXiv:2310.16450  [pdf, other]

    cs.CL

    CLEX: Continuous Length Extrapolation for Large Language Models

    Authors: Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

    Abstract: Transformer-based Large Language Models (LLMs) are pioneering advances in many natural language processing tasks, however, their exceptional capabilities are restricted within the preset context window of Transformer. Position Embedding (PE) scaling methods, while effective in extending the context window to a specific length, demonstrate either notable limitations in their extrapolation abilities…

    Submitted 24 March, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  46. arXiv:2310.16131  [pdf, other]

    cs.CL

    GenKIE: Robust Generative Multimodal Document Key Information Extraction

    Authors: Panfeng Cao, Ye Wang, Qiang Zhang, Zaiqiao Meng

    Abstract: Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discriminative models, which lack the ability to handle optical character recognition (OCR) errors and require laborious token-level labelling. In this pap…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023, Findings paper

  47. arXiv:2310.15560  [pdf, ps, other]

    cs.PF

    Modeling and Design of the Communication Sensing and Control Coupled Closed-Loop Industrial System

    Authors: Zeyang Meng, Dingyou Ma, Shengfeng Wang, Zhiqing Wei, Zhiyong Feng

    Abstract: With the advent of 5G era, factories are transitioning towards wireless networks to break free from the limitations of wired networks. In 5G-enabled factories, unmanned automatic devices such as automated guided vehicles and robotic arms complete production tasks cooperatively through the periodic control loops. In such loops, the sensing data is generated by sensors, and transmitted to the contro…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 6 pages, 3 figures, received by GlobeCom 2023

    MSC Class: 93C55; 94A99 ACM Class: C.4

  48. arXiv:2310.15533  [pdf, other]

    cs.CV

    Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning

    Authors: Qing Miao, Xiaohe Wu, Chao Xu, Yanli Ji, Wangmeng Zuo, Yiwen Guo, Zhaopeng Meng

    Abstract: Learning with noisy labels (LNL) has been extensively studied, with existing approaches typically following a framework that alternates between clean sample selection and semi-supervised learning (SSL). However, this approach has a limitation: the clean set selected by the Deep Neural Network (DNN) classifier, trained through self-training, inevitably contains noisy samples. This mixture of clean…

    Submitted 24 October, 2023; originally announced October 2023.

  49. arXiv:2310.07122  [pdf, ps, other]

    cs.PF

    Spectrum Sharing Towards Delay Deterministic Wireless Network: Delay Performance Analysis

    Authors: Zhiqing Wei, Ling Zhang, Gaofeng Nie, Huici Wu, Ning Zhang, Zeyang Meng, Zhiyong Feng

    Abstract: To accommodate Machine-type Communication (MTC) service, the wireless network needs to support low-delay and low-jitter data transmission, realizing delay deterministic wireless network. This paper analyzes the delay and jitter of the wireless network with and without spectrum sharing. When sharing the spectrum of the licensed network, the spectrum band of wireless network can be expanded, such th…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 15 pages, 14 figures

    MSC Class: 94A99 ACM Class: H.1.1

  50. arXiv:2310.06285  [pdf, ps, other]

    cs.NI

    Fast Neighbor Discovery for Wireless Ad Hoc Network with Successive Interference Cancellation

    Authors: Zhiqing Wei, Yueyue Liang, Zeyang Meng, Zhiyong Feng, Kaifeng Han, Huici Wu

    Abstract: Neighbor discovery (ND) is a key step in wireless ad hoc network, which directly affects the efficiency of wireless networking. Improving the speed of ND has always been the goal of ND algorithms. The classical ND algorithms lose packets due to the collision of multiple packets, which greatly affects the speed of the ND algorithms. Traditional methods detect packet collision and implement retransm…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 16 pages, 16 figures

    MSC Class: 60B99; 94A99 ACM Class: C.2.2