Showing 1–50 of 104 results for author: Lv, Y

  1. arXiv:2407.00753  [pdf, other]

    eess.AS cs.SD

    FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis

    Authors: Yinlin Guo, Yening Lv, Jinqiao Dou, Yan Zhang, Yuehai Wang

    Abstract: While recent advances in Text-To-Speech synthesis have yielded remarkable improvements in generating high-quality speech, research on lightweight and fast models is limited. This paper introduces FLY-TTS, a new fast, lightweight and high-quality speech synthesis system based on VITS. Specifically, 1) We replace the decoder with ConvNeXt blocks that generate Fourier spectral coefficients followed b…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to Interspeech 2024. 5 pages, 1 figure

  2. arXiv:2406.09844  [pdf, other]

    cs.SD eess.AS

    Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

    Authors: Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

    Abstract: Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well as training-inference mismatch still hinder conversion performance. In this paper, we propose Vec-Tok-VC+, a novel prompt-based zero-shot VC model im…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH2024

  3. arXiv:2406.08196  [pdf, other]

    cs.SD eess.AS

    FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

    Authors: Yuanjun Lv, Hai Li, Ying Yan, Junhui Liu, Danming Xie, Lei Xie

    Abstract: Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequency-domain GAN vocoders like Vocos and APNet2 have recently seen rapid advancements, outperforming time-domain models in inference speed while achieving comparable audio quality. However, these frequency-domain vocoders suffer from large parameter sizes, thus introducing extra memory bu…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 5 figures

  4. arXiv:2406.07498  [pdf, other]

    cs.SD eess.AS

    RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, failure to use future information and the constrained receptive field of convolution layers limit the system's perfor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2405.11265  [pdf, other]

    cs.CL cs.AI

    EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models

    Authors: Yu Huang, Liang Guo, Wanqian Guo, Zhe Tao, Yang Lv, Zhihao Sun, Dongfang Zhao

    Abstract: In the field of environmental science, it is crucial to have robust evaluation metrics for large language models to ensure their efficacy and accuracy. We propose EnviroExam, a comprehensive evaluation method designed to assess the knowledge of large language models in the field of environmental science. EnviroExam is based on the curricula of top international universities, covering undergraduate…

    Submitted 18 May, 2024; originally announced May 2024.

  6. arXiv:2405.10504  [pdf]

    cs.CV

    Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image

    Authors: Jianshun Zeng, Wang Li, Yanjie Lv, Shuai Gao, YuChu Qin

    Abstract: Street-view images have been widely applied as a crucial mobile mapping data source. The inpainting of street-view images is a critical step in street-view image processing, not only for privacy protection but also for urban environment mapping applications. This paper presents a novel Deep Neural Network (DNN), multi-scale semantic prior Feature guided image inpainting Network (MFN) for i…

    Submitted 16 May, 2024; originally announced May 2024.

  7. arXiv:2405.04909  [pdf, other]

    cs.CV cs.AI

    Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

    Authors: Zhengxing Lan, Hongbo Li, Lingshan Liu, Bo Fan, Yisheng Lv, Yilong Ren, Zhiyong Cui

    Abstract: Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Though existing notable efforts have resulted in impressive performance improvements, a gap persists in scene cognition and understanding of the complex traffic semantics. This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) without explici…

    Submitted 8 May, 2024; originally announced May 2024.

  8. arXiv:2405.02590  [pdf, other]

    cs.IT

    Performance Evaluation of PAC Decoding with Deep Neural Networks

    Authors: Jingxin Dai, Hang Yin, Yansong Lv, Yuhuan Wang, Rui Lv

    Abstract: By concatenating a polar transform with a convolutional transform, polarization-adjusted convolutional (PAC) codes can reach the dispersion approximation bound in certain rate cases. However, the sequential decoding nature of traditional PAC decoding algorithms results in high decoding latency. Due to the parallel computing capability, deep neural network (DNN) decoders have emerged as a promising…

    Submitted 4 May, 2024; originally announced May 2024.

  9. arXiv:2404.12149  [pdf, other]

    cs.AI

    AccidentBlip2: Accident Detection With Multi-View MotionBlip2

    Authors: Yihua Shao, Hongyi Cai, Xinwei Long, Weiyi Lang, Zhe Wang, Haoran Wu, Yan Wang, Jiayi Yin, Yang Yang, Yisheng Lv, Zhen Lei

    Abstract: Intelligent vehicles have demonstrated excellent capabilities in many transportation scenarios. The inference capabilities of neural networks using cameras limit the accuracy of accident detection in complex transportation systems. This paper presents AccidentBlip2, a pure vision-based multi-modal large model Blip2 for accident detection. Our method first processes the multi-view images through Vi…

    Submitted 7 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  10. arXiv:2403.13263  [pdf, other]

    cs.CV

    SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

    Authors: Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu

    Abstract: Recent trends in Large Vision Language Models (LVLMs) research have been increasingly focusing on advancing beyond general image understanding towards more nuanced, object-level referential comprehension. In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects an…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  11. arXiv:2403.05029  [pdf, other]

    cs.AI

    BjTT: A Large-scale Multimodal Dataset for Traffic Prediction

    Authors: Chengyang Zhang, Yong Zhang, Qitan Shao, Jiangtao Feng, Bo Li, Yisheng Lv, Xinglin Piao, Baocai Yin

    Abstract: Traffic prediction is one of the most significant foundations in Intelligent Transportation Systems (ITS). Traditional traffic prediction methods rely only on historical traffic data to predict traffic trends and face two main challenges: 1) insensitivity to unusual events; 2) limited performance in long-term prediction. In this work, we explore how generative models combined with text describing…

    Submitted 14 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  12. arXiv:2402.17257  [pdf, other]

    cs.LG cs.AI cs.RO

    RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

    Authors: Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

    Abstract: Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method…

    Submitted 30 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by ICML2024

  13. arXiv:2402.07729  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

    Authors: Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, Yuanjun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou

    Abstract: Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field. Previous models primarily focus on assessing different fundamental tasks, such as Automatic Speech Recognition (ASR), and lack an assessment of the ope…

    Submitted 12 February, 2024; originally announced February 2024.

  14. arXiv:2401.12681  [pdf, other]

    cs.LG cs.AI

    Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical Learning

    Authors: Zhishuai Li, Yunhao Nie, Ziyue Li, Lei Bai, Yisheng Lv, Rui Zhao

    Abstract: Kriging aims at estimating the attributes of unsampled geo-locations from observations in the spatial vicinity or physical connections, which helps mitigate skewed monitoring caused by under-deployed sensors. Existing works assume that neighbors' information offers the basis for estimating the attributes of the unobserved target while ignoring non-neighbors. However, non-neighbors could also offer…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted in AISTATS 2024

  15. Methods and strategies for improving the novel view synthesis quality of neural radiation field

    Authors: Shun Fang, Ming Cui, Xing Feng, Yanna Lv

    Abstract: Neural Radiation Field (NeRF) technology can learn a 3D implicit model of a scene from 2D images and synthesize realistic novel view images. This technology has received widespread attention from the industry and has good application prospects. In response to the problem that the rendering quality of NeRF images needs to be improved, many researchers have proposed various methods to improve the re…

    Submitted 17 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    ACM Class: I.2; I.4; I.6

    Journal ref: IEEE ACCESS 12 (2024) 50548-50555

  16. arXiv:2401.08438  [pdf, other]

    cs.CL cs.AI cs.LG

    CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models

    Authors: Yaojia Lv, Haojie Pan, Ruiji Fu, Ming Liu, Zhongyuan Wang, Bing Qin

    Abstract: Cognitive dynamics are pivotal to advance human understanding of the world. Recent advancements in large language models (LLMs) reveal their potential for cognitive simulation. However, these LLM-based cognitive studies primarily focus on static modeling, overlooking the dynamic nature of cognition. To bridge this gap, we propose the concept of the cognitive dynamics of LLMs and present a correspo…

    Submitted 5 January, 2024; originally announced January 2024.

  17. arXiv:2401.04389  [pdf, other]

    cs.SD eess.AS

    RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the repairing network with COM-Net from TEA-PSE. In addition, multi-resolution discriminators and multi-band discriminators are adopted in the training…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: submitted to ICASSP 2024

  18. Context-Aware Interaction Network for RGB-T Semantic Segmentation

    Authors: Ying Lv, Zhi Liu, Gongyang Li

    Abstract: RGB-T semantic segmentation is a key technique for autonomous driving scenes understanding. For the existing RGB-T semantic segmentation methods, however, the effective exploration of the complementary relationship between different modalities is not implemented in the information interaction between multiple levels. To address such an issue, the Context-Aware Interaction Network (CAINet) is propo…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 13 pages, 7 figures, Accepted by IEEE Transactions on Multimedia 2024

  19. arXiv:2312.14264  [pdf]

    cs.ET cond-mat.mes-hall cs.AI cs.AR eess.SY

    Experimental demonstration of magnetic tunnel junction-based computational random-access memory

    Authors: Yang Lv, Brandon R. Zink, Robert P. Bloom, Hüsrev Cılasun, Pravin Khanal, Salonik Resch, Zamshed Chowdhury, Ali Habiboglu, Weigang Wang, Sachin S. Sapatnekar, Ulya Karpuzcu, Jian-Ping Wang

    Abstract: Conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence, because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called "computational random-access memory (CRAM)" has emerged to address this fundamental limitation. CRAM performs logic…

    Submitted 29 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  20. arXiv:2312.04889  [pdf, other]

    cs.AI cs.CL cs.LG

    KwaiAgents: Generalized Information-seeking Agent System with Large Language Models

    Authors: Haojie Pan, Zepeng Zhai, Hao Yuan, Yaojia Lv, Ruiji Fu, Ming Liu, Zhongyuan Wang, Bing Qin

    Abstract: Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this inquisitiveness. Despite not having the capacity to process and memorize vast amounts of information in their brains, humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the…

    Submitted 10 January, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

  21. arXiv:2311.16203  [pdf, other]

    cs.LG cs.AI cs.CL

    ChatTraffic: Text-to-Traffic Generation via Diffusion Model

    Authors: Chengyang Zhang, Yong Zhang, Qitan Shao, Bo Li, Yisheng Lv, Xinglin Piao, Baocai Yin

    Abstract: Traffic prediction is one of the most significant foundations in Intelligent Transportation Systems (ITS). Traditional traffic prediction methods rely only on historical traffic data to predict traffic trends and face two main challenges: 1) insensitivity to unusual events; 2) limited performance in long-term prediction. In this work, we explore how generative models combined with text describing…

    Submitted 4 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  22. arXiv:2310.19859  [pdf, other]

    cs.CV cs.AI

    Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

    Authors: Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Ao Ma, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

    Abstract: Parameter-efficient tuning has become a trend in transferring large-scale foundation models to downstream applications. Existing methods typically embed some light-weight tuners into the backbone, where both the design and the learning of the tuners are highly dependent on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbon…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  23. arXiv:2310.10209  [pdf, other]

    eess.IV cs.CV cs.LG

    Self-supervised Fetal MRI 3D Reconstruction Based on Radiation Diffusion Generation Model

    Authors: Junpeng Tan, Xin Zhang, Yao Lv, Xiangmin Xu, Gang Li

    Abstract: Although the use of multiple stacks can handle slice-to-volume motion correction and artifact removal problems, there are still several problems: 1) The slice-to-volume method usually uses slices as input, which cannot solve the problem of uniform intensity distribution and complementarity in regions of different fetal MRI stacks; 2) The integrity of 3D space is not considered, which adversely aff…

    Submitted 16 October, 2023; originally announced October 2023.

  24. arXiv:2310.07246  [pdf, other]

    cs.SD eess.AS

    Vec-Tok Speech: speech vectorization and tokenization for neural speech generation

    Authors: Xinfa Zhu, Yuanjun Lv, Yi Lei, Tao Li, Wendi He, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding speech quality and task generalization. This paper presents Vec-Tok Speech, an extensible framework that resembles multiple speech generation tasks, generating e…

    Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 15 pages, 2 figures

  25. arXiv:2310.07212  [pdf, other]

    cs.CV cs.AI

    Multi-Task Learning-Enabled Automatic Vessel Draft Reading for Intelligent Maritime Surveillance

    Authors: Jingxiang Qu, Ryan Wen Liu, Chenjie Zhao, Yu Guo, Sendren Sheng-Dong Xu, Fenghua Zhu, Yisheng Lv

    Abstract: The accurate and efficient vessel draft reading (VDR) is an important component of intelligent maritime surveillance, which could be exploited to assist in judging whether the vessel is normally loaded or overloaded. The computer vision technique with an excellent price-to-performance ratio has become a popular medium to estimate vessel draft depth. However, the traditional estimation methods easi…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 12 pages,11 figures, submitted to IEEE T-ITS

  26. arXiv:2310.05666  [pdf, other]

    cs.CV

    Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection

    Authors: Yilong Lv, Min Li, Yujie He, Shaopeng Li, Zhuzhen He, Aitao Yang

    Abstract: Anchor-based detectors have been continuously developed for object detection. However, the individual anchor box makes it difficult to predict the boundary's offset accurately. Instead of taking each bounding box as a closed individual, we consider using multiple boxes together to get prediction boxes. To this end, this paper proposes the Box Decouple-Couple (BDC) strategy in the inference…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Submitted 29 September, 2023; originally announced October 2023. Accepted by ICCV2023

  27. arXiv:2310.05051  [pdf, other]

    cs.SD eess.AS

    SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

    Authors: Yuanjun Lv, Jixun Yao, Peikun Chen, Hongbin Zhou, Heng Lu, Lei Xie

    Abstract: Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying the speaker representation. However, the anonymized speech is subject to reduction in pseudo speaker distinctiveness, speech quality and…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 8 pages, 3 figures; Accepted by ASRU2023

  28. arXiv:2310.00708  [pdf, other]

    cs.LG cs.AI

    A Simple Yet Effective Strategy to Robustify the Meta Learning Paradigm

    Authors: Qi Wang, Yiqin Lv, Yanghe Feng, Zheng Xie, Jincai Huang

    Abstract: Meta learning is a promising paradigm to enable skill transfer across tasks. Most previous methods employ the empirical risk minimization principle in optimization. However, the resulting worst fast adaptation to a subset of tasks can be catastrophic in risk-sensitive scenarios. To robustify fast adaptation, this paper optimizes meta learning pipelines from a distributionally robust perspective an…

    Submitted 1 October, 2023; originally announced October 2023.

  29. arXiv:2309.13907  [pdf, other]

    cs.SD eess.AS

    HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

    Authors: Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

    Abstract: Recent advances in text-to-speech, particularly those based on Graph Neural Networks (GNNs), have significantly improved the expressiveness of short-form synthetic speech. However, generating human-parity long-form speech with high dynamic prosodic variations is still challenging. To address this problem, we expand the capabilities of GNNs with a hierarchical prosody modeling approach, named HiGNN…

    Submitted 6 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: Accepted by ASRU2023

  30. arXiv:2308.09601  [pdf, other]

    cs.SI

    MONA: An Efficient and Scalable Strategy for Targeted k-Nodes Collapse

    Authors: Yuqian Lv, Bo Zhou, Jinhuan Wang, Shanqing Yu, Qi Xuan

    Abstract: The concept of k-core plays an important role in measuring the cohesiveness and engagement of a network. Recent studies have shown the vulnerability of k-core under adversarial attacks. However, few researchers have concentrated on the vulnerability of individual nodes within k-core. Therefore, in this paper, we attempt to study Targeted k-Nodes Collapse Problem (TNsCP), which focuses on…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 5 pages, 6 figures, 1 table, 5 algorithms

  31. arXiv:2308.07733  [pdf, other]

    eess.IV cs.CV cs.MM

    Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression

    Authors: Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang

    Abstract: The latest advancements in neural image compression show great potential in surpassing the rate-distortion performance of conventional standard codecs. Nevertheless, there exists an indelible domain gap between the datasets utilized for training (i.e., natural images) and those utilized for inference (e.g., artistic images). Our proposal involves a low-rank adaptation approach aimed at addressing…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023, 13 pages, 12 figures

    ACM Class: I.4.2; E.4

  32. arXiv:2308.04948  [pdf, other]

    cs.CL

    Extrapolating Large Language Models to Non-English by Aligning Languages

    Authors: Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, Lei Li

    Abstract: Existing large language models show disparate capability across different languages, due to the imbalance in the training data. Their performances on English tasks are often stronger than on tasks of other languages. In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages. We start from targeting individual languages by performing cross-l…

    Submitted 9 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

  33. arXiv:2308.02561  [pdf, other]

    cs.AI

    Large-scale Generative Simulation Artificial Intelligence: the Next Hotspot in Generative AI

    Authors: Qi Wang, Yanghe Feng, Jincai Huang, Yiqin Lv, Zheng Xie, Xiaoshan Gao

    Abstract: The concept of GenAI has been developed for decades. Until recently, it has impressed us with substantial breakthroughs in natural language processing and computer vision, actively engaging in industrial scenarios. Noticing the practical challenges, e.g., limited learning resources and over-dependence on scientific discovery empiricism, we nominate large-scale generative simulation artificial…

    Submitted 2 August, 2023; originally announced August 2023.

  34. arXiv:2308.01862  [pdf, other]

    cs.CL

    Wider and Deeper LLM Networks are Fairer LLM Evaluators

    Authors: Xinghua Zhang, Bowen Yu, Haiyang Yu, Yangyu Lv, Tingwen Liu, Fei Huang, Hongbo Xu, Yongbin Li

    Abstract: Measuring the quality of responses generated by LLMs is a challenging task, particularly when it comes to evaluating whether the response is aligned with human preference. A novel approach involves using the LLM itself to make evaluation and stabilizing the results through multiple independent evaluations, similar to a single-layer narrow LLM network. This network consists of a fixed number of neu…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Work in Progress

  35. arXiv:2307.16579  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

    Authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

    Abstract: We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio. We interpret AVS as a conditional generation task, where audio is defined as the conditional variable for sound producer(s) segmentation. With our new interpretation, it is especially necessary to model the correlation between audio and the final segme…

    Submitted 31 July, 2023; originally announced July 2023.

  36. arXiv:2307.07678  [pdf, other]

    cs.CV

    Both Spatial and Frequency Cues Contribute to High-Fidelity Image Inpainting

    Authors: Ze Lu, Yalei Lv, Wenqi Wang, Pengfei Xiong

    Abstract: Deep generative approaches have obtained great success in image inpainting recently. However, most generative inpainting networks suffer from either over-smooth results or aliasing artifacts. The former lacks high-frequency details, while the latter lacks semantic structure. To address this issue, we propose an effective Frequency-Spatial Complementary Network (FSCN) by exploiting rich semantic in…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Frequency Cues, Image Inpainting

  37. arXiv:2307.04651  [pdf, other]

    cs.CV

    Joint Salient Object Detection and Camouflaged Object Detection via Uncertainty-aware Learning

    Authors: Aixuan Li, Jing Zhang, Yunqiu Lv, Tong Zhang, Yiran Zhong, Mingyi He, Yuchao Dai

    Abstract: Salient objects attract human attention and usually stand out clearly from their surroundings. In contrast, camouflaged objects share similar colors or textures with the environment. In this case, salient objects are typically non-camouflaged, and camouflaged objects are usually not salient. Due to this inherent contradictory attribute, we introduce an uncertainty-aware learning pipeline to extens…

    Submitted 10 July, 2023; originally announced July 2023.

  38. arXiv:2307.03376  [pdf, other]

    cs.CV

    Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

    Authors: Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

    Abstract: Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation. This task is promising due to its ability to discover objects in a generic manner. We roughly categorise existing techniques into two main…

    Submitted 7 July, 2023; originally announced July 2023.

  39. arXiv:2306.03515  [pdf, other]

    cs.LG cs.AI cs.LO

    Logic Diffusion for Knowledge Graph Reasoning

    Authors: Xiaoying Xie, Biao Gong, Yiliang Lv, Zhen Han, Guoshuai Zhao, Xueming Qian

    Abstract: Most recent works focus on answering first order logical queries to explore the knowledge graph reasoning via multi-hop logic predictions. However, existing reasoning models are limited by the circumscribed logical paradigms of training samples, which leads to a weak generalization of unseen logic. To address these issues, we propose a plug-in module called Logic Diffusion (LoD) to discover unseen…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: 10 pages, 6 figures

  40. arXiv:2305.02901  [pdf, other]

    cs.LG cs.AI cs.CR

    Single Node Injection Label Specificity Attack on Graph Neural Networks via Reinforcement Learning

    Authors: Dayuan Chen, Jian Zhang, Yuqian Lv, Jinhuan Wang, Hongjie Ni, Shanqing Yu, Zhen Wang, Qi Xuan

    Abstract: Graph neural networks (GNNs) have achieved remarkable success in various real-world applications. However, recent studies highlight the vulnerability of GNNs to malicious perturbations. Previous adversaries primarily focus on graph modifications or node injections to existing graphs, yielding promising results but with notable limitations. Graph modification attack (GMA) requires manipulation of t…

    Submitted 4 May, 2023; originally announced May 2023.

  41. arXiv:2303.15230  [pdf, other]

    cs.CV cs.CL cs.LG

    Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

    Authors: Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

    Abstract: Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs. Relying on learning the joint representation of seen compositions, these methods ignore the explicit modeling of the state and object, thus limiting the exploitation of pre-trained knowledge and generalization to unseen compo…

    Submitted 25 March, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  42. arXiv:2303.12609  [pdf, other]

    cs.IT

    Parity-check-aided Dynamic SCL-Flip Decoder with A Simplified Flip Metric for Polar Codes

    Authors: Yansong Lv, Hang Yin, Zhanxin Yang

    Abstract: Since polar codes were proposed, improving the performance of polar codes at limited code lengths has received significant attention. One of the effective solutions is a series of list flip decoders proposed in recent years. To further enhance performance, we proposed a parity-check-aided dynamic successive cancellation list flip (PC-DSCLF) decoder in this paper. First, we designed a simplified fl…

    Submitted 22 March, 2023; originally announced March 2023.

  43. arXiv:2303.08561  [pdf, other]

    cs.SD eess.AS

    Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation

    Authors: Yulin Pan, Xiangteng He, Biao Gong, Yuxin Peng, Yiliang Lv

    Abstract: Existing audio analysis methods generally first transform the audio stream to spectrogram, and then feed it into CNN for further analysis. A standard CNN recognizes specific visual patterns over feature map, then pools for high-level representation, which overlooks the positional information of recognized patterns. However, unlike natural image, the semantic of an audio spectrogram is sensitive to…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 8 pages, 4 figures

  44. Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

    Authors: Yulin Pan, Xiangteng He, Biao Gong, Yiliang Lv, Yujun Shen, Yuxin Peng, Deli Zhao

    Abstract: Video temporal grounding aims to pinpoint a video segment that matches the query description. Despite the recent advance in short-form videos (e.g., in minutes), temporal grounding in long videos (e.g., in hours) is still at its early stage. To address this challenge, a common practice is to employ a sliding window, yet can be inefficient and inflexible due to the limited number…

    Submitted 22 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: 11 pages, 8 figures

    Journal ref: 2023 IEEE/CVF International Conference on Computer Vision (ICCV)

  45. arXiv:2303.06911  [pdf, other]

    cs.CV

    ViM: Vision Middleware for Unified Downstream Transferring

    Authors: Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

    Abstract: Foundation models are pre-trained on massive data and transferred to downstream tasks via fine-tuning. This work presents Vision Middleware (ViM), a new learning paradigm that targets unified transferring from a single foundation model to a variety of downstream tasks. ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared…

    Submitted 13 March, 2023; originally announced March 2023.

  46. arXiv:2303.05093  [pdf, other]

    cs.CV cs.CL

    Improving Video Retrieval by Adaptive Margin

    Authors: Feng He, Qi Wang, Zhifan Feng, Wenbin Jiang, Yajuan Lv, Yong Zhu, Xiao Tan

    Abstract: Video retrieval is becoming increasingly important owing to the rapid emergence of videos on the Internet. The dominant paradigm for video retrieval learns video-text representations by pushing the distance between the similarity of positive pairs and that of negative pairs apart from a fixed margin. However, negative pairs used for training are sampled randomly, which indicates that the semantics…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: Accepted by SIGIR 2021

  47. Point Cloud Classification Using Content-based Transformer via Clustering in Feature Space

    Authors: Yahui Liu, Bin Tian, Yisheng Lv, Lingxi Li, Feiyue Wang

    Abstract: Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention, but ignore their content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called Poi…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: This paper is accepted to IEEE/CAA Journal of Automatica Sinica (JAS)

  48. arXiv:2303.00690  [pdf, other]

    cs.CV

    Rethinking Efficient Tuning Methods from a Unified Perspective

    Authors: Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Yiliang Lv, Deli Zhao, Jingren Zhou

    Abstract: Parameter-efficient transfer learning (PETL) based on large-scale pre-trained foundation models has achieved great success in various downstream applications. Existing tuning methods, such as prompt, prefix, and adapter, perform task-specific lightweight adjustments to different parts of the original architecture. However, they take effect on only some parts of the pre-trained models, i.e., only t…

    Submitted 1 March, 2023; originally announced March 2023.

  49. arXiv:2302.13574  [pdf, other]

    cs.CL

    kNN-BOX: A Unified Framework for Nearest Neighbor Generation

    Authors: Wenhao Zhu, Qianfeng Zhao, Yunzhe Lv, Shujian Huang, Siheng Zhao, Sizhe Liu, Jiajun Chen

    Abstract: Augmenting the base neural model with a token-level symbolic datastore is a novel generation paradigm and has achieved promising results in machine translation (MT). In this paper, we introduce a unified framework kNN-BOX, which enables quick development and interactive analysis for this novel paradigm. kNN-BOX decomposes the datastore-augmentation approach into three modules: datastore, retriever…

    Submitted 27 February, 2023; originally announced February 2023.

  50. arXiv:2302.11283  [pdf, other]

    cs.CV

    Asynchronous Trajectory Matching-Based Multimodal Maritime Data Fusion for Vessel Traffic Surveillance in Inland Waterways

    Authors: Yu Guo, Ryan Wen Liu, Jingxiang Qu, Yuxu Lu, Fenghua Zhu, Yisheng Lv

    Abstract: The automatic identification system (AIS) and video cameras have been widely exploited for vessel traffic surveillance in inland waterways. The AIS data could provide the vessel identity and dynamic information on vessel position and movements. In contrast, the video data could describe the visual appearances of moving vessels, but without knowing the information on identity, position and movement…

    Submitted 22 February, 2023; originally announced February 2023.