
Showing 1–50 of 142 results for author: Duan, Z

Searching in archive cs.
  1. arXiv:2406.19126  [pdf, other]

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, the existing super-oscillatory lens for the spatial super-resolution imaging system still confronts critical limitations in performance due to the lack of a more advanced design method and the limited design degree of freedom. Here, we propose an optical super-oscillatory diffractive neural net…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  2. arXiv:2406.15459  [pdf, other]

    cs.GT cs.CE cs.LG

    Large-Scale Contextual Market Equilibrium Computation through Deep Learning

    Authors: Yunxuan Ma, Yide Bian, Hao Xu, Weitao Yang, Jingshu Zhao, Zhijian Duan, Feng Wang, Xiaotie Deng

    Abstract: Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with a relatively small number of buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represent…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 22 pages

  3. arXiv:2406.14401  [pdf, other]

    cs.LG cs.AI

    Fair Streaming Feature Selection

    Authors: Zhangling Duan, Tianci Li, Xingyu Wu, Zhaolong Ling, Jingye Yang, Zhaohong Jia

    Abstract: Streaming feature selection techniques have become essential in processing real-time data streams, as they facilitate the identification of the most relevant attributes from continuously updating information. Despite their performance, current algorithms for streaming feature selection frequently fall short in managing biases and avoiding discrimination that could be perpetuated by sensitive attrib…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 30 pages, 10 figures

  4. arXiv:2406.14176  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection

    Authors: Kyungbok Lee, You Zhang, Zhiyao Duan

    Abstract: This paper addresses the challenge of developing a robust audio-visual deepfake detection model. In practical use cases, new generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods. This calls for the generalization ability of the method. Additionally, to ensure the credibility of detection methods, it is beneficial for t…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.14130  [pdf, other]

    cs.CV

    ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning

    Authors: Zhongjie Duan, Wenmeng Zhou, Cen Chen, Yaliang Li, Weining Qian

    Abstract: Recently, advancements in video synthesis have attracted significant attention. Video synthesis models such as AnimateDiff and Stable Video Diffusion have demonstrated the practical applicability of diffusion models in creating dynamic visual content. The emergence of SORA has further spotlighted the potential of video generation technologies. Nonetheless, the extension of video lengths has been c…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  6. arXiv:2406.10842  [pdf, other]

    cs.CL cs.AI cs.HC

    Large Language Models for Automatic Milestone Detection in Group Discussions

    Authors: Zhuoxu Duan, Zhengye Yang, Samuel Westby, Christoph Riedl, Brooke Foucault Welles, Richard J. Radke

    Abstract: Large language models like GPT have proven widely successful on natural language understanding tasks based on written text documents. In this paper, we investigate an LLM's performance on recordings of a group oral communication task in which utterances are often truncated or not well-formed. We propose a new group task experiment involving a puzzle with several milestones that can be achieved in…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.10514  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

    Authors: Zehua Kcriss Li, Meiying Melissa Chen, Yi Zhong, Pinxin Liu, Zhiyao Duan

    Abstract: Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered by professional voice actors. Inspired by this, we explore expressive speech synthesis through the lens of articulatory phonetics. Specifically,…

    Submitted 15 June, 2024; originally announced June 2024.

  8. arXiv:2406.08222  [pdf]

    cs.CV cs.AI cs.CY cs.HC

    A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning about Gender and Emotion

    Authors: Sha Luo, Sang Jung Kim, Zening Duan, Kaiping Chen

    Abstract: In the evolving landscape of computer vision (CV) technologies, the automatic detection and interpretation of gender and emotion in images is a critical area of study. This paper investigates social biases in CV models, emphasizing the limitations of traditional evaluation metrics such as precision, recall, and accuracy. These metrics often fall short in capturing the complexities of gender and em…

    Submitted 12 June, 2024; originally announced June 2024.

  9. arXiv:2406.06216  [pdf, other]

    cs.CV

    Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

    Authors: Xin Jin, Pengyi Jiao, Zheng-Peng Duan, Xingchao Yang, Chun-Le Guo, Bo Ren, Chongyi Li

    Abstract: Volumetric rendering based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes. However, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. However, implementing RAW image-based view synthesis directly usi…

    Submitted 10 June, 2024; originally announced June 2024.

  10. arXiv:2406.02438  [pdf, other]

    eess.AS cs.MM cs.SD

    CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection

    Authors: Yongyi Zang, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu, Wenxiao Zhao, Jing Guo, Tomoki Toda, Zhiyao Duan

    Abstract: Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restrictions. Addressing these gaps, we introduce CtrSVDD, a large-scale, diverse collection of bonafide and deepfake singing vocals. These vocals are synthesi…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  11. arXiv:2405.05244  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan

    Abstract: The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specializ…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Evaluation plan of the SVDD Challenge @ SLT 2024

  12. arXiv:2405.03194  [pdf, other]

    cs.CV

    CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario

    Authors: Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie

    Abstract: In the vast and dynamic landscape of urban settings, Traffic Safety Description and Analysis plays a pivotal role in applications ranging from insurance inspection to accident prevention. This paper introduces CityLLaVA, a novel fine-tuning framework for Visual Language Models (VLMs) designed for urban scenarios. CityLLaVA enhances model comprehension and prediction accuracy through (1) employing…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by AICITY2024 Workshop Track2 at CVPR2024

  13. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge addresses a major problem in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  14. Detecting Compromised IoT Devices Using Autoencoders with Sequential Hypothesis Testing

    Authors: Md Mainuddin, Zhenhai Duan, Yingfei Dong

    Abstract: IoT devices fundamentally lack built-in security mechanisms to protect themselves from security attacks. Existing works on improving IoT security mostly focus on detecting anomalous behaviors of IoT devices. However, these existing anomaly detection schemes may trigger an overwhelmingly large number of false alerts, rendering them unusable in detecting compromised IoT devices. In this paper we dev…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 2023 IEEE International Conference on Big Data (BigData)

  15. arXiv:2404.09624  [pdf, other]

    cs.CV

    AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception

    Authors: Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, Guangming Shi

    Abstract: The highly abstract nature of image aesthetics perception (IAP) poses a significant challenge for current multimodal large language models (MLLMs). The lack of human-annotated multi-modality aesthetic data further exacerbates this dilemma, resulting in MLLMs falling short of aesthetics perception capabilities. To address the above challenge, we first introduce a comprehensively annotated Aesthetic M…

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.09466  [pdf, other]

    cs.SD cs.LG eess.AS

    Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

    Authors: Yujia Yan, Zhiyao Duan

    Abstract: The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription. In this framework, all events (notes or pedals) are represented as closed intervals tied to specific event types. The neural semi-CRF approach requires an interval scoring matrix that assigns a score for every candidate interval. However, designing an efficient and exp…

    Submitted 23 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Fixed Typos

  17. arXiv:2404.07507  [pdf, other]

    eess.IV cs.CV

    Learning to Classify New Foods Incrementally Via Compressed Exemplars

    Authors: Justin Yang, Zhihao Duan, Jiangpeng He, Fengqing Zhu

    Abstract: Food image classification systems play a crucial role in health monitoring and diet tracking through image-based dietary assessment techniques. However, existing food recognition systems rely on static datasets characterized by a pre-defined fixed number of food classes. This contrasts drastically with the reality of food consumption, which features constantly changing data. Therefore, food image…

    Submitted 11 April, 2024; originally announced April 2024.

  18. arXiv:2403.18535  [pdf, other]

    eess.IV cs.LG

    Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs

    Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

    Abstract: Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bo…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)

  19. arXiv:2403.10493  [pdf, other]

    cs.SD eess.AS eess.SP

    MusicHiFi: Fast High-Fidelity Stereo Vocoding

    Authors: Ge Zhu, Juan-Pablo Caceres, Zhiyao Duan, Nicholas J. Bryan

    Abstract: Diffusion-based audio and music generation models commonly generate music by constructing an image representation of audio (e.g., a mel-spectrogram) and then converting it to audio using a phase reconstruction model or vocoder. Typical vocoders, however, produce monophonic audio at lower resolutions (e.g., 16-24 kHz), which limits their effectiveness. We propose MusicHiFi -- an efficient high-fide…

    Submitted 20 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  20. arXiv:2403.06288  [pdf, other]

    cs.CV

    Probing Image Compression For Class-Incremental Learning

    Authors: Justin Yang, Zhihao Duan, Andrew Peng, Yuning Huang, Jiangpeng He, Fengqing Zhu

    Abstract: Image compression emerges as a pivotal tool in the efficient handling and transmission of digital images. Its ability to substantially reduce file size not only facilitates enhanced data storage capacity but also potentially brings advantages to the development of continual machine learning (ML) systems, which learn new knowledge incrementally from sequential data. Continual ML systems often rely…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Picture Coding Symposium (PCS) 2024

  21. arXiv:2403.04908  [pdf, other]

    cs.CV

    Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities

    Authors: Kaiwen Cai, Zhekai Duan, Gaowen Liu, Charles Fleming, Chris Xiaoxuan Lu

    Abstract: Recent advancements in Vision-Language (VL) models have sparked interest in their deployment on edge devices, yet challenges in handling diverse visual modalities, manual annotation, and computational constraints remain. We introduce EdgeVL, a novel framework that bridges this gap by seamlessly integrating dual-modality knowledge distillation and quantization-aware contrastive learning. This appro…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Under review

  22. arXiv:2402.15569  [pdf, other]

    eess.AS cs.LG cs.SD

    Toward Fully Self-Supervised Multi-Pitch Estimation

    Authors: Frank Cwitkowitz, Zhiyao Duan

    Abstract: Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures. Supervised learning techniques have demonstrated solid performance on more narrow characterizations of the task, but suffer from limitations concerning the shortage of large-scale and diverse polyphonic music datasets with m…

    Submitted 23 February, 2024; originally announced February 2024.

  23. arXiv:2402.11904  [pdf, other]

    cs.GT cs.LG

    Scalable Virtual Valuations Combinatorial Auction Design by Combining Zeroth-Order and First-Order Optimization Method

    Authors: Zhijian Duan, Haoran Sun, Yichong Xia, Siqiang Wang, Zhilin Zhang, Chuan Yu, Jian Xu, Bo Zheng, Xiaotie Deng

    Abstract: Automated auction design seeks to discover empirically high-revenue and incentive-compatible mechanisms using machine learning. Ensuring dominant strategy incentive compatibility (DSIC) is crucial, and the most effective approach is to confine the mechanism to Affine Maximizer Auctions (AMAs). Nevertheless, existing AMA-based approaches encounter challenges such as scalability issues (arising from…

    Submitted 19 February, 2024; originally announced February 2024.

  24. arXiv:2402.08256  [pdf, other]

    cs.IR cs.AI

    Modeling Balanced Explicit and Implicit Relations with Contrastive Learning for Knowledge Concept Recommendation in MOOCs

    Authors: Hengnian Gu, Zhiyi Duan, Pan Xie, Dongdai Zhou

    Abstract: The knowledge concept recommendation in Massive Open Online Courses (MOOCs) is a significant issue that has garnered widespread attention. Existing methods primarily rely on the explicit relations between users and knowledge concepts on the MOOC platforms for recommendation. However, there are numerous implicit relations (e.g., shared interests or same knowledge levels between users) generated wit…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted to WWW 2024

  25. arXiv:2402.06986  [pdf, other]

    cs.SD eess.AS

    Cacophony: An Improved Contrastive Audio-Text Model

    Authors: Ge Zhu, Jordan Darefsky, Zhiyao Duan

    Abstract: Despite recent advancements in audio-text modeling, audio-text contrastive models still lag behind their image-text counterparts in scale and performance. We propose a method to improve both the scale and the training of audio-text contrastive models. Specifically, we craft a large-scale audio-text dataset containing 13,000 hours of text-labeled audio, using pretrained language models to process n…

    Submitted 29 April, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: Work in Progress

  26. arXiv:2401.16224  [pdf, other]

    cs.CV

    Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

    Authors: Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang

    Abstract: Toon shading is a type of non-photorealistic rendering task of animation. Its primary purpose is to render objects with a flat and stylized appearance. As diffusion models have ascended to the forefront of image synthesis methodologies, this paper delves into an innovative form of toon shading based on diffusion models, aiming to directly render photorealistic videos into anime styles. In video st…

    Submitted 29 January, 2024; originally announced January 2024.

  27. arXiv:2312.15561  [pdf, other]

    cs.CL cs.AI

    README: Bridging Medical Jargon and Lay Understanding for Patient Education through Data-Centric NLP

    Authors: Zonghai Yao, Nandyala Siddharth Kantu, Guanghao Wei, Hieu Tran, Zhangqi Duan, Sunjae Kwon, Zhichao Yang, README annotation team, Hong Yu

    Abstract: The advancement in healthcare has shifted focus toward patient-centric approaches, particularly in self-care and patient education, facilitated by access to Electronic Health Records (EHR). However, medical jargon in EHRs poses significant challenges in patient comprehension. To address this, we introduce a new task of automatically generating lay definitions, aiming to simplify complex medical te…

    Submitted 16 June, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  28. arXiv:2312.15380  [pdf, other]

    cs.NI eess.SP

    Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment

    Authors: Yiwei Tang, Hualong Huang, Wenhan Zhan, Geyong Min, Zhekai Duan, Yuchuan Lei

    Abstract: Being an up-and-coming application scenario of mobile edge computing (MEC), the post-disaster rescue suffers multitudinous computing-intensive tasks but unstably guaranteed network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of great significance. This paper studies a multi-user post-disaste…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted by WCNC 2024

  29. arXiv:2312.11063  [pdf, ps, other]

    cs.GT cs.AI cs.DS cs.LG econ.TH

    A survey on algorithms for Nash equilibria in finite normal-form games

    Authors: Hanyu Li, Wenhan Huang, Zhijian Duan, David Henry Mguni, Kun Shao, Jun Wang, Xiaotie Deng

    Abstract: Nash equilibrium is one of the most influential solution concepts in game theory. With the development of computer science and artificial intelligence, there is an increasing demand for Nash equilibrium computation, especially for Internet economics and multi-agent learning. This paper reviews various algorithms computing the Nash equilibrium and its approximation solutions in finite normal-form ga…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: The published version is in Computer Science Review

  30. arXiv:2312.06544  [pdf, ps, other]

    cs.DC cs.PF

    Complexity Evaluation of Parallel Execution of the RAPiD Deep-Learning Algorithm on Intel CPU

    Authors: Dominic Konrad, Zhihao Duan, Mertcan Cokbas, Prakash Ishwar

    Abstract: Knowing how many people are in various indoor spaces, and where, is critical for reducing HVAC energy waste, for space management and spatial analytics, and in emergency scenarios. While a range of technologies have been proposed to detect and track people in large indoor spaces, ceiling-mounted fisheye cameras have recently emerged as strong contenders. Currently, RAPiD is the SOTA algorithm for people det…

    Submitted 11 December, 2023; originally announced December 2023.

    Report number: ECE-2022-04

  31. arXiv:2312.05990  [pdf, other]

    cs.CL

    Constructing Vec-tionaries to Extract Message Features from Texts: A Case Study of Moral Appeals

    Authors: Zening Duan, Anqi Shao, Yicheng Hu, Heysung Lee, Xining Liao, Yoo Ji Suh, Jisoo Kim, Kai-Cheng Yang, Kaiping Chen, Sijia Yang

    Abstract: While researchers often study message features like moral content in text, such as party manifestos and social media, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies dev…

    Submitted 8 March, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  32. arXiv:2311.09265  [pdf, other]

    cs.CV

    FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier

    Authors: Zhongjie Duan, Chengyu Wang, Cen Chen, Weining Qian, Jun Huang, Mingyi Jin

    Abstract: With the emergence of diffusion models and rapid development in image processing, it has become effortless to generate fancy images in tasks such as style transfer and image editing. However, these impressive image processing approaches face consistency issues in video processing. In this paper, we propose a powerful model-free toolkit called FastBlend to address the consistency problem for video…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 13 pages, 10 figures

  33. arXiv:2311.08667  [pdf, other]

    cs.SD eess.AS

    EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis

    Authors: Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan

    Abstract: Audio diffusion models can synthesize a wide variety of sounds. Existing models often operate in the latent domain with cascaded phase recovery modules to reconstruct the waveform. This poses challenges when generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). Combining wit…

    Submitted 18 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS Workshop: Machine Learning for Audio (Camera Ready)

  34. arXiv:2311.06761  [pdf, other]

    cs.CL

    Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding

    Authors: Ruyao Xu, Taolin Zhang, Chengyu Wang, Zhongjie Duan, Cen Chen, Minghui Qiu, Dawei Cheng, Xiaofeng He, Weining Qian

    Abstract: Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the performance of various downstream NLP tasks by injecting knowledge facts from large-scale Knowledge Graphs (KGs). However, existing methods for pre-training KEPLMs with relational triples are difficult to adapt to closed domains due to the lack of sufficient domain graph semantics. In this paper, we propose a Knowledge-enhance…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  35. arXiv:2310.11678  [pdf, other]

    cs.LG cs.AI cs.FL cs.LO

    Using Experience Classification for Training Non-Markovian Tasks

    Authors: Ruixuan Miao, Xu Lu, Cong Tian, Bin Yu, Zhenhua Duan

    Abstract: Unlike tasks in the standard Reinforcement Learning (RL) model, many real-world tasks are non-Markovian: their rewards are predicated on state history rather than solely on the current state. Solving non-Markovian tasks, which arise frequently in practical applications such as autonomous driving, financial trading, and medical diagnosis, can be quite challenging. We propose a novel RL approach to achieve non-…

    Submitted 17 October, 2023; originally announced October 2023.

  36. arXiv:2309.09085  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS eess.SP

    SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

    Authors: Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan

    Abstract: Guitar tablature is a form of music notation widely used among guitarists. It captures not only the musical content of a piece, but also its implementation and ornamentation on the instrument. Guitar Tablature Transcription (GTT) is an important task with broad applications in music education, composition, and entertainment. Existing GTT datasets are quite limited in size and scope, rendering mode…

    Submitted 24 January, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  37. arXiv:2309.07525  [pdf, other]

    cs.SD cs.AI eess.AS

    SingFake: Singing Voice Deepfake Detection

    Authors: Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan

    Abstract: The rise of singing voice synthesis presents critical challenges to artists and industry stakeholders over unauthorized voice usage. Unlike synthesized speech, synthesized singing voices are typically released in songs containing strong background music that may hide synthesis artifacts. Additionally, singing voices present different acoustic and linguistic characteristics from speech utterances.…

    Submitted 21 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted at ICASSP 2024

  38. arXiv:2309.05534  [pdf, other]

    cs.CL cs.AI cs.CV

    PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud

    Authors: Chengyu Wang, Zhongjie Duan, Bingyan Liu, Xinyi Zou, Cen Chen, Kui Jia, Jun Huang

    Abstract: Text-to-image synthesis for the Chinese language poses unique challenges due to its large vocabulary size and intricate character relationships. While existing diffusion models have shown promise in generating images from textual descriptions, they often neglect domain-specific contexts and lack robustness in handling the Chinese language. This paper introduces PAI-Diffusion, a comprehensive fram…

    Submitted 11 September, 2023; originally announced September 2023.

  39. arXiv:2308.12060  [pdf, other]

    cs.CL cs.AI

    FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering

    Authors: Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, Jianyong Wang

    Abstract: Knowledge base question answering (KBQA) is a critical yet challenging task due to the vast number of entities within knowledge bases and the diversity of natural language questions posed by users. Unfortunately, the performance of most KBQA models tends to decline significantly in real-world scenarios where high-quality annotated data is insufficient. To mitigate the burden associated with manual…

    Submitted 26 January, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted as AAAI-24 Oral paper; Knowledge Base Question Answering; Large Language Model; Data Generation; Few-Shot & Zero-Shot

  40. arXiv:2308.03463  [pdf, other]

    cs.CV cs.MM

    DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis

    Authors: Zhongjie Duan, Lizhou You, Chengyu Wang, Cen Chen, Ziheng Wu, Weining Qian, Jun Huang

    Abstract: In recent years, diffusion models have emerged as the most powerful approach in image synthesis. However, applying these models directly to video synthesis presents challenges, as it often leads to noticeable flickering content. Although recently proposed zero-shot methods can alleviate flicker to some extent, we still struggle to generate coherent videos. In this paper, we propose DiffSynth, a n…

    Submitted 9 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 9 pages, 6 figures

  41. arXiv:2307.14547  [pdf, other]

    eess.AS cs.SD

    Mitigating Cross-Database Differences for Learning Unified HRTF Representation

    Authors: Yutong Wen, You Zhang, Zhiyao Duan

    Abstract: Individualized head-related transfer functions (HRTFs) are crucial for accurate sound positioning in virtual auditory displays. As the acoustic measurement of HRTFs is resource-intensive, predicting individualized HRTFs using machine learning models is a promising approach at scale. Training such models requires a unified HRTF representation across multiple databases to utilize their respectively l…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: 5 pages, 4 figures, accepted by IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

  42. arXiv:2306.07709  [pdf, other]

    cs.GT cs.LG econ.TH

    Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets

    Authors: Yurong Chen, Qian Wang, Zhijian Duan, Haoran Sun, Zhaohua Chen, Xiang Yan, Xiaotie Deng

    Abstract: In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 43 pages, 12 figures

  43. arXiv:2306.03985 [pdf, other]

    cs.LG

    Agent Performing Autonomous Stock Trading under Good and Bad Situations

    Authors: Yunfei Luo, Zhangqi Duan

    Abstract: Stock trading is a popular form of financial management. However, the market and the economic environment are unstable and usually unpredictable. Furthermore, engaging in stock trading requires time and effort to analyze data, create strategies, and make decisions. It would be convenient and effective if an agent could assist with, or even fully take over, the task of analyzing and modeling past data and…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Published in ICML Workshop: AI for Agent Based Modeling, 2023

  44. Phase perturbation improves channel robustness for speech spoofing countermeasures

    Authors: Yongyi Zang, You Zhang, Zhiyao Duan

    Abstract: In this paper, we aim to address the problem of channel robustness in speech countermeasure (CM) systems, which are used to distinguish synthetic speech from natural human speech. Based on two hypotheses, we propose an approach that perturbs phase information during the training of time-domain CM systems. Communication networks often employ lossy compression codecs that encode only magnitu…

    Submitted 6 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 5 pages; Proceedings of Interspeech 2023

  45. arXiv:2305.19492 [pdf, other]

    cs.CV cs.AI

    CVSNet: A Computer Implementation for Central Visual System of The Brain

    Authors: Ruimin Gao, Hao Zou, Zhekai Duan

    Abstract: In computer vision, different basic blocks are built around different matrix operations, and models based on these blocks have achieved good results. Their success on vision tasks lends them credibility. However, these empirically driven models have also left deep learning long criticized for its lack of underlying principles and interpretability. Deep learning originated from the concept of neurons in ne…

    Submitted 30 May, 2023; originally announced May 2023.

  46. arXiv:2305.17716 [pdf, other]

    cs.CV cs.AI

    InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion

    Authors: Haobo Yang, Wenyu Wang, Ze Cao, Zhekai Duan, Xuchen Liu

    Abstract: This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation. Leveraging the intriguing realm of visual illusions, we establish a unique dataset, InDL, designed to rigorously test and benchmark these models. Deep learning has witnessed remarkable progress in domains such as computer vision and natural language processing. However, models o…

    Submitted 5 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:2305.02299, arXiv:2302.11939, arXiv:2301.13287, arXiv:2305.12686

  47. Optimal Linear Subspace Search: Learning to Construct Fast and High-Quality Schedulers for Diffusion Models

    Authors: Zhongjie Duan, Chengyu Wang, Cen Chen, Jun Huang, Weining Qian

    Abstract: In recent years, diffusion models have become the most popular and powerful methods in the field of image synthesis, even rivaling human artists in artistic creativity. However, the key issue currently limiting the application of diffusion models is their extremely slow generation process. Although several methods have been proposed to speed up the generation process, there still exists a trade-off betwe…

    Submitted 10 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 13 pages, 5 figures

  48. arXiv:2305.12755 [pdf, other]

    cs.SD cs.CL eess.AS

    GNCformer Enhanced Self-attention for Automatic Speech Recognition

    Authors: J. Li, Z. Duan, S. Li, X. Yu, G. Yang

    Abstract: In this paper, an Enhanced Self-Attention (ESA) mechanism is proposed for robust feature extraction. The proposed ESA combines recursive gated convolution with a self-attention mechanism. In particular, the former is used to capture multi-order feature interactions and the latter for global feature extraction. In addition, the location of interest that is suitable for inserting t…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures

  49. arXiv:2305.12162 [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    A Scalable Neural Network for DSIC Affine Maximizer Auction Design

    Authors: Zhijian Duan, Haoran Sun, Yurong Chen, Xiaotie Deng

    Abstract: Automated auction design aims to find empirically high-revenue mechanisms through machine learning. Existing works on multi-item auction scenarios can be roughly divided into RegretNet-like approaches and affine maximizer auction (AMA) approaches. However, the former cannot strictly ensure dominant strategy incentive compatibility (DSIC), while the latter faces a scalability issue due to the large number of…

    Submitted 17 January, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023 (spotlight)

  50. arXiv:2305.01898 [pdf, other]

    cs.AI cs.RO cs.SE

    VSRQ: Quantitative Assessment Method for Safety Risk of Vehicle Intelligent Connected System

    Authors: Tian Zhang, Wenshan Guan, Hao Miao, Xiujie Huang, Zhiquan Liu, Chaonan Wang, Quanlong Guan, Liangda Fang, Zhifei Duan

    Abstract: The field of intelligent connectivity in modern vehicles continues to expand, and vehicle functions are becoming increasingly complex. This has also led to a growing number of vehicle vulnerabilities and safety issues. Therefore, it is particularly important to identify high-risk vehicle intelligent connected systems, because doing so can inform security person…

    Submitted 3 May, 2023; originally announced May 2023.