Showing 1–50 of 230 results for author: Qin, T

Searching in archive cs.
  1. Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

    Authors: Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

    Abstract: Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a mass of image data is necessary. Collecting images from specially designed data-collection vehicles cannot support large-scale applications. How to acquire massive high-quality data remains an…

    Submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.15777

    cs.SE

    ISS-Scenario: Scenario-based Testing in CARLA

    Authors: Renjue Li, Tianhang Qin, Cas Widdershoven

    Abstract: The rapidly evolving field of autonomous driving systems (ADSs) is full of promise. However, in order to fulfil these promises, ADSs need to be safe in all circumstances. This paper introduces ISS-Scenario, an autonomous driving testing framework in the paradigm of scenario-based testing. ISS-Scenario is designed for batch testing, exploration of test cases (e.g., potentially dangerous scenarios),…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: TASE 2024, 8 pages

  3. arXiv:2406.10485

    cs.LG cs.CV

    A Label is Worth a Thousand Images in Dataset Distillation

    Authors: Tian Qin, Zhiwei Deng, David Alvarez-Melis

    Abstract: Data $\textit{quality}$ is a crucial factor in the performance of machine learning models, a principle that dataset distillation methods exploit by compressing training datasets into much smaller counterparts that maintain similar downstream performance. Understanding how and why data distillation methods work is vital not only for improving these methods but also for revealing fundamental charact…

    Submitted 14 June, 2024; originally announced June 2024.

  4. arXiv:2406.03438

    cs.IT eess.SP

    CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels

    Authors: Ye Zeng, Li Qiao, Zhen Gao, Tong Qin, Zhonghuai Wu, Sheng Chen, Mohsen Guizani

    Abstract: In massive multiple-input multiple-output (MIMO) systems, how to reliably acquire downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acqui…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2405.19730

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat…

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  6. arXiv:2405.12783

    stat.ML cs.LG

    Epanechnikov Variational Autoencoder

    Authors: Tian Qin, Wei-Min Huang

    Abstract: In this paper, we bridge Variational Autoencoders (VAEs) [17] and kernel density estimations (KDEs) [25], [23] by approximating the posterior by KDEs and deriving an upper bound of the Kullback-Leibler (KL) divergence in the evidence lower bound (ELBO). The flexibility of KDEs makes the optimization of posteriors in VAEs possible, which not only addresses the limitations of Gaussian latent space i…

    Submitted 21 May, 2024; originally announced May 2024.

  7. arXiv:2405.07010

    cs.CY cs.CL

    Deciphering public attention to geoengineering and climate issues using machine learning and dynamic analysis

    Authors: Ramit Debnath, Pengyu Zhang, Tianzhu Qin, R. Michael Alvarez, Shaun D. Fitzgerald

    Abstract: As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 46 pages, 6 main figures and SI

    ACM Class: J.4; K.4

  8. arXiv:2405.02823

    cs.IT eess.SP

    Reconfigurable Massive MIMO: Precoding Design and Channel Estimation in the Electromagnetic Domain

    Authors: Keke Ying, Zhen Gao, Yu Su, Tong Qin, Michail Matthaiou, Robert Schober

    Abstract: Reconfigurable massive multiple-input multiple-output (RmMIMO) technology offers increased flexibility for future communication systems by exploiting previously untapped degrees of freedom in the electromagnetic (EM) domain. The representation of the traditional spatial domain channel state information (sCSI) limits the insights into the potential of EM domain channel properties, constraining the…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: This work is being submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  9. arXiv:2403.03100

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

    Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

    Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di…

    Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

  10. arXiv:2403.01528

    cs.CL cs.AI q-bio.BM

    Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Rui Yan

    Abstract: The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. This approach leverages the rich, multifaceted descriptions of biomolecules contained within textual data sources to enhance our fundamental understanding and enable downstream computational tasks such as biomol…

    Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Survey Paper. 25 pages, 9 figures, and 3 tables

  11. arXiv:2403.00999

    cs.LG

    Distributional Dataset Distillation with Subtask Decomposition

    Authors: Tian Qin, Zhiwei Deng, David Alvarez-Melis

    Abstract: What does a neural network learn when training from a task-specific dataset? Synthesizing this knowledge is the central idea behind Dataset Distillation, which recent work has shown can be used to compress large datasets into a small set of input-label pairs ($\textit{prototypes}$) that capture essential aspects of the original dataset. In this paper, we make the key observation that existing meth…

    Submitted 1 March, 2024; originally announced March 2024.

  12. arXiv:2402.18512

    cs.LG

    Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference

    Authors: Benjamin Walker, Andrew D. McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, Terry Lyons

    Abstract: The vector field of a controlled differential equation (CDE) describes the relationship between a control path and the evolution of a solution path. Neural CDEs (NCDEs) treat time series data as observations from a control path, parameterise a CDE's vector field using a neural network, and use the solution path as a continuously evolving hidden state. As their formulation makes them robust to irre…

    Submitted 11 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 23 pages, 5 figures, International Conference on Machine Learning 2024

  13. arXiv:2402.17810

    q-bio.QM cs.AI cs.CE cs.LG q-bio.BM

    BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan

    Abstract: Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC). This paper intro…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (Findings)

  14. arXiv:2402.04550

    stat.ML cs.LG

    Riemann-Lebesgue Forest for Regression

    Authors: Tian Qin, Wei-Min Huang

    Abstract: We propose a novel ensemble method called Riemann-Lebesgue Forest (RLF) for regression. The core idea in RLF is to mimic how a measurable function can be approximated by partitioning its range into a few intervals. With this idea in mind, we develop a new tree learner named Riemann-Lebesgue Tree (RLT) which has a chance to perform Lebesgue type cutting, i.e., splitting the node from response…

    Submitted 9 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  15. arXiv:2402.03563

    cs.LG cs.AI cs.CL

    Distinguishing the Knowable from the Unknowable with Language Models

    Authors: Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman

    Abstract: We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM's uncertainty, a sign…

    Submitted 27 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  16. arXiv:2401.00283

    cs.IT eess.SP

    Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

    Authors: Hongshan Liu, Tong Qin, Zhen Gao, Tianqi Mao, Keke Ying, Ziwei Wan, Li Qiao, Rui Na, Zhongxiang Li, Chun Hu, Yikun Mei, Tuan Li, Guanghui Wen, Lei Chen, Zhonghuai Wu, Ruiqi Liu, Gaojie Chen, Shuo Wang, Dezhi Zheng

    Abstract: This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we first explore the recent technical developments of NS-COM, followed by a discussion of the motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis…

    Submitted 4 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: 28 pages, 8 figures, 2 tables

  17. arXiv:2312.09866

    cs.CV

    PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment

    Authors: Tianchen Deng, Guole Shen, Tong Qin, Jianyu Wang, Wentao Zhao, Jingchuan Wang, Danwei Wang, Weidong Chen

    Abstract: Neural implicit scene representations have recently shown encouraging results in dense visual SLAM. However, existing methods produce low-quality scene reconstruction and low-accuracy localization performance when scaling up to large indoor scenes and long sequences. These limitations are mainly due to their single, global radiance field with finite capacity, which does not adapt to large scenario…

    Submitted 29 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  18. arXiv:2312.01587

    cs.GT cs.LG

    Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains

    Authors: Tiancheng Qin, S. Rasoul Etesami

    Abstract: We study a subclass of $n$-player stochastic games, namely, stochastic games with independent chains and unknown transition matrices. In this class of games, players control their own internal Markov chains whose transitions do not depend on the states/actions of other players. However, players' decisions are coupled through their payoff functions. We assume players can receive only realizations o…

    Submitted 3 December, 2023; originally announced December 2023.

  19. arXiv:2311.17410

    cs.DC cs.LG

    GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs

    Authors: Yuchen Zhong, Guangming Sheng, Tianzuo Qin, Minjie Wang, Quan Gan, Chuan Wu

    Abstract: Graph Neural Networks (GNNs) play a crucial role in various fields. However, most existing deep graph learning frameworks assume pre-stored static graphs and do not support training on graph streams. In contrast, many real-world graphs are dynamic and contain time domain information. We introduce GNNFlow, a distributed framework that enables efficient continuous temporal graph representation learn…

    Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  20. arXiv:2311.16452

    cs.CL

    Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

    Authors: Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz

    Abstract: Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM. We build…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 21 pages, 7 figures

    ACM Class: I.2.7

  21. arXiv:2311.08732

    cs.CL

    Enhancing Emergency Decision-making with Knowledge Graphs and Large Language Models

    Authors: Minze Chen, Zhenxiang Tao, Weitong Tang, Tingxin Qin, Rui Yang, Chunli Zhu

    Abstract: Emergency management urgently requires comprehensive knowledge while having a high possibility to go beyond individuals' cognitive scope. Therefore, artificial intelligence (AI) supported decision-making under that circumstance is of vital importance. Recent emerging large language models (LLM) provide a new direction for enhancing targeted machine intelligence. However, the utilization of LLM dire…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 26 pages, 6 figures

  22. arXiv:2311.02827

    stat.ML cs.LG stat.AP

    On Subagging Boosted Probit Model Trees

    Authors: Tian Qin, Wei-Min Huang

    Abstract: With the insight of variance-bias decomposition, we design a new hybrid bagging-boosting algorithm named SBPMT for classification problems. For the boosting part of SBPMT, we propose a new tree model called Probit Model Tree (PMT) as base classifiers in AdaBoost procedure. For the bagging part, instead of subsampling from the dataset at each step of boosting, we perform boosted PMTs on each subagg…

    Submitted 5 November, 2023; originally announced November 2023.

  23. arXiv:2310.07990

    q-bio.GN cs.IR cs.LG stat.AP

    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics

    Authors: Chen Zhao, Kuan-Jui Su, Chong Wu, Xuewei Cao, Qiuying Sha, Wu Li, Zhe Luo, Tian Qin, Chuan Qiu, Lan Juan Zhao, Anqi Liu, Lindong Jiang, Xiao Zhang, Hui Shen, Weihua Zhou, Hong-Wen Deng

    Abstract: Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information f…

    Submitted 12 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 19 pages, 3 figures

  24. arXiv:2310.06763

    cs.LG cs.AI q-bio.BM

    FABind: Fast and Accurate Protein-Ligand Binding

    Authors: Qizhi Pei, Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Kun He, Tie-Yan Liu, Rui Yan

    Abstract: Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based meth…

    Submitted 8 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted by Neural Information Processing Systems 2023 (NeurIPS 2023)

  25. arXiv:2309.02285

    eess.AS cs.CL cs.LG cs.SD

    PromptTTS 2: Describing and Generating Voices with Text Prompt

    Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

    Abstract: Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text…

    Submitted 11 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Demo page: https://speechresearch.github.io/prompttts2

  26. arXiv:2308.16418

    cs.MM

    End-Edge Coordinated Joint Encoding and Neural Enhancement for Low-Light Video Analytics

    Authors: Yuanyi He, Peng Yang, Tian Qin, Ning Zhang

    Abstract: In this paper, we investigate video analytics in low-light environments, and propose an end-edge coordinated system with joint video encoding and enhancement. It adaptively transmits low-light videos from cameras and performs enhancement and inference tasks at the edge. Firstly, according to our observations, both encoding and enhancement for low-light videos have a significant impact on inference…

    Submitted 30 August, 2023; originally announced August 2023.

  27. arXiv:2308.03258

    cs.CV cs.CR

    APBench: A Unified Benchmark for Availability Poisoning Attacks and Defenses

    Authors: Tianrui Qin, Xitong Gao, Juanjuan Zhao, Kejiang Ye, Cheng-Zhong Xu

    Abstract: The efficacy of availability poisoning, a method of poisoning data by injecting imperceptible perturbations to prevent its use in model training, has been a hot subject of investigation. Previous research suggested that it was difficult to effectively counteract such poisoning attacks. However, the introduction of various defense methods has challenged this notion. Due to the rapid progress in thi…

    Submitted 6 August, 2023; originally announced August 2023.

  28. arXiv:2306.04123

    cs.AI cs.LG

    Retrosynthesis Prediction with Local Template Retrieval

    Authors: Shufang Xie, Rui Yan, Junliang Guo, Yingce Xia, Lijun Wu, Tao Qin

    Abstract: Retrosynthesis, which predicts the reactants of a given target molecule, is an essential task for drug discovery. In recent years, machine learning-based retrosynthesis methods have achieved promising results. In this work, we introduce RetroKNN, a local reaction template retrieval method to further boost the performance of template-based systems with non-parametric retrieval. We first build an…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: AAAI-2023 camera ready

  29. arXiv:2306.02242

    cs.CL cs.AI

    Extract and Attend: Improving Entity Translation in Neural Machine Translation

    Authors: Zixin Zeng, Rui Wang, Yichong Leng, Junliang Guo, Xu Tan, Tao Qin, Tie-yan Liu

    Abstract: While Neural Machine Translation (NMT) has achieved great progress in recent years, it still suffers from inaccurate translation of entities (e.g., person/organization name, location), due to the lack of entity training instances. When we humans encounter an unknown entity during translation, we usually first look up in a dictionary and then organize the entity translation together with the transla…

    Submitted 3 June, 2023; originally announced June 2023.

  30. arXiv:2305.10688

    cs.CL

    MolXPT: Wrapping Molecules with Text for Generative Pre-training

    Authors: Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, Tie-Yan Liu

    Abstract: Generative pre-trained Transformer (GPT) has demonstrated great success in natural language processing, and related techniques have been adapted into molecular modeling. Considering that text is the most important record for scientific discovery, in this paper, we propose MolXPT, a unified language model of text and molecules pre-trained on SMILES (a sequence representation of molecules) wrappe…

    Submitted 26 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023; add more details about MoleculeNet finetune

  31. arXiv:2305.01622

    cs.RO cs.AI

    FlowMap: Path Generation for Automated Vehicles in Open Space Using Traffic Flow

    Authors: Wenchao Ding, Jieru Zhao, Yubin Chu, Haihui Huang, Tong Qin, Chunjing Xu, Yuxiang Guan, Zhongxue Gan

    Abstract: There is extensive literature on perceiving road structures by fusing various sensor inputs such as lidar point clouds and camera images using deep neural nets. Leveraging the latest advances in neural architectures (such as transformers) and bird's-eye-view (BEV) representation, the road cognition accuracy keeps improving. However, how to cognize the "road" for automated vehicles where there is no we…

    Submitted 11 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted to ICRA2023

  32. arXiv:2304.14802

    cs.CL cs.AI cs.LG cs.NE

    ResiDual: Transformer with Dual Residual Connections

    Authors: Shufang Xie, Huishuai Zhang, Junliang Guo, Xu Tan, Jiang Bian, Hany Hassan Awadalla, Arul Menezes, Tao Qin, Rui Yan

    Abstract: Transformer networks have become the preferred architecture for many tasks due to their state-of-the-art performance. However, the optimal way to implement residual connections in Transformer, which are essential for effective training, is still debated. Two widely used variants are the Post-Layer-Normalization (Post-LN) and Pre-Layer-Normalization (Pre-LN) Transformers, which apply layer normaliz…

    Submitted 28 April, 2023; originally announced April 2023.

  33. arXiv:2304.09407

    cs.AI

    Pointerformer: Deep Reinforced Multi-Pointer Transformer for the Traveling Salesman Problem

    Authors: Yan Jin, Yuandong Ding, Xuanhao Pan, Kun He, Li Zhao, Tao Qin, Lei Song, Jiang Bian

    Abstract: Traveling Salesman Problem (TSP), as a classic routing optimization problem originally arising in the domain of transportation and logistics, has become a critical task in broader domains, such as manufacturing and biology. Recently, Deep Reinforcement Learning (DRL) has been increasingly employed to solve TSP due to its high inference efficiency. Nevertheless, most existing end-to-end DRL algo…

    Submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted by AAAI 2023, February 2023

  34. arXiv:2304.09116

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

    Authors: Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

    Abstract: Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating is…

    Submitted 30 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: A large-scale text-to-speech and singing voice synthesis system with latent diffusion models. Update: NaturalSpeech 2 extension to voice conversion and speech enhancement

  35. arXiv:2303.15127

    cs.LG cs.CR cs.CV

    Learning the Unlearnable: Adversarial Augmentations Suppress Unlearnable Example Attacks

    Authors: Tianrui Qin, Xitong Gao, Juanjuan Zhao, Kejiang Ye, Cheng-Zhong Xu

    Abstract: Unlearnable example attacks are data poisoning techniques that can be used to safeguard public data against unauthorized use for training deep learning models. These methods add stealthy perturbations to the original image, thereby making it difficult for deep learning models to learn from these training data effectively. Current research suggests that adversarial training can, to a certain degree…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: UEraser introduces adversarial augmentations to suppress unlearnable example attacks and outperforms current defenses

  36. arXiv:2303.07457

    cs.CL cs.AI

    AMOM: Adaptive Masking over Masking for Conditional Masked Language Model

    Authors: Yisheng Xiao, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Yan-Tie Liu, Min Zhang

    Abstract: Transformer-based autoregressive (AR) methods have achieved appealing performance for varied sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up the inference stage, many non-autoregressive (NAR) strategies have been proposed in the past few years. Among them, the conditional masked langu…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI2023

  37. arXiv:2302.11354

    cs.LG cs.AI

    Learning Dynamic Graph Embeddings with Neural Controlled Differential Equations

    Authors: Tiexin Qin, Benjamin Walker, Terry Lyons, Hong Yan, Haoliang Li

    Abstract: This paper focuses on representation learning for dynamic graphs with temporal interactions. A fundamental issue is that both the graph structure and the nodes have their own dynamics, and their blending induces intractable complexity in the temporal evolution over graphs. Drawing inspiration from the recent progress of physical dynamic models in deep neural networks, we propose Graph Neural Control…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 13 pages, 3 figures

  38. arXiv:2302.01129

    cs.LG cs.AI

    De Novo Molecular Generation via Connection-aware Motif Mining

    Authors: Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu

    Abstract: De novo molecular generation is an essential task for science discovery. Recently, fragment-based deep generative models have attracted much research attention due to their flexibility in generating novel molecules based on existing molecule fragments. However, the motif vocabulary, i.e., the collection of frequent fragments, is usually built upon heuristic rules, which brings difficulties to capt…

    Submitted 26 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  39. arXiv:2301.13755

    cs.AI cs.LG

    Retrosynthetic Planning with Dual Value Networks

    Authors: Guoqing Liu, Di Xue, Shufang Xie, Yingce Xia, Austin Tripp, Krzysztof Maziarz, Marwin Segler, Tao Qin, Zongzhang Zhang, Tie-Yan Liu

    Abstract: Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step ac…

    Submitted 3 March, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: Accepted to ICML 2023

  40. arXiv:2301.12866

    cs.CL cs.LG

    N-Gram Nearest Neighbor Machine Translation

    Authors: Rui Lv, Junliang Guo, Rui Wang, Xu Tan, Qi Liu, Tao Qin

    Abstract: Nearest neighbor machine translation augments the Autoregressive Translation (AT) with $k$-nearest-neighbor retrieval, by comparing the similarity between the token-level context representations of the target tokens in the query and the datastore. However, the token-level representation may introduce noise when translating ambiguous words, or fail to provide accurate retrieval results when the rep…

    Submitted 7 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  41. arXiv:2301.10295

    cs.CV cs.SD eess.AS

    Object Segmentation with Audio Context

    Authors: Kaihui Zheng, Yuqing Ren, Zixin Shen, Tianxu Qin

    Abstract: Visual objects often have acoustic signatures that are naturally synchronized with them in audio-bearing video recordings. For this project, we explore the multimodal feature aggregation for video instance segmentation task, in which we integrate audio features into our video segmentation model to conduct an audio-visual learning scheme. Our method is based on existing video instance segmentation…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Research project for Introduction to Deep Learning (11785) at Carnegie Mellon University

  42. arXiv:2301.08846  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV eess.AS

    Regeneration Learning: A Learning Paradigm for Data Generation

    Authors: Xu Tan, Tao Qin, Jiang Bian, Tie-Yan Liu, Yoshua Bengio

    Abstract: Machine learning methods for conditional data generation usually build a mapping from source conditional data X to target data Y. The target Y (e.g., text, speech, music, image, video) is usually high-dimensional and complex, and contains information that does not exist in source data, which hinders effective and efficient learning on the source-target mapping. In this paper, we present a learning…

    Submitted 20 January, 2023; originally announced January 2023.

  43. arXiv:2212.13654  [pdf]

    physics.optics cs.CV eess.IV

    Large-scale single-photon imaging

    Authors: Liheng Bian, Haoze Song, Lintao Peng, Xuyang Chang, Xi Yang, Roarke Horstmeyer, Lin Ye, Tong Qin, Dezhi Zheng, Jun Zhang

    Abstract: Benefiting from its single-photon sensitivity, single-photon avalanche diode (SPAD) array has been widely applied in various fields such as fluorescence lifetime imaging and quantum computing. However, large-scale high-fidelity single-photon imaging remains a big challenge, due to the complex hardware manufacture craft and heavy noise disturbance of SPAD arrays. In this work, we introduce deep lea…

    Submitted 27 December, 2022; originally announced December 2022.

  44. arXiv:2212.12735  [pdf, other]

    cs.LG

    An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context

    Authors: Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, Tie-Yan Liu

    Abstract: One of the key challenges in deploying RL to real-world applications is to adapt to variations of unknown environment contexts, such as changing terrains in robotic tasks and fluctuated bandwidth in congestion control. Existing works on adaptation to unknown environment contexts either assume the contexts are the same for the whole episode or assume the context variables are Markovian. However, in…

    Submitted 24 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2022

  45. arXiv:2212.09979  [pdf, other]

    cs.CR cs.CV

    Flareon: Stealthy any2any Backdoor Injection via Poisoned Augmentation

    Authors: Tianrui Qin, Xianghuan He, Xitong Gao, Yiren Zhao, Kejiang Ye, Cheng-Zhong Xu

    Abstract: Open software supply chain attacks, once successful, can exact heavy costs in mission-critical applications. As open-source ecosystems for deep learning flourish and become increasingly universal, they present attackers previously unexplored avenues to code-inject malicious backdoors in deep neural network models. This paper proposes Flareon, a small, stealthy, seemingly harmless code modification…

    Submitted 19 December, 2022; originally announced December 2022.

  46. arXiv:2212.02125  [pdf, other]

    stat.ML cs.AI cs.LG

    TD3 with Reverse KL Regularizer for Offline Reinforcement Learning from Mixed Datasets

    Authors: Yuanying Cai, Chuheng Zhang, Li Zhao, Wei Shen, Xuyun Zhang, Lei Song, Jiang Bian, Tao Qin, Tieyan Liu

    Abstract: We consider an offline reinforcement learning (RL) setting where the agent needs to learn from a dataset collected by rolling out multiple behavior policies. There are two challenges for this setting: 1) The optimal trade-off between optimizing the RL signal and the behavior cloning (BC) signal changes on different states due to the variation of the action coverage induced by different behavior pol…

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: Accepted by ICDM-22 (Best Student Paper Runner-Up Awards)

  47. arXiv:2212.01039  [pdf, other]

    cs.CL cs.LG eess.AS

    SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

    Authors: Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu

    Abstract: Error correction in automatic speech recognition (ASR) aims to correct those incorrect words in sentences generated by ASR models. Since recent ASR models usually have low word error rate (WER), to avoid affecting originally correct tokens, error correction models should only modify incorrect words, and therefore detecting incorrect words is important for error correction. Previous works on error…

    Submitted 20 December, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: AAAI 2023

  48. arXiv:2211.12733  [pdf, other]

    cs.AI cs.RO

    Safety Analysis of Autonomous Driving Systems Based on Model Learning

    Authors: Renjue Li, Tianhang Qin, Pengfei Yang, Cheng-Chao Huang, Youcheng Sun, Lijun Zhang

    Abstract: We present a practical verification method for safety analysis of the autonomous driving system (ADS). The main idea is to build a surrogate model that quantitatively depicts the behaviour of an ADS in the specified traffic scenario. The safety properties proved in the resulting surrogate model apply to the original ADS with a probabilistic guarantee. Furthermore, we explore the safe and the unsaf…

    Submitted 23 November, 2022; originally announced November 2022.

  49. arXiv:2211.11209  [pdf, other]

    cs.RO

    A Novel Uncalibrated Visual Servoing Controller Baesd on Model-Free Adaptive Control Method with Neural Network

    Authors: Haibin Zeng, Yueyong Lyu, Jiaming Qi, Shuangquan Zou, Tanghao Qin, Wenyu Qin

    Abstract: Nowadays, with the continuous expansion of application scenarios of robotic arms, there are more and more scenarios where nonspecialists come into contact with robotic arms. However, in terms of robotic arm visual servoing, traditional Position-based Visual Servoing (PBVS) requires a lot of calibration work, which is challenging for nonspecialists to cope with. To cope with this situation, Uncal…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 16 pages, 8 figures

  50. arXiv:2211.11178  [pdf, other]

    cs.RO

    Adaptive Finite-Time Model Estimation and Control for Manipulator Visual Servoing using Sliding Mode Control and Neural Networks

    Authors: Haibin Zeng, Yueyong Lyu, Jiaming Qi, Shuangquan Zou, Tanghao Qin, Wenyu Qin

    Abstract: The image-based visual servoing without models of system is challenging since it is hard to fetch an accurate estimation of hand-eye relationship via merely visual measurement. Whereas, the accuracy of estimated hand-eye relationship expressed in local linear format with Jacobian matrix is important to whole system's performance. In this article, we proposed a finite-time controller as well as a J…

    Submitted 20 November, 2022; originally announced November 2022.

    Comments: 24 pages, 10 figures