Showing 1–50 of 133 results for author: Fang, S

  1. arXiv:2406.13908  [pdf, other]

    cs.RO

    A Decision-Making GPT Model Augmented with Entropy Regularization for Autonomous Vehicles

    Authors: Jiaqi Liu, Shiyu Fang, Xuekai Liu, Lulu Guo, Peng Hang, Jian Sun

    Abstract: In the domain of autonomous vehicles (AVs), decision-making is a critical factor that significantly influences the efficacy of autonomous navigation. As the field progresses, the enhancement of decision-making capabilities in complex environments has become a central area of research within data-driven methodologies. Despite notable advances, existing learning-based decision-making strategies in a… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2405.18724  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction

    Authors: Linjia Kang, Songhua Zhou, Shuyan Fang, Shichao Liu, Wen Zhang

    Abstract: Accurate prediction of molecular properties is critical in the field of drug discovery. However, existing methods do not fully consider the fact that molecules in the real world usually possess multiple property labels, and complex high-order relationships may exist among these labels. Therefore, molecular representation learning models should generate differential molecular representations that c… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2405.10718  [pdf, other]

    cs.CV cs.CL

    SignLLM: Sign Languages Production Large Language Models

    Authors: Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen

    Abstract: In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which builds from public data including American Sign Language (ASL) and seven others. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose Si… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 33 pages, website at https://signllm.github.io/

  4. arXiv:2404.12782  [pdf, other]

    cs.CV cs.AI

    Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting

    Authors: Fengyi Fu, Shancheng Fang, Weidong Chen, Zhendong Mao

    Abstract: Automatic live video commenting is attracting increasing attention due to its significance in narration generation, topic explanation, etc. However, current methods do not consider the diverse sentiments of the generated comments. Sentimental factors are critical in interactive commenting, yet they have received little research so far. Thus, in this paper, we propose a Sentiment-oriented Transformer-base… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 27 pages, 10 figures, ACM Transactions on Multimedia Computing, Communications and Applications, 2024

  5. arXiv:2404.11947  [pdf, other]

    cs.LG cs.CV

    VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning

    Authors: Shijie Fang, Qianhan Feng, Tong Lin

    Abstract: Despite the progress of Semi-supervised Learning (SSL), existing methods fail to utilize unlabeled data effectively and efficiently. Many pseudo-label-based methods select unlabeled examples based on inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, making it difficult to handle large amounts of unlabeled data. To address thes… ▽ More

    Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted paper of IJCAI 2024. Shijie Fang and Qianhan Feng contributed equally to this paper. New version, some problems and typos are fixed

  6. arXiv:2403.19622  [pdf, other]

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  7. arXiv:2403.17740  [pdf, other]

    cs.IR cs.AI

    All-in-One: Heterogeneous Interaction Modeling for Cold-Start Rating Prediction

    Authors: Shuheng Fang, Kangfei Zhao, Yu Rong, Zhixun Li, Jeffrey Xu Yu

    Abstract: Cold-start rating prediction is a fundamental problem in recommender systems that has been extensively studied. Many methods have been proposed that exploit explicit relations among existing data, such as collaborative filtering, social recommendations and heterogeneous information network, to alleviate the data insufficiency issue for cold-start users and items. However, the explicit relations co… ▽ More

    Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  8. arXiv:2403.12986  [pdf, other]

    cs.CV cs.LG

    BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning

    Authors: Qianhan Feng, Lujing Xie, Shijie Fang, Tong Lin

    Abstract: Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distribution in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at instance-… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted paper of AAAI 2024

  9. arXiv:2403.06951  [pdf, other]

    cs.CV

    DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

    Authors: Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, Yongdong Zhang

    Abstract: The diffusion-based text-to-image model harbors immense potential in transferring reference style. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. In this paper, we introduce DEADiff to address this issue using the following two strategies: 1) a mechanism to decouple the style and semantics of reference imag… ▽ More

    Submitted 11 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  10. arXiv:2402.14136  [pdf, other]

    cs.RO cs.LG eess.SP

    GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors

    Authors: Ho Lyun Jeong, Ziqi Wang, Colin Samplawski, Jason Wu, Shiwei Fang, Lance M. Kaplan, Deepak Ganesan, Benjamin Marlin, Mani Srivastava

    Abstract: Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  11. arXiv:2402.12913  [pdf, other]

    cs.CL

    OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination Detection with Weakly Supervised Data

    Authors: Chengcheng Wei, Ze Chen, Songtan Fang, Jiarong He, Max Gao

    Abstract: This paper mainly describes a unified system for hallucination detection of LLMs, which wins the second prize in the model-agnostic track of the SemEval-2024 Task 6, and also achieves considerable results in the model-aware track. This task aims to detect hallucination with LLMs for three different text-generation tasks without labeled training data. We utilize prompt engineering and few-shot lear… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.

  12. arXiv:2401.17626  [pdf]

    cs.SE cs.AI cs.LG

    Generative AI to Generate Test Data Generators

    Authors: Benoit Baudry, Khashayar Etemadi, Sen Fang, Yogya Gamage, Yi Liu, Yuxin Liu, Martin Monperrus, Javier Ron, André Silva, Deepika Tiwari

    Abstract: Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design t… ▽ More

    Submitted 14 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Journal ref: IEEE Software, 2024

  13. arXiv:2401.15159  [pdf, other]

    cs.RO

    RABBIT: A Robot-Assisted Bed Bathing System with Multimodal Perception and Integrated Compliance

    Authors: Rishabh Madan, Skyler Valdez, David Kim, Sujie Fang, Luoyan Zhong, Diego Virtue, Tapomayukh Bhattacharjee

    Abstract: This paper introduces RABBIT, a novel robot-assisted bed bathing system designed to address the growing need for assistive technologies in personal hygiene tasks. It combines multimodal perception and dual (software and hardware) compliance to perform safe and comfortable physical human-robot interaction. Using RGB and thermal imaging to segment dry, soapy, and wet skin regions accurately, RABBIT… ▽ More

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: 10 pages, 8 figures, 19th Annual ACM/IEEE International Conference on Human Robot Interaction (HRI)

  14. arXiv:2401.14051  [pdf, other]

    cs.GR cs.CV

    A real-time rendering method for high albedo anisotropic materials with multiple scattering

    Authors: Shun Fang, Xing Feng, Ming Cui

    Abstract: We propose a neural network-based real-time volume rendering method for realistic and efficient rendering of volumetric media. The traditional volume rendering method uses path tracing to solve the radiation transfer equation, which requires a huge amount of calculation and cannot achieve real-time rendering. Therefore, this paper uses neural networks to simulate the iterative integration process… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

  15. arXiv:2401.12862  [pdf, other]

    cs.CV cs.AI

    FedRSU: Federated Learning for Scene Flow Estimation on Roadside Units

    Authors: Shaoheng Fang, Rui Ye, Wenhao Wang, Zuhong Liu, Yuxiao Wang, Yafei Wang, Siheng Chen, Yanfeng Wang

    Abstract: Roadside units (RSUs) can significantly improve the safety and robustness of autonomous vehicles through Vehicle-to-Everything (V2X) communication. Currently, the usage of a single RSU mainly focuses on real-time inference and V2X collaboration, while neglecting the potential value of the high-quality data collected by RSU sensors. Integrating the vast amounts of data from numerous RSUs can provide… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

  16. arXiv:2401.12456  [pdf, ps, other]

    cs.CV cs.AI cs.GR

    Exploration and Improvement of Nerf-based 3D Scene Editing Techniques

    Authors: Shun Fang, Ming Cui, Xing Feng, Yanan Zhang

    Abstract: NeRF's high-quality scene synthesis capability was quickly accepted by scholars in the years after it was proposed, and significant progress has been made in 3D scene representation and synthesis. However, the high computational cost limits intuitive and efficient editing of scenes, so NeRF's development in the scene editing field faces many challenges. This paper reviews the preliminary expl… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

  17. Methods and strategies for improving the novel view synthesis quality of neural radiation field

    Authors: Shun Fang, Ming Cui, Xing Feng, Yanna Lv

    Abstract: Neural Radiation Field (NeRF) technology can learn a 3D implicit model of a scene from 2D images and synthesize realistic novel view images. This technology has received widespread attention from the industry and has good application prospects. In response to the problem that the rendering quality of NeRF images needs to be improved, many researchers have proposed various methods to improve the re… ▽ More

    Submitted 17 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    ACM Class: I.2; I.4; I.6

    Journal ref: IEEE ACCESS 12 (2024) 50548-50555

  18. arXiv:2401.11499  [pdf, other]

    cs.CV

    Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals

    Authors: Shaoheng Fang, Zuhong Liu, Mingyu Wang, Chenxin Xu, Yiqi Zhong, Siheng Chen

    Abstract: Learning the dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research topic in robotics and autonomous driving. Current self-supervised methods mainly rely on point correspondences between point clouds, which may introduce the problems of fake flow and inconsistency, hindering the model's ability to learn accurate and realistic motion. In this paper, we introduce a no… ▽ More

    Submitted 21 January, 2024; originally announced January 2024.

  19. arXiv:2312.15698  [pdf, other]

    cs.SE cs.LG

    RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

    Authors: André Silva, Sen Fang, Martin Monperrus

    Abstract: Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions which have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program rep… ▽ More

    Submitted 7 June, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

  20. arXiv:2312.12735  [pdf]

    cs.CV

    MetaSegNet: Metadata-collaborative Vision-Language Representation Learning for Semantic Segmentation of Remote Sensing Images

    Authors: Libo Wang, Sijun Dong, Ying Chen, Xiaoliang Meng, Shenghui Fang, Ayman Habib, Songlin Fei

    Abstract: Semantic segmentation of remote sensing images plays a vital role in a wide range of Earth Observation (EO) applications, such as land use land cover mapping, environment monitoring, and sustainable development. Driven by rapid developments in Artificial Intelligence (AI), deep learning (DL) has emerged as the mainstream tool for semantic segmentation and has achieved many breakthroughs in the fie… ▽ More

    Submitted 25 March, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  21. arXiv:2312.11225  [pdf, other]

    cs.CR

    MAD-MulW: A Multi-Window Anomaly Detection Framework for BGP Security Events

    Authors: Songtao Peng, Yiping Chen, Xincheng Shu, Wu Shuai, Shenhao Fang, Zhongyuan Ruan, Qi Xuan

    Abstract: In recent years, various international security events have occurred frequently and interacted between real society and cyberspace. Traditional traffic monitoring mainly focuses on the local anomalous status of events due to a large amount of data. BGP-based event monitoring makes it possible to perform differential analysis of international events. For many existing traffic anomaly detection meth… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 10 pages, 8 figures

  22. arXiv:2311.05606  [pdf, other]

    cs.LG

    Diffusion-Generative Multi-Fidelity Learning for Physical Simulation

    Authors: Zheng Wang, Shibo Li, Shikai Fang, Shandian Zhe

    Abstract: Multi-fidelity surrogate learning is important for physical simulation related applications in that it avoids running numerical solvers from scratch, which is known to be costly, and it uses multi-fidelity examples for training and greatly reduces the cost of data collection. Despite the variety of existing methods, they all build a model to map the input parameters outright to the solution output… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

  23. arXiv:2311.04829  [pdf, other]

    cs.LG stat.ML

    Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data

    Authors: Shikai Fang, Xin Yu, Zheng Wang, Shibo Li, Mike Kirby, Shandian Zhe

    Abstract: Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there are finite objects in each aspect or mode, corresponding to discrete indexes of data entries. Howev… ▽ More

    Submitted 18 March, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)

  24. arXiv:2311.04465  [pdf, other]

    cs.LG cs.CE

    Solving High Frequency and Multi-Scale PDEs with Gaussian Processes

    Authors: Shikai Fang, Madison Cooley, Da Long, Shibo Li, Robert Kirby, Shandian Zhe

    Abstract: Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To fle… ▽ More

    Submitted 18 March, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)

  25. arXiv:2310.19666  [pdf, other]

    cs.LG stat.ML

    Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes

    Authors: Zheng Wang, Shikai Fang, Shibo Li, Shandian Zhe

    Abstract: Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dy… ▽ More

    Submitted 30 October, 2023; originally announced October 2023.

  26. arXiv:2310.19112  [pdf, other]

    cs.CV cs.AI cs.LG

    Efficient IoT Inference via Context-Awareness

    Authors: Mohammad Mehdi Rastikerdar, Jin Huang, Shiwei Fang, Hui Guan, Deepak Ganesan

    Abstract: While existing strategies to execute deep learning-based classification on low-power platforms assume the models are trained on all classes of interest, this paper posits that adopting context-awareness i.e. narrowing down a classification task to the current deployment context consisting of only recent inference queries can substantially enhance performance in resource-constrained environments. W… ▽ More

    Submitted 3 December, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: 12 pages, 8 figures

  27. arXiv:2310.17879  [pdf, other]

    cs.RO

    Split Covariance Intersection Filter Based Visual Localization With Accurate AprilTag Map For Warehouse Robot Navigation

    Authors: Susu Fang, Yanhao Li, Hao Li

    Abstract: Accurate and efficient localization with a conveniently established map is a fundamental requirement for mobile robot operation in warehouse environments. An accurate AprilTag map can be conveniently established with the help of LiDAR-based SLAM. It is true that a LiDAR-based system is usually not commercially competitive in contrast with a vision-based system, yet fortunately for warehouse applic… ▽ More

    Submitted 26 October, 2023; originally announced October 2023.

  28. arXiv:2310.17021  [pdf, other]

    cs.LG

    Streaming Factor Trajectory Learning for Temporal Tensor Decomposition

    Authors: Shikai Fang, Xin Yu, Shibo Li, Zheng Wang, Robert Kirby, Shandian Zhe

    Abstract: Practical tensor data is often accompanied by time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More importantly, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To add… ▽ More

    Submitted 7 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

  29. arXiv:2310.07739  [pdf, other]

    cs.CY cs.HC

    Identity Collapse? Realignment of Taiwanese Voters in the 2024 Presidential Elections on Social Media

    Authors: Ho-Chun Herbert Chang, Sunny Fang

    Abstract: The 2024 Taiwanese Presidential Election is not just a critical geopolitical event; it also engages with a long-standing debate in politics regarding the factors that lead to the rise of new political parties and candidates. In 2021, the Economist called Taiwan "the most dangerous place on earth" due to its critical role in a fragile supply chain. Additionally, a four-candidate race has emerged in a… ▽ More

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 23 pages

  30. arXiv:2309.14846  [pdf, other]

    cs.SE cs.AI

    Supersonic: Learning to Generate Source Code Optimizations in C/C++

    Authors: Zimin Chen, Sen Fang, Martin Monperrus

    Abstract: Software optimization refines programs for resource efficiency while preserving functionality. Traditionally, it is a process done by developers and compilers. This paper introduces a third option, automated optimization at the source code level. We present Supersonic, a neural approach targeting minor source code modifications for optimization. Using a seq2seq model, Supersonic is trained on C/C+… ▽ More

    Submitted 2 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  31. arXiv:2309.07765  [pdf, other]

    cs.SD cs.CL eess.AS

    Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks

    Authors: Sizhou Chen, Songyang Gao, Sen Fang

    Abstract: The Transformer architecture has proven to be highly effective for Automatic Speech Recognition (ASR) tasks, becoming a foundational component for a plethora of research in the domain. Historically, many approaches have leaned on fixed-length attention windows, which becomes problematic for varied speech samples in duration and complexity, leading to data over-smoothing and neglect of essential lo… ▽ More

    Submitted 7 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  32. arXiv:2309.04917  [pdf, other]

    cs.CV

    Editing 3D Scenes via Text Prompts without Retraining

    Authors: Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

    Abstract: Numerous diffusion models have recently been applied to image synthesis and editing. However, editing 3D scenes is still in its early stages. It poses various challenges, such as the requirement to design specific methods for different editing types, retraining new models for various 3D scenes, and the absence of convenient human interaction during editing. To tackle these issues, we introduce a t… ▽ More

    Submitted 29 November, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: Project Website: https://sk-fun.fun/DN2N

  33. arXiv:2309.04566  [pdf, other]

    cs.IT cs.CR

    STAR-RIS-Assisted-Full-Duplex Jamming Design for Secure Wireless Communications System

    Authors: Yun Wen, Gaojie Chen, Sisai Fang, Zheng Chu, Pei Xiao, Rahim Tafazolli

    Abstract: Physical layer security (PLS) technologies are expected to play an important role in the next-generation wireless networks, by providing secure communication to protect critical and sensitive information from illegitimate devices. In this paper, we propose a novel secure communication scheme where the legitimate receiver uses full-duplex (FD) technology to transmit jamming signals with the assistan… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 12 pages, 7 figures

  34. arXiv:2308.16082  [pdf, other]

    cs.CV

    SignDiff: Learning Diffusion Models for American Sign Language Production

    Authors: Sen Fang, Chunyu Sui, Xuedong Zhang, Yapeng Tian

    Abstract: The field of Sign Language Production (SLP) lacked a large-scale, pre-trained model based on deep learning for continuous American Sign Language (ASL) production in the past decade. This limitation hampers communication for all individuals with disabilities relying on ASL. To address this issue, we undertook the secondary development and utilization of How2Sign, one of the largest publicly availab… ▽ More

    Submitted 30 August, 2023; originally announced August 2023.

  35. arXiv:2308.14906  [pdf, other]

    cs.LG stat.ML

    BayOTIDE: Bayesian Online Multivariate Time series Imputation with functional decomposition

    Authors: Shikai Fang, Qingsong Wen, Yingtao Luo, Shandian Zhe, Liang Sun

    Abstract: In real-world scenarios like traffic and energy, massive time-series data with missing values and noises are widely observed, even sampled irregularly. While many imputation methods have been proposed, most of them work with a local horizon, which means models are trained by splitting the long sequence into batches of fit-sized patches. This local horizon can make models ignore global trends or pe… ▽ More

    Submitted 30 May, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by The 41st International Conference on Machine Learning (ICML 2024)

  36. arXiv:2308.11557  [pdf, other]

    eess.IV cs.CV

    Open Set Synthetic Image Source Attribution

    Authors: Shengbang Fang, Tai D. Nguyen, Matthew C. Stamm

    Abstract: AI-generated images have become increasingly realistic and have garnered significant public attention. While synthetic images are intriguing due to their realism, they also pose an important misinformation threat. To address this new threat, researchers have developed multiple algorithms to detect synthetic images and identify their source generators. However, most existing source attribution tech… ▽ More

    Submitted 22 August, 2023; originally announced August 2023.

  37. arXiv:2308.10158  [pdf, other]

    cs.CV

    HODN: Disentangling Human-Object Feature for HOI Detection

    Authors: Shuman Fang, Zhiwen Lin, Ke Yan, Jie Li, Xianming Lin, Rongrong Ji

    Abstract: The task of Human-Object Interaction (HOI) detection is to detect humans and their interactions with surrounding objects, where transformer-based methods show dominant advances currently. However, these methods ignore the relationship among humans, objects, and interactions: 1) human features are more contributive than object ones to interaction prediction; 2) interactive information disturbs the… ▽ More

    Submitted 7 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted by TMM 2023

  38. arXiv:2308.02606  [pdf, other]

    cs.CV

    Improving Human-Object Interaction Detection via Virtual Image Learning

    Authors: Shuman Fang, Shuai Liu, Jie Li, Guannan Jiang, Xianming Lin, Rongrong Ji

    Abstract: Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects, which plays a crucial role in high-level semantic understanding tasks. However, most works pursue designing better architectures to learn overall features more efficiently, while ignoring the long-tail nature of interaction-object pair categories. In this paper, we propose to alleviate the impa… ▽ More

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  39. arXiv:2307.15898  [pdf, other]

    cs.SD cs.AI eess.AS

    UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models

    Authors: Sen Fang, Bowen Gao, Yangjian Wu, Teik Toe Teoh

    Abstract: Multimodal large models have been recognized for their advantages in various performance and downstream tasks. The development of these models is crucial towards achieving general artificial intelligence in the future. In this paper, we propose a novel universal language representation learning method called UniBriVL, which is based on Bridging-Vision-and-Language (BriVL). Universal BriVL embeds a… ▽ More

    Submitted 9 September, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: Voice-Text fusion input; The first work of audio driven diffusion model. arXiv admin note: text overlap with arXiv:2303.04585

  40. arXiv:2307.11778  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Transsion TSUP's speech recognition system for ASRU 2023 MADASR Challenge

    Authors: Xiaoxiao Li, Gaosheng Zhang, An Zhu, Weiyong Li, Shuming Fang, Xiaoyue Yang, Jianchao Zhu

    Abstract: This paper presents a speech recognition system developed by the Transsion Speech Understanding Processing Team (TSUP) for the ASRU 2023 MADASR Challenge. The system focuses on adapting ASR models for low-resource Indian languages and covers all four tracks of the challenge. For tracks 1 and 2, the acoustic model utilized a squeezeformer encoder and bidirectional transformer decoder with joint CTC… ▽ More

    Submitted 19 July, 2023; originally announced July 2023.

  41. arXiv:2307.00300  [pdf, other]

    cs.CV

    DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

    Authors: Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao

    Abstract: While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images. Existing methods either require time-consuming optimization for each face-identity or learning an efficient encoder at the cost of harming the editability of models. In this work, we present an opti… ▽ More

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: Project page: https://dreamidentity.github.io/

  42. arXiv:2306.17561  [pdf, ps, other]

    cs.IT

    Weighted Sum Rate Enhancement by Using Dual-Side IOS-Assisted Full-Duplex for Multi-User MIMO Systems

    Authors: Sisai Fang, Gaojie Chen, Chong Huang, Yue Gao, Yonghui Li, Kai-Kit Wong, Jonathon A. Chambers

    Abstract: This paper establishes a novel multi-input multi-output (MIMO) communication network, in the presence of full-duplex (FD) transmitters and receivers with the assistance of a dual-side intelligent omni surface (IOS). Compared with the traditional IOS, the dual-side IOS allows signals from both sides to reflect and refract simultaneously, which further exploits the potential of metasurfaces to avoid frequen… ▽ More

    Submitted 30 June, 2023; originally announced June 2023.

  43. arXiv:2306.02407  [pdf, other]

    cs.CV cs.AI cs.DC cs.LG

    Heteroskedastic Geospatial Tracking with Distributed Camera Networks

    Authors: Colin Samplawski, Shiwei Fang, Ziqi Wang, Deepak Ganesan, Mani Srivastava, Benjamin M. Marlin

    Abstract: Visual object tracking has seen significant progress in recent years. However, the vast majority of this work focuses on tracking objects within the image plane of a single camera and ignores the uncertainty associated with predicted object locations. In this work, we focus on the geospatial object tracking problem using data from a distributed camera network. The goal is to predict an object's tr…

    Submitted 4 June, 2023; originally announced June 2023.

  44. arXiv:2305.11489  [pdf, ps, other]

    cs.LG cs.AI

    Incomplete Multi-view Clustering via Diffusion Completion

    Authors: Sifan Fang

    Abstract: Incomplete multi-view clustering is a challenging and non-trivial task to provide effective data analysis for large amounts of unlabeled data in the real world. All incomplete multi-view clustering methods need to address the problem of how to reduce the impact of missing views. To address this issue, we propose diffusion completion to recover the missing views integrated into an incomplete multi-…

    Submitted 19 May, 2023; originally announced May 2023.

  45. arXiv:2305.07386  [pdf, other]

    cs.LG

    One-step Bipartite Graph Cut: A Normalized Formulation and Its Application to Scalable Subspace Clustering

    Authors: Si-Guo Fang, Dong Huang, Chang-Dong Wang, Jian-Huang Lai

    Abstract: The bipartite graph structure has shown its promising ability in facilitating the subspace clustering and spectral clustering algorithms for large-scale datasets. To avoid the post-processing via k-means during the bipartite graph partitioning, the constrained Laplacian rank (CLR) is often utilized for constraining the number of connected components (i.e., clusters) in the bipartite graph, which,…

    Submitted 12 May, 2023; originally announced May 2023.

  46. arXiv:2305.07247  [pdf, other]

    cs.LG

    Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation

    Authors: Yu Chen, Wei Deng, Shikai Fang, Fengpei Li, Nicole Tianjiao Yang, Yikai Zhang, Kashif Rasul, Shandian Zhe, Anderson Schneider, Yuriy Nevmyvaka

    Abstract: The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with the score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which conducts projections onto every other marginal alternatingly. However, in practice, only approximated projections are accessible…

    Submitted 10 September, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  47. arXiv:2305.05784  [pdf, other]

    cs.CV cs.CR

    Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery for Development and Evaluation of Forensic Tools

    Authors: Brandon B. May, Kirill Trapeznikov, Shengbang Fang, Matthew C. Stamm

    Abstract: We present a first of its kind dataset of overhead imagery for development and evaluation of forensic tools. Our dataset consists of real, fully synthetic and partially manipulated overhead imagery generated from a custom diffusion model trained on two sets of different zoom levels and on two sources of pristine data. We developed our model to support controllable generation of multiple manipulati…

    Submitted 9 May, 2023; originally announced May 2023.

  48. arXiv:2305.03563  [pdf, other]

    cs.MA

    Cooperative Driving of Connected Autonomous Vehicles in Heterogeneous Mixed Traffic: A Game Theoretic Approach

    Authors: Shiyu Fang, Peng Hang, Chongfeng Wei, Yang Xing, Jian Sun

    Abstract: High-density, unsignalized intersection has always been a bottleneck of efficiency and safety. The emergence of Connected Autonomous Vehicles (CAVs) results in a mixed traffic condition, further increasing the complexity of the transportation system. Against this background, this paper aims to study the intricate and heterogeneous interaction of vehicles and conflict resolution at the high-density…

    Submitted 5 May, 2023; originally announced May 2023.

  49. arXiv:2304.04012  [pdf, other]

    cs.CV

    Progressive Volume Distillation with Active Learning for Efficient NeRF Architecture Conversion

    Authors: Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou

    Abstract: Neural Radiance Fields (NeRF) have been widely adopted as practical and versatile representations for 3D scenes, facilitating various downstream tasks. However, different architectures, including the plain Multi-Layer Perceptron (MLP), Tensors, low-rank Tensors, Hashtables, and their combinations, entail distinct trade-offs. For instance, representations based on Hashtables enable faster rendering…

    Submitted 18 May, 2024; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Project website: https://sk-fun.fun/PVD-AL

  50. arXiv:2303.09998  [pdf, other]

    cs.CV

    TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

    Authors: Shaoheng Fang, Zi Wang, Yiqi Zhong, Junhao Ge, Siheng Chen, Yanfeng Wang

    Abstract: Vision-centric joint perception and prediction (PnP) has become an emerging trend in autonomous driving research. It predicts the future states of the traffic participants in the surrounding environment from raw RGB images. However, it is still a critical challenge to synchronize features obtained at multiple camera views and timestamps due to inevitable geometric distortions and further exploit t…

    Submitted 22 March, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: CVPR 2023