Showing 1–50 of 88 results for author: Xiang, J

Searching in archive cs.
  1. arXiv:2406.19055  [pdf, other]

    cs.CV

    SimpleFusion: A Simple Fusion Framework for Infrared and Visible Images

    Authors: Ming Chen, Yuxuan Cheng, Xinwei He, Xinyue Wang, Yan Aze, Jinhai Xiang

    Abstract: Integrating visible and infrared images into one high-quality image, also known as visible and infrared image fusion, is a challenging yet critical task for many downstream vision tasks. Most existing works utilize pretrained deep neural networks or design sophisticated frameworks with strong priors for this task, which may be unsuitable or lack flexibility. This paper presents SimpleFusion, a sim…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/hxwxss/SimpleFusion-A-Simple-Fusion-Framework-for-Infrared-and-Visible-Images

  2. arXiv:2406.09904  [pdf, other]

    cs.LG

    QQQ: Quality Quattuor-Bit Quantization for Large Language Models

    Authors: Ying Zhang, Peng Zhang, Mincong Huang, Jingyang Xiang, Yujie Wang, Chao Wang, Yineng Zhang, Lei Yu, Chuan Liu, Wei Lin

    Abstract: Quantization is a proven effective method for compressing large language models. Although popular techniques like W8A8 and W4A16 effectively maintain model performance, they often fail to concurrently speed up the prefill and decoding stages of inference. W4A8 is a promising strategy to accelerate both of them, but it usually leads to a significant performance degradation. To address these issues, w…

    Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2406.09455  [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  4. arXiv:2406.06534  [pdf, other]

    cs.CV eess.IV physics.optics

    Compressed Meta-Optical Encoder for Image Classification

    Authors: Anna Wirth-Singh, Jinlin Xiang, Minho Choi, Johannes E. Fröch, Luocheng Huang, Shane Colburn, Eli Shlizerman, Arka Majumdar

    Abstract: Optical and hybrid convolutional neural networks (CNNs) recently have become of increasing interest to achieve low-latency, low-power image classification and computer vision tasks. However, implementing optical nonlinearity is challenging, and omitting the nonlinear layers in a standard CNN comes at a significant reduction in accuracy. In this work, we use knowledge distillation to compress modif…

    Submitted 14 June, 2024; v1 submitted 22 April, 2024; originally announced June 2024.

  5. arXiv:2406.00671  [pdf, other]

    cs.RO

    An Efficient Trajectory Generation for Bi-copter Flight in Tight Space

    Authors: Xin Dong, Yangjie Cui, Jingwu Xiang, Daochun Li, Zhan Tu

    Abstract: Unlike squared (or alike) quadrotors, elongated bi-copters leverage natural superiority in crossing tight spaces. To date, extensive works have focused on the design, modeling, and control of bi-copters. Besides, a proper motion planner utilizing bi-copters' shape characteristics is essential to efficiently and safely traverse tight spaces, yet it has rarely been studied. Current motion planning m…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 8 pages, 8 figures

  6. arXiv:2405.07468  [pdf]

    cs.CL cs.AI

    Evaluating large language models in medical applications: a survey

    Authors: Xiaolan Chen, Jiayang Xiang, Shanfu Lu, Yexin Liu, Mingguang He, Danli Shi

    Abstract: Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medic…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 4 figures, 1 table

  7. arXiv:2404.12833  [pdf, other]

    cs.SE

    How Far Can We Go with Practical Function-Level Program Repair?

    Authors: Jiahong Xiang, Xiaoyang Xu, Fanchu Kong, Mingyuan Wu, Haotian Zhang, Yuqun Zhang

    Abstract: Recently, multiple Automated Program Repair (APR) techniques based on Large Language Models (LLMs) have been proposed to enhance the repair performance. While these techniques mainly focus on the single-line or hunk-level repair, they face significant challenges in real-world application due to the limited repair task scope and costly statement-level fault localization. However, the more practical…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: https://github.com/GhabiX/SRepair/

  8. arXiv:2404.07833  [pdf]

    cs.CV cs.LG

    Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution

    Authors: Handi Deng, Yucheng Zhou, Jiaxuan Xiang, Liujie Gu, Yan Luo, Hai Feng, Mingyuan Liu, Cheng Ma

    Abstract: Foundation models have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we propose a method based on foundation models and zero training to solve the tasks of photoacoustic (PA) i…

    Submitted 11 April, 2024; originally announced April 2024.

  9. arXiv:2404.06760  [pdf, other]

    cs.CL cs.AI

    DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space

    Authors: Jianxiang Xiang, Zhenhua Liu, Haodong Liu, Yin Bai, Jia Cheng, Wenliang Chen

    Abstract: In real-life conversations, the content is diverse, and there exists the one-to-many problem that requires diverse generation. Previous studies attempted to introduce discrete or Gaussian-based continuous latent variables to address the one-to-many problem, but the diversity is limited. Recently, diffusion models have made breakthroughs in computer vision, and some attempts have been made in natur…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: LREC-COLING 2024 camera ready

  10. arXiv:2404.00361  [pdf, other]

    cs.CL

    Controllable and Diverse Data Augmentation with Large Language Model for Low-Resource Open-Domain Dialogue Generation

    Authors: Zhenhua Liu, Tong Zhu, Jianxiang Xiang, Wenliang Chen

    Abstract: Data augmentation (DA) is crucial to mitigate model training instability and over-fitting problems in low-resource open-domain dialogue generation. However, traditional DA methods often neglect semantic data diversity, restricting the overall quality. Recently, large language models (LLM) have been used for DA to generate diversified dialogues. However, they have limited controllability and tend t…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  11. arXiv:2403.11503  [pdf, other]

    cs.CV

    Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors

    Authors: Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang, Xin Tong

    Abstract: We propose a novel image editing technique that enables 3D manipulations on single images, such as object rotation and translation. Existing 3D-aware image editing approaches typically rely on synthetic multi-view datasets for training specialized models, thus constraining their effectiveness on open-domain images featuring significantly more varied layouts and styles. In contrast, our method dire…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: https://wangrc.site/DiffCriticEdit/

  12. arXiv:2403.08204  [pdf, other]

    cs.LG cs.CV

    AutoDFP: Automatic Data-Free Pruning via Channel Similarity Reconstruction

    Authors: Siqi Li, Jun Chen, Jingyang Xiang, Chengrui Zhu, Yong Liu

    Abstract: Structured pruning methods are developed to bridge the gap between the massive scale of neural networks and the limited hardware resources. Most current structured pruning methods rely on training datasets to fine-tune the compressed model, resulting in high computational burdens and being inapplicable for scenarios with stringent requirements on privacy and security. As an alternative, some data-…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 11 pages, 16 figures

  13. arXiv:2403.05829  [pdf, ps, other]

    eess.SY cs.CR cs.ET cs.LO

    Measuring Robustness in Cyber-Physical Systems under Sensor Attacks

    Authors: Jian Xiang, Ruggero Lanotte, Simone Tini, Stephen Chong, Massimo Merro

    Abstract: This paper contributes a formal framework for quantitative analysis of bounded sensor attacks on cyber-physical systems, using the formalism of differential dynamic logic. Given a precondition and postcondition of a system, we formalize two quantitative safety notions, quantitative forward and backward safety, which respectively express (1) how strong the strongest postcondition of the system is w…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Preprint submitted to Elsevier

  14. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  15. arXiv:2402.17262  [pdf, other]

    cs.CL cs.AI

    Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue

    Authors: Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su

    Abstract: Large Language Models (LLMs) have been demonstrated to generate illegal or unethical responses, particularly when subjected to "jailbreak." Research on jailbreak has highlighted the safety issues of LLMs. However, prior studies have predominantly focused on single-turn dialogue, ignoring the potential complexities and risks presented by multi-turn dialogue, a crucial mode through which humans deri…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Work in progress, 23 pages, 18 figures

  16. arXiv:2402.16043  [pdf, other]

    cs.CR cs.SE

    LuaTaint: A Static Taint Analysis System for Web Interface Framework Vulnerability of IoT Devices

    Authors: Jiahui Xiang, Wenhai Wang, Tong Ye, Peiyu Liu

    Abstract: IoT devices are currently facing continuous malicious attacks due to their widespread use. Among these IoT devices, web vulnerabilities are also widely exploited because of their inherent characteristics, such as improper permission controls and insecure interfaces. Recently, the embedded system web interface framework has become highly diverse, and specific vulnerabilities can arise if developers…

    Submitted 25 February, 2024; originally announced February 2024.

  17. arXiv:2402.07788  [pdf, other]

    cs.CL

    Multi-Intent Attribute-Aware Text Matching in Searching

    Authors: Mingzhe Li, Xiuying Chen, Jing Xiang, Qishen Zhang, Changsheng Ma, Chenchen Dai, Jinxiong Chang, Zhongyi Liu, Guannan Zhang

    Abstract: Text matching systems have become a fundamental service in most searching platforms. For instance, they are responsible for matching user queries to relevant candidate items, or rewriting the user-input query to a pre-selected high-performing one for a better search experience. In practice, both the queries and items often contain multiple attributes, such as the category of the item and the locat…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 9 pages

  18. arXiv:2401.08743  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    MMToM-QA: Multimodal Theory of Mind Question Answering

    Authors: Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Theory of Mind (ToM), the ability to understand people's mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets - either video or text. Human ToM, on the other hand, is more than v…

    Submitted 15 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ACL 2024. 26 pages, 11 figures, 7 tables

  19. arXiv:2312.15430  [pdf, other]

    cs.CV

    Make-A-Character: High Quality Text-to-3D Character Generation within Minutes

    Authors: Jianqiang Ren, Chao He, Lin Liu, Jiahao Chen, Yutong Wang, Yafei Song, Jianfang Li, Tangli Xue, Siqi Hu, Tao Chen, Kunkun Zheng, Jianjing Xiang, Liefeng Bo

    Abstract: There is a growing demand for customized and expressive 3D characters with the emergence of AI agents and Metaverse, but creating 3D characters using traditional computer graphics tools is a complex and time-consuming task. To address these challenges, we propose a user-friendly framework named Make-A-Character (Mach) to create lifelike 3D avatars from text descriptions. The framework leverages th…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: Technical Report

  20. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  21. arXiv:2312.11555  [pdf, other]

    cs.CV

    CR-SFP: Learning Consistent Representation for Soft Filter Pruning

    Authors: Jingyang Xiang, Zhuangzhi Chen, Jianbiao Mei, Siqi Li, Jun Chen, Yong Liu

    Abstract: Soft filter pruning (SFP) has emerged as an effective pruning technique for allowing pruned filters to update and the opportunity for them to regrow to the network. However, this pruning strategy applies training and pruning in an alternative manner, which inevitably causes inconsistent representations between the reconstructed network (R-NN) at the training and the pruned network (P-NN) at the in…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  22. arXiv:2312.07061  [pdf, other]

    cs.CV

    MaxQ: Multi-Axis Query for N:M Sparsity Network

    Authors: Jingyang Xiang, Siqi Li, Junhao Chen, Zhuangzhi Chen, Tianxin Huang, Linpeng Peng, Yong Liu

    Abstract: N:M sparsity has received increasing attention due to its remarkable performance and latency trade-off compared with structured and unstructured sparsity. However, existing N:M sparsity methods do not differentiate the relative importance of weights among blocks and leave important weights underappreciated. Besides, they directly apply N:M sparsity to the whole network, which will cause severe inf…

    Submitted 16 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR2024)

  23. arXiv:2312.02214  [pdf, other]

    cs.CV cs.GR

    FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding

    Authors: Jun Xiang, Xuan Gao, Yudong Guo, Juyong Zhang

    Abstract: We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that could reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offset to mo…

    Submitted 29 March, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: Project page: https://ustc3dv.github.io/FlashAvatar/

  24. arXiv:2311.12185  [pdf, other]

    cs.RO

    Kitchen Artist: Precise Control of Liquid Dispensing for Gourmet Plating

    Authors: Hung-Jui Huang, Jingyi Xiang, Wenzhen Yuan

    Abstract: Manipulating liquid is widely required for many tasks, especially in cooking. A common way to address this is extruding viscous liquid from a squeeze bottle. In this work, our goal is to create a sauce plating robot, which requires precise control of the thickness of squeezed liquids on a surface. Different liquids demand different manipulation policies. We command the robot to tilt the container…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Submitted to ICRA 2024

  25. arXiv:2310.13245  [pdf, other]

    cs.RO

    Simultaneous Shape Tracking of Multiple Deformable Linear Objects with Global-Local Topology Preservation

    Authors: Jingyi Xiang, Holly Dinkel

    Abstract: This work presents an algorithm for tracking the shape of multiple entangling Deformable Linear Objects (DLOs) from a sequence of RGB-D images. This algorithm runs in real-time and improves on previous single-DLO tracking approaches by enabling tracking of multiple objects. This is achieved using Global-Local Topology Preservation (GLTP). This work uses the geodesic distance in GLTP to define the…

    Submitted 23 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 3 pages, 3 figures, presented at the 3rd Workshop on Representing and Manipulating Deformable Objects at the IEEE International Conference on Robotics and Automation. Video presentation [https://youtu.be/hfiqwMxitqA]. 3rd Workshop on Representing and Manipulating Deformable Objects [https://deformable-workshop.github.io/icra2023/]

  26. arXiv:2310.12987  [pdf, other]

    eess.IV cs.CV cs.GR

    Spec-NeRF: Multi-spectral Neural Radiance Fields

    Authors: Jiabao Li, Yuqi Li, Ciliang Sun, Chong Wang, Jinhui Xiang

    Abstract: We propose Multi-spectral Neural Radiance Fields (Spec-NeRF) for jointly reconstructing a multispectral radiance field and spectral sensitivity functions (SSFs) of the camera from a set of color images filtered by different filters. The proposed method focuses on modeling the physical imaging process, and applies the estimated SSFs and radiance field to synthesize novel views of multispectral scenes…

    Submitted 14 September, 2023; originally announced October 2023.

  27. arXiv:2310.12004  [pdf, other]

    cs.CV

    Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach

    Authors: Feng Luo, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang

    Abstract: The recent use of diffusion prior, enhanced by pre-trained text-image models, has markedly elevated the performance of image super-resolution (SR). To alleviate the huge computational cost required by pixel-based diffusion SR, latent-based methods utilize a feature encoder to transform the image and then implement the SR image generation in a compact latent space. Nevertheless, there are two major…

    Submitted 13 December, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 15 pages, 7 figures

  28. arXiv:2310.10292  [pdf, other]

    cs.CV cs.MM

    Effortless Cross-Platform Video Codec: A Codebook-Based Method

    Authors: Kuan Tian, Yonghang Guan, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang

    Abstract: Under certain circumstances, advanced neural video codecs can surpass the most complex traditional codecs in their rate-distortion (RD) performance. One of the main reasons for the high performance of existing neural video codecs is the use of the entropy model, which can provide more accurate probability distribution estimations for compressing the latents. This also implies the rigorous requirem…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 15 pages, 11 figures

  29. arXiv:2310.06218  [pdf, other]

    cs.LG cs.AI

    SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration

    Authors: Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, Yong Liu

    Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) A large amount of storage space…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: 14 pages, 4 figures, Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  30. arXiv:2310.05391  [pdf, other]

    cs.GR cs.CV

    Neural Impostor: Editing Neural Radiance Fields with Explicit Shape Manipulation

    Authors: Ruiyang Liu, Jinxu Xiang, Bowen Zhao, Ran Zhang, Jingyi Yu, Changxi Zheng

    Abstract: Neural Radiance Fields (NeRF) have significantly advanced the generation of highly realistic and expressive 3D scenes. However, the task of editing NeRF, particularly in terms of geometry modification, poses a significant challenge. This issue has obstructed NeRF's wider adoption across various applications. To tackle the problem of efficiently editing neural implicit fields, we introduce Neural I…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted at Pacific Graphics 2023 and Computer Graphics Forum

  31. Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information

    Authors: Kuan Tian, Yonghang Guan, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang

    Abstract: The state-of-the-art neural video codecs have outperformed the most sophisticated traditional codecs in terms of RD performance in certain cases. However, utilizing them for practical applications is still challenging for two major reasons. 1) Cross-platform computational errors resulting from floating point operations can lead to inaccurate decoding of the bitstream. 2) The high computational com…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: 14 pages

  32. arXiv:2309.02186  [pdf, other]

    cs.CV cs.AI cs.GR

    AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

    Authors: Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong

    Abstract: Previous animatable 3D-aware GANs for human generation have primarily focused on either the human head or full body. However, head-only videos are relatively uncommon in real life, and full body generation typically does not deal with facial expression control and still has challenges in generating high-quality results. Towards applicable video avatars, we present an animatable 3D-aware GAN that g…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH Asia 2023. Project Page: https://yuewuhkust.github.io/AniPortraitGAN/

  33. arXiv:2308.15727  [pdf, other]

    cs.CL

    Quantifying and Analyzing Entity-level Memorization in Large Language Models

    Authors: Zhenhong Zhou, Jiuyang Xiang, Chaomeng Chen, Sen Su

    Abstract: Large language models (LLMs) have been proven capable of memorizing their training data, which can be extracted through specifically designed prompts. As the scale of datasets continues to grow, privacy risks arising from memorization have attracted increasing attention. Quantifying language model memorization helps evaluate potential privacy risks. However, prior works on quantifying memorization…

    Submitted 5 November, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: 9 pages, 7 figures

  34. arXiv:2308.07733  [pdf, other]

    eess.IV cs.CV cs.MM

    Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression

    Authors: Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang

    Abstract: The latest advancements in neural image compression show great potential in surpassing the rate-distortion performance of conventional standard codecs. Nevertheless, there exists an indelible domain gap between the datasets utilized for training (i.e., natural images) and those utilized for inference (e.g., artistic images). Our proposal involves a low-rank adaptation approach aimed at addressing…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023, 13 pages, 12 figures

    ACM Class: I.4.2; E.4

  35. arXiv:2307.13300  [pdf, other]

    cs.CV

    Mini-PointNetPlus: a local feature descriptor in deep learning model for 3d environment perception

    Authors: Chuanyu Luo, Nuo Cheng, Sikun Ma, Jun Xiang, Xiaohan Li, Shengguang Lei, Pu Li

    Abstract: Common deep learning models for 3D environment perception often use pillarization/voxelization methods to convert point cloud data into pillars/voxels and then process it with a 2D/3D convolutional neural network (CNN). The pioneer work PointNet has been widely applied as a local feature descriptor, a fundamental component in deep learning models for 3D perception, to extract features of a point c…

    Submitted 25 July, 2023; originally announced July 2023.

  36. arXiv:2307.09831  [pdf, other]

    cs.AI

    A Fast and Map-Free Model for Trajectory Prediction in Traffics

    Authors: Junhong Xiang, Jingmin Zhang, Zhixiong Nan

    Abstract: To handle the two shortcomings of existing methods, (i) nearly all models rely on high-definition (HD) maps, yet the map information is not always available in real traffic scenes and HD map-building is expensive and time-consuming and (ii) existing models usually focus on improving prediction accuracy at the expense of reducing computing efficiency, yet the efficiency is crucial for various real a…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 7 pages, 3 figures

  37. arXiv:2305.10626  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Models Meet World Models: Embodied Experiences Enhance Language Models

    Authors: Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu

    Abstract: While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities. The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose…

    Submitted 28 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  38. arXiv:2304.13240  [pdf, other]

    cs.CV cs.LG

    Structure Diagram Recognition in Financial Announcements

    Authors: Meixuan Qiao, Jun Wang, Junfu Xiang, Qiyu Hou, Ruixuan Li

    Abstract: Accurately extracting structured data from structure diagrams in financial announcements is of great practical importance for building financial knowledge graphs and further improving the efficiency of various financial applications. First, we proposed a new method for recognizing structure diagrams in financial announcements, which can better detect and extract different types of connecting lines…

    Submitted 1 May, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: ICDAR2023

  39. arXiv:2304.12685  [pdf, other]

    cs.CV cs.AI eess.IV

    Exploring the Mutual Influence between Self-Supervised Single-Frame and Multi-Frame Depth Estimation

    Authors: Jie Xiang, Yun Wang, Lifeng An, Haiyang Liu, Jian Liu

    Abstract: Although both self-supervised single-frame and multi-frame depth estimation methods only require unlabeled monocular videos for training, the information they leverage varies because single-frame methods mainly rely on appearance-based features while multi-frame methods focus on geometric cues. Considering the complementary information of single-frame and multi-frame methods, some works attempt to…

    Submitted 27 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L). 8 pages, 3 figures

  40. arXiv:2303.17905  [pdf, other]

    cs.CV

    3D-aware Image Generation using 2D Diffusion Models

    Authors: Jianfeng Xiang, Jiaolong Yang, Binbin Huang, Xin Tong

    Abstract: In this paper, we introduce a novel 3D-aware image generation method that leverages 2D diffusion models. We formulate the 3D-aware image generation task as multiview 2D image set generation, and further to a sequential unconditional-conditional multiview image generation process. This allows us to utilize 2D diffusion models to boost the generative modeling power of the method. Additionally, we in…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: Website: https://jeffreyxiang.github.io/ivid/

  41. arXiv:2303.06274  [pdf]

    cs.CV cs.LG

    CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

    Authors: Simon Graham, Quoc Dang Vu, Mostafa Jahanifar, Martin Weigert, Uwe Schmidt, Wenhua Zhang, Jun Zhang, Sen Yang, Jinxi Xiang, Xiyue Wang, Josef Lorenz Rumberger, Elias Baumann, Peter Hirsch, Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Ayushi Jain, Heeyoung Ahn, Yiyu Hong, Hussam Azzuni, Min Xu, Mohammad Yaqub, Marie-Claire Blache, Benoît Piégu, Bertrand Vernay, et al. (64 additional authors not shown)

    Abstract: Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of repro…

    Submitted 14 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

  42. arXiv:2302.06089  [pdf, other

    cs.CV cs.LG q-bio.QM

    Federated attention consistent learning models for prostate cancer diagnosis and Gleason grading

    Authors: Fei Kong, Xiyue Wang, Jinxi Xiang, Sen Yang, Xinran Wang, Meng Yue, Jun Zhang, Junhan Zhao, Xiao Han, Yuhan Dong, Biyue Zhu, Fang Wang, Yueping Liu

    Abstract: Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data.…

    Submitted 28 March, 2024; v1 submitted 12 February, 2023; originally announced February 2023.

    Comments: 14 pages

  43. Reconstructing Personalized Semantic Facial NeRF Models From Monocular Video

    Authors: Xuan Gao, Chenglai Zhong, Jun Xiang, Yang Hong, Yudong Guo, Juyong Zhang

    Abstract: We present a novel semantic model for human head defined with neural radiance field. The 3D-consistent head model consists of a set of disentangled and interpretable bases, and can be driven by low-dimensional expression coefficients. Thanks to the powerful representation ability of neural radiance field, the constructed model can represent complex facial attributes including hair, wearings, which…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by SIGGRAPH Asia 2022 (Journal Track). Project page: https://ustc3dv.github.io/NeRFBlendShape/

    Journal ref: ACM Trans. Graph. 41, 6, Article 200 (December 2022), 12 pages

  44. arXiv:2210.05990  [pdf, other

    cs.CV

    GGViT:Multistream Vision Transformer Network in Face2Face Facial Reenactment Detection

    Authors: Haotian Wu, Peipei Wang, Xin Wang, Ji Xiang, Rui Gong

    Abstract: Detecting manipulated facial images and videos on social networks has been an urgent problem to be solved. The compression of videos on social media has destroyed some pixel details that could be used to detect forgeries. Hence, it is crucial to detect manipulated faces in videos of different quality. We propose a new multi-stream network architecture named GGViT, which utilizes global information…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 6 pages, 4 figures, to be published in ICPR2022

  45. arXiv:2210.04325  [pdf, other

    cs.CL cs.AI cs.LG

    ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models

    Authors: Jiannan Xiang, Zhengzhong Liu, Yucheng Zhou, Eric P. Xing, Zhiting Hu

    Abstract: Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarce issues: one may have access to o…

    Submitted 22 October, 2022; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  46. arXiv:2206.08492  [pdf, other

    cs.LG cs.AI q-bio.NC stat.ML

    TKIL: Tangent Kernel Approach for Class Balanced Incremental Learning

    Authors: Jinlin Xiang, Eli Shlizerman

    Abstract: When learning new tasks in a sequential manner, deep neural networks tend to forget tasks that they previously learned, a phenomenon called catastrophic forgetting. Class incremental learning methods aim to address this problem by keeping a memory of a few exemplars from previously learned tasks, and distilling knowledge from them. However, existing methods struggle to balance the performance acro…

    Submitted 16 June, 2022; originally announced June 2022.

  47. arXiv:2206.07255  [pdf, other

    cs.CV

    GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds

    Authors: Jianfeng Xiang, Jiaolong Yang, Yu Deng, Xin Tong

    Abstract: Recent works have shown that 3D-aware GANs trained on unstructured single image collections can generate multiview images of novel instances. The key underpinnings to achieve this are a 3D radiance field generator and a volume rendering process. However, existing methods either cannot generate high-resolution images (e.g., up to 256X256) due to the high computation cost of neural volume rendering,…

    Submitted 11 October, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: ICCV2023 camera ready version (more results and method comparisons). Project page: https://jeffreyxiang.github.io/GRAM-HD/

  48. arXiv:2206.02607  [pdf, other

    cs.LG cs.CE cs.GR math.NA physics.comp-ph

    CROM: Continuous Reduced-Order Modeling of PDEs Using Implicit Neural Representations

    Authors: Peter Yichen Chen, Jinxu Xiang, Dong Heon Cho, Yue Chang, G A Pershing, Henrique Teles Maia, Maurizio M. Chiaramonte, Kevin Carlberg, Eitan Grinspun

    Abstract: The long runtime of high-fidelity partial differential equation (PDE) solvers makes them unsuitable for time-critical applications. We propose to accelerate PDE solvers using reduced-order modeling (ROM). Whereas prior ROM approaches reduce the dimensionality of discretized vector fields, our continuous reduced-order modeling (CROM) approach builds a low-dimensional embedding of the continuous vec…

    Submitted 3 March, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

  49. arXiv:2206.01369  [pdf, other

    cs.CV cs.AI cs.LG eess.IV

    Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

    Authors: Chenyu You, Jinlin Xiang, Kun Su, Xiaoran Zhang, Siyuan Dong, John Onofrey, Lawrence Staib, James S. Duncan

    Abstract: Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, whi…

    Submitted 30 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  50. arXiv:2206.00007  [pdf, other

    cs.LG cs.AI

    A Cross-City Federated Transfer Learning Framework: A Case Study on Urban Region Profiling

    Authors: Gaode Chen, Yijun Su, Xinghua Zhang, Anmin Hu, Guochun Chen, Siyuan Feng, Ji Xiang, Junbo Zhang, Yu Zheng

    Abstract: Data insufficiency problems (i.e., data missing and label scarcity) caused by inadequate services and infrastructures or imbalanced development levels of cities have seriously affected the urban computing tasks in real scenarios. Prior transfer learning methods inspire an elegant solution to the data insufficiency, but are only concerned with one kind of insufficiency issue and fail to give consid…

    Submitted 11 July, 2022; v1 submitted 31 May, 2022; originally announced June 2022.