Showing 1–50 of 313 results for author: Lin, G

Searching in archive cs.

  1. arXiv:2406.16776  [pdf, other]

    cs.CV

    Instance Consistency Regularization for Semi-Supervised 3D Instance Segmentation

    Authors: Yizheng Wu, Zhiyu Pan, Kewei Wang, Xingyi Li, Jiahao Cui, Liwen Xiao, Guosheng Lin, Zhiguo Cao

    Abstract: Large-scale datasets with point-wise semantic and instance labels are crucial to 3D instance segmentation but also expensive. To leverage unlabeled data, previous semi-supervised 3D instance segmentation approaches have explored self-training frameworks, which rely on high-quality pseudo labels for consistency regularization. They intuitively utilize both instance and semantic pseudo labels in a j…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 14 pages, 10 figures

  2. arXiv:2406.16020  [pdf, other]

    cs.SD cs.CL eess.AS

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

    Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in co…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

  3. arXiv:2406.11065  [pdf, other]

    cs.CL

    Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?

    Authors: Guan-Ting Lin, Hung-yi Lee

    Abstract: Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue. While Large Language Models (LLMs) have revolutionized natural language processing, their ability to understand emphasis in dialogue remains unclear. This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the im…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 10 pages

  4. arXiv:2406.11064  [pdf, other]

    eess.AS cs.SD

    Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech

    Authors: Guan-Ting Lin, Wei-Ping Huang, Hung-yi Lee

    Abstract: Deep learning-based end-to-end automatic speech recognition (ASR) has made significant strides but still struggles with performance on out-of-domain (OOD) samples due to domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, whic…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages

  5. arXiv:2406.10163  [pdf, other]

    cs.CV cs.AI

    MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

    Authors: Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

    Abstract: Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project Page: https://buaacyw.github.io/mesh-anything/ Code: https://github.com/buaacyw/MeshAnything

  6. arXiv:2406.10052  [pdf, other]

    cs.SD cs.CL eess.AS

    Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

    Authors: Haoyu Wang, Guoqiang Hu, Guodong Lin, Wei-Qiang Zhang, Jian Li

    Abstract: As a robust and large-scale multilingual speech recognition model, Whisper has demonstrated impressive results in many low-resource and out-of-distribution scenarios. However, its encoder-decoder structure hinders its application to streaming speech recognition. In this paper, we introduce Simul-Whisper, which uses the time alignment embedded in Whisper's cross-attention to guide auto-regressive d…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  7. arXiv:2406.10000  [pdf, other]

    cs.CV

    OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

    Authors: Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

    Abstract: In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus i…

    Submitted 14 June, 2024; originally announced June 2024.

  8. Ents: An Efficient Three-party Training Framework for Decision Trees by Communication Optimization

    Authors: Guopeng Lin, Weili Han, Wenqiang Ruan, Ruisheng Zhou, Lushan Song, Bingshuai Li, Yunfeng Shao

    Abstract: Multi-party training frameworks for decision trees based on secure multi-party computation enable multiple parties to train high-performance models on distributed private data with privacy preservation. The training process essentially involves frequent dataset splitting according to the splitting criterion (e.g. Gini impurity). However, existing multi-party training frameworks for decision trees…

    Submitted 29 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper is the full version of a paper to appear in ACM CCS 2024

  9. arXiv:2406.03293  [pdf, other]

    cs.CV

    Text-to-Image Rectified Flow as Plug-and-Play Priors

    Authors: Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin

    Abstract: Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified flow, a novel class of generative models, enforces a linear progression from th…

    Submitted 24 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Added results on Stable Diffusion 3. Code: https://github.com/yangxiaofeng/rectified_flow_prior

  10. arXiv:2405.20770  [pdf, other]

    cs.CL cs.AI cs.CR

    Large Language Model Sentinel: Advancing Adversarial Robustness by LLM Agent

    Authors: Guang Lin, Qibin Zhao

    Abstract: Over the past two years, the use of large language models (LLMs) has advanced rapidly. While these LLMs offer considerable convenience, they also raise security concerns, as LLMs are vulnerable to adversarial attacks by some well-designed textual perturbations. In this paper, we introduce a novel defense technique named Large LAnguage MOdel Sentinel (LLAMOS), which is designed to enhance the adver…

    Submitted 24 May, 2024; originally announced May 2024.

  11. arXiv:2405.19148  [pdf, other]

    cs.GR

    Dress Anyone : Automatic Physically-Based Garment Pattern Refitting

    Authors: Hsiao-yu Chen, Egor Larionov, Ladislav Kavan, Gene Lin, Doug Roble, Olga Sorkine-Hornung, Tuur Stuyck

    Abstract: Well-fitted clothing is essential for both real and virtual garments to enable self-expression and accurate representation for a large variety of body types. Common practice in the industry is to provide a pre-made selection of distinct garment sizes such as small, medium and large. While these may cater to certain groups of individuals that fall within this distribution, they often exclude large…

    Submitted 29 May, 2024; originally announced May 2024.

  12. arXiv:2405.19055  [pdf, other]

    cs.CV

    FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding

    Authors: Shuai Yuan, Guancong Lin, Lixian Zhang, Runmin Dong, Jinxiao Zhang, Shuang Chen, Juepeng Zheng, Jie Wang, Haohuan Fu

    Abstract: Fine urban change segmentation using multi-temporal remote sensing images is essential for understanding human-environment interactions in urban areas. Although there have been advances in high-quality land cover datasets that reveal the physical features of urban landscapes, the lack of fine-grained land use datasets hinders a deeper understanding of how human activities are distributed across th…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.16849  [pdf, other]

    cs.CV

    Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation

    Authors: Zhoujie Fu, Jiacheng Wei, Wenhao Shen, Chaoyue Song, Xiaofeng Yang, Fayao Liu, Xulei Yang, Guosheng Lin

    Abstract: In this work, we introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos. Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories, ensuring precise and customizable motion transfer. We achieve this by employing blend skinning-based non-parametric shap…

    Submitted 6 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project page: https://sync4dphys.github.io/

  14. arXiv:2405.15452  [pdf, other]

    cs.CL cs.AI cs.LG

    Leveraging Logical Rules in Knowledge Editing: A Cherry on the Top

    Authors: Keyuan Cheng, Muhammad Asif Ali, Shu Yang, Gang Lin, Yuxuan Zhai, Haoyang Fei, Ke Xu, Lu Yu, Lijie Hu, Di Wang

    Abstract: Multi-hop Question Answering (MQA) under knowledge editing (KE) is a key challenge in Large Language Models (LLMs). While best-performing solutions in this domain use a plan and solve paradigm to split a question into sub-questions followed by response generation, we claim that this approach is sub-optimal as it fails for hard to decompose questions, and it does not explicitly cater to correlated…

    Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 18 pages

  15. arXiv:2405.15188  [pdf, other]

    cs.CV

    PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction

    Authors: Bingchen Yang, Haiyong Jiang, Hao Pan, Peter Wonka, Jun Xiao, Guosheng Lin

    Abstract: Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To improve upon this problem, we introduce geometric guidance into the reconstruction network. Our proposed model, PS-CAD, reconstructs the CAD modeling sequence one ste…

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.10313  [pdf, other]

    cs.AI cs.CL cs.CY cs.LG

    How Far Are We From AGI

    Authors: Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You

    Abstract: The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiv…

    Submitted 16 May, 2024; originally announced May 2024.

  17. arXiv:2405.07839  [pdf, other]

    cs.LG cs.AI stat.ML

    Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics

    Authors: Haoyang Zheng, Hengrong Du, Qi Feng, Wei Deng, Guang Lin

    Abstract: Replica exchange stochastic gradient Langevin dynamics (reSGLD) is an effective sampler for non-convex learning in large-scale datasets. However, the simulation may encounter stagnation issues when the high-temperature chain delves too deeply into the distribution tails. To tackle this issue, we propose reflected reSGLD (r2SGLD): an algorithm tailored for constrained non-convex exploration by util…

    Submitted 3 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 28 pages, 13 figures

  18. arXiv:2405.03138  [pdf, other]

    cs.CL

    CRAFT: Extracting and Tuning Cultural Instructions from the Wild

    Authors: Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

    Abstract: Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meantime, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introd…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages

  19. arXiv:2404.11932  [pdf, other]

    cs.CL cs.AI

    CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

    Authors: Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

    Abstract: Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance discrepancy mainly stems from the imbalanced distribution of training data across languages during pre-training and instruction tuning stages. To address this p…

    Submitted 12 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages

  20. arXiv:2404.11151  [pdf, other]

    cs.CV

    REACTO: Reconstructing Articulated Objects from a Single Video

    Authors: Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

    Abstract: In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models. To tackle this, we propose Qu…

    Submitted 17 April, 2024; originally announced April 2024.

  21. arXiv:2404.09754  [pdf, other]

    cs.CL

    Resilience of Large Language Models for Noisy Instructions

    Authors: Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

    Abstract: As the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks. Nonetheless, the resilience of LLMs to handle text containing inherent errors, stemming from human interactions and collaborative systems, has not been thoroughly explored. Our study investigates…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 12 pages

  22. arXiv:2404.06762  [pdf, other]

    cs.CL cs.HC

    Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

    Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

    Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantl…

    Submitted 10 April, 2024; originally announced April 2024.

  23. arXiv:2404.06429  [pdf, other]

    cs.CV cs.AI

    Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion

    Authors: Fan Yang, Jianfeng Zhang, Yichun Shi, Bowen Chen, Chenxu Zhang, Huichao Zhang, Xiaofeng Yang, Jiashi Feng, Guosheng Lin

    Abstract: Benefiting from the rapid development of 2D diffusion models, 3D content creation has made significant progress recently. One promising solution involves the fine-tuning of pre-trained 2D diffusion models to harness their capacity for producing multi-view images, which are then lifted into accurate 3D models via methods like fast-NeRFs or large reconstruction models. However, as inconsistency stil…

    Submitted 9 April, 2024; originally announced April 2024.

  24. arXiv:2404.00492  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-hop Question Answering under Temporal Knowledge Editing

    Authors: Keyuan Cheng, Gang Lin, Haoyang Fei, Yuxuan Zhai, Lu Yu, Muhammad Asif Ali, Lijie Hu, Di Wang

    Abstract: Multi-hop question answering (MQA) under knowledge editing (KE) has garnered significant attention in the era of large language models. However, existing models for MQA under KE exhibit poor performance when dealing with questions containing explicit temporal contexts. To address this limitation, we propose a novel framework, namely TEMPoral knowLEdge augmented Multi-hop Question Answering (TEMPLE…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 23 pages

  25. arXiv:2403.17883  [pdf, other]

    cs.CV

    Superior and Pragmatic Talking Face Generation with Teacher-Student Framework

    Authors: Chao Liang, Jianwen Jiang, Tianyun Zhong, Gaojie Lin, Zhengkun Rong, Jiaqi Yang, Yongming Zhu

    Abstract: Talking face generation technology creates talking videos from arbitrary appearance and motion signal, with the "arbitrary" offering ease of use but also introducing challenges in practical applications. Existing methods work well with standard inputs but suffer serious performance degradation with intricate real-world ones. Moreover, efficiency is also an important concern in deployment. To compr…

    Submitted 26 March, 2024; originally announced March 2024.

  26. arXiv:2403.16067  [pdf, other]

    cs.CV cs.AI

    Robust Diffusion Models for Adversarial Purification

    Authors: Guang Lin, Zerui Tao, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

    Abstract: Diffusion models (DMs) based adversarial purification (AP) has shown to be the most powerful alternative to adversarial training (AT). However, these methods neglect the fact that pre-trained diffusion models themselves are not robust to adversarial attacks as well. Additionally, the diffusion process can easily destroy semantic information and generate a high quality image but totally different f…

    Submitted 24 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  27. arXiv:2403.15651  [pdf, other]

    cs.CV

    GaNI: Global and Near Field Illumination Aware Neural Inverse Rendering

    Authors: Jiaye Wu, Saeed Hadadan, Geng Lin, Matthias Zwicker, David Jacobs, Roni Sengupta

    Abstract: In this paper, we present GaNI, a Global and Near-field Illumination-aware neural inverse rendering technique that can reconstruct geometry, albedo, and roughness parameters from images of a scene captured with co-located light and camera. Existing inverse rendering techniques with co-located light-camera focus on single objects only, without modeling global illumination and near-field lighting mo…

    Submitted 22 March, 2024; originally announced March 2024.

  28. arXiv:2403.14690  [pdf]

    cs.CY cs.AI cs.CL cs.LG

    Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning

    Authors: Xiuqin Zhong, Shengyuan Yan, Gongqi Lin, Hongguang Fu, Liang Xu, Siwen Jiang, Lei Huang, Wei Fang

    Abstract: In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatical…

    Submitted 14 March, 2024; originally announced March 2024.

  29. arXiv:2403.13261  [pdf, other]

    cs.CV

    Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

    Authors: Kewei Wang, Yizheng Wu, Jun Cen, Zhiyu Pan, Xingyi Li, Zhe Wang, Zhiguo Cao, Guosheng Lin

    Abstract: The perception of motion behavior in a dynamic environment holds significant importance for autonomous driving systems, wherein class-agnostic motion prediction methods directly predict the motion of the entire point cloud. While most existing methods rely on fully-supervised learning, the manual labeling of point cloud data is laborious and time-consuming. Therefore, several annotation-efficient…

    Submitted 21 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  30. arXiv:2403.12391  [pdf, other]

    cs.LG cs.AI

    FairSTG: Countering performance heterogeneity via collaborative sample-level optimization

    Authors: Gengyu Lin, Zhengyang Zhou, Qihe Huang, Kuo Yang, Shifen Cheng, Yang Wang

    Abstract: Spatiotemporal learning plays a crucial role in mobile computing techniques to empower smart cities. While existing research has made great efforts to achieve accurate predictions on the overall dataset, they still neglect the significant performance heterogeneity across samples. In this work, we designate the performance heterogeneity as the reason for unfair spatiotemporal learning, which not onl…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Under review by IEEE Transactions on Mobile Computing

  31. arXiv:2403.09140  [pdf, other]

    cs.CV

    Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

    Authors: Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

    Abstract: Recent works on text-to-3d generation show that using only 2D diffusion supervision for 3D generation tends to produce results with inconsistent appearances (e.g., faces on the back view) and inaccurate shapes (e.g., animals with extra legs). Existing methods mainly address this issue by retraining diffusion models with images rendered from 3D data to ensure multi-view consistency while struggling…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Project Page: https://stellarcheng.github.io/Sculpt3D/

  32. arXiv:2403.07380  [pdf, other]

    cs.CV cs.AI

    Gabor-guided transformer for single image deraining

    Authors: Si** He, Guangfeng Lin

    Abstract: Image deraining has gained a great deal of attention in order to address the challenges posed by the effects of harsh weather conditions on visual tasks. While convolutional neural networks (CNNs) are popular, their limitations in capturing global information may result in ineffective rain removal. Transformer-based methods with self-attention mechanisms have improved, but they tend to disto…

    Submitted 12 March, 2024; originally announced March 2024.

  33. arXiv:2403.06205  [pdf, other]

    cs.CV

    S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

    Authors: Xingyi Li, Zhiguo Cao, Yizheng Wu, Kewei Wang, Ke Xian, Zhe Wang, Guosheng Lin

    Abstract: Current 3D stylization methods often assume static scenes, which violates the dynamic nature of our real world. To address this limitation, we present S-DyRF, a reference-based spatio-temporal stylization method for dynamic neural radiance fields. However, stylizing dynamic 3D scenes is inherently challenging due to the limited availability of stylized reference images along the temporal axis. Our…

    Submitted 22 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Project page: https://xingyi-li.github.io/s-dyrf/

  34. arXiv:2402.19197  [pdf, other]

    cs.CV cs.AI cs.LG

    Fine Structure-Aware Sampling: A New Sampling Training Scheme for Pixel-Aligned Implicit Models in Single-View Human Reconstruction

    Authors: Kennard Yanting Chan, Fayao Liu, Guosheng Lin, Chuan Sheng Foo, Weisi Lin

    Abstract: Pixel-aligned implicit models, such as PIFu, PIFuHD, and ICON, are used for single-view clothed human reconstruction. These models need to be trained using a sampling training scheme. Existing sampling training schemes either fail to capture thin surfaces (e.g. ears, fingers) or cause noisy artefacts in reconstructed meshes. To address these problems, we introduce Fine Structured-Aware Sampling (F…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted in Proceedings of the AAAI Conference on Artificial Intelligence, 2024 (AAAI 2024)

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2024, pp. 964-971

  35. arXiv:2402.15406  [pdf, ps, other]

    cs.LG math.NA

    Conformalized-DeepONet: A Distribution-Free Framework for Uncertainty Quantification in Deep Operator Networks

    Authors: Christian Moya, Amirhossein Mollaali, Zecheng Zhang, Lu Lu, Guang Lin

    Abstract: In this paper, we adopt conformal prediction, a distribution-free uncertainty quantification (UQ) framework, to obtain confidence prediction intervals with coverage guarantees for Deep Operator Network (DeepONet) regression. Initially, we enhance the uncertainty quantification frameworks (B-DeepONet and Prob-DeepONet) previously proposed by the authors by using split conformal prediction. By combi…

    Submitted 23 February, 2024; originally announced February 2024.

  36. arXiv:2402.12786  [pdf, other]

    cs.CL eess.AS

    Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations

    Authors: Guan-Ting Lin, Cheng-Han Chiang, Hung-yi Lee

    Abstract: In spoken dialogue, even if two current turns are the same sentence, their responses might still differ when they are spoken in different styles. The spoken styles, containing paralinguistic and prosodic information, mark the most significant difference between text and speech modality. When using text-only LLMs to model spoken dialogue, text-only LLMs cannot give different responses based on the…

    Submitted 30 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024

  37. arXiv:2402.05868  [pdf, other]

    cs.CL cs.AI cs.CR cs.IR cs.LG

    EmojiCrypt: Prompt Encryption for Secure Communication with Large Language Models

    Authors: Guo Lin, Wenyue Hua, Yongfeng Zhang

    Abstract: Cloud-based large language models (LLMs) such as ChatGPT have increasingly become integral to daily operations, serving as vital tools across various applications. While these models offer substantial benefits in terms of accessibility and functionality, they also introduce significant privacy concerns: the transmission and storage of user data in cloud infrastructures pose substantial risks of da…

    Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures, 2 tables, comments and suggestions are welcome

  38. arXiv:2401.16352  [pdf, other]

    cs.CV cs.AI

    Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization

    Authors: Guang Lin, Chao Li, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

    Abstract: The deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique based on adversarial training (AT) can achieve optimal robustness against particular attacks but cannot generalize well to unseen attacks. Another effective defense technique based on adversarial purification (AP) can enhance generalization but cannot achieve optimal robu…

    Submitted 15 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  39. arXiv:2401.15169  [pdf, other]

    cs.GR

    Estimating Cloth Elasticity Parameters From Homogenized Yarn-Level Models

    Authors: Joy Xiaoji Zhang, Gene Wei-Chin Lin, Lukas Bode, Hsiao-yu Chen, Tuur Stuyck, Egor Larionov

    Abstract: Virtual garment simulation has become increasingly important with applications in garment design and virtual try-on. However, reproducing garments faithfully remains a cumbersome process. We propose an end-to-end method for estimating parameters of shell material models corresponding to real fabrics with minimal priors. Our method determines yarn model properties from information directly obtained…

    Submitted 26 January, 2024; originally announced January 2024.

  40. arXiv:2401.13463  [pdf, other]

    cs.CL cs.IR cs.SD eess.AS

    SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering

    Authors: Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee

    Abstract: Spoken Question Answering (SQA) is essential for machines to reply to user's question by finding the answer span within a given spoken passage. SQA has been previously achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine needs to first retrieve passages that possibly contain the ans…

    Submitted 18 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted at ICASSP 2024

  41. arXiv:2401.13203  [pdf, other]

    cs.CV

    Style-Consistent 3D Indoor Scene Synthesis with Decoupled Objects

    Authors: Yunfan Zhang, Hong Huang, Zhiwei Xiong, Zhiqi Shen, Guosheng Lin, Hao Wang, Nicholas Vun

    Abstract: Controllable 3D indoor scene synthesis stands at the forefront of technological progress, offering various applications like gaming, film, and augmented/virtual reality. The capability to stylize and de-couple objects within these scenarios is a crucial factor, providing an advanced level of control throughout the editing process. This control extends not just to manipulating geometric attributes…

    Submitted 23 January, 2024; originally announced January 2024.

  42. arXiv:2401.11665  [pdf, other]

    stat.ML cs.AI cs.LG

    Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo

    Authors: Haoyang Zheng, Wei Deng, Christian Moya, Guang Lin

    Abstract: Approximate Thompson sampling with Langevin Monte Carlo broadens its reach from Gaussian posterior sampling to encompass more general smooth posteriors. However, it still encounters scalability issues in high-dimensional problems when demanding high accuracy. To address this, we propose an approximate Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where the latter is the g…

    Submitted 20 June, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: 52 pages, 2 figures

  43. arXiv:2401.11535  [pdf, other]

    cs.CV cs.RO

    EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting

    Authors: Lingting Zhu, Zhao Wang, Jiahao Cui, Zhenchao Jin, Guying Lin, Lequan Yu

    Abstract: Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splattin…

    Submitted 12 February, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

    Comments: 11 pages, 4 figures

  44. arXiv:2401.02921  [pdf, other]

    cs.CL eess.AS

    Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

    Authors: Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-yi Lee, Ariya Rastrow, Andreas Stolcke

    Abstract: In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors ca…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  45. arXiv:2401.01391  [pdf, other]

    cs.CV cs.GR cs.LG

    On Optimal Sampling for Learning SDF Using MLPs Equipped with Positional Encoding

    Authors: Guying Lin, Lei Yang, Yuan Liu, Congyi Zhang, Junhui Hou, Xiaogang Jin, Taku Komura, John Keyser, Wenping Wang

    Abstract: Neural implicit fields, such as the neural signed distance field (SDF) of a shape, have emerged as a powerful representation for many applications, e.g., encoding a 3D shape and performing collision detection. Typically, implicit fields are encoded by Multi-layer Perceptrons (MLP) with positional encoding (PE) to capture high-frequency geometric details. However, a notable side effect of such PE-e…

    Submitted 2 January, 2024; originally announced January 2024.

  46. arXiv:2312.15316  [pdf, other]

    cs.CL eess.AS

    Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

    Authors: Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-yi Lee, Ivan Bulyko

    Abstract: Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore pro…

    Submitted 17 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024. Camera-ready version

  47. arXiv:2312.09781  [pdf, other]

    cs.CL cs.AI

    GSQA: An End-to-End Model for Generative Spoken Question Answering

    Authors: Min-Han Shih, Ho-Lam Chung, Yu-Chi Pai, Ming-Hao Hsu, Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee

    Abstract: In recent advancements in spoken question answering (QA), end-to-end models have made significant strides. However, previous research has primarily focused on extractive span selection. While this extractive-based approach is effective when answers are present directly within the input, it falls short in addressing abstractive questions, where answers are not directly extracted but inferred from t…

    Submitted 25 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: 5 pages, 2 figures, submitted to ICASSP 2024

  48. arXiv:2312.09404  [pdf, other]

    cs.LG cond-mat.stat-mech physics.chem-ph

    Unbiasing Enhanced Sampling on a High-dimensional Free Energy Surface with Deep Generative Model

    Authors: Yikai Liu, Tushar K. Ghosh, Guang Lin, Ming Chen

    Abstract: Biased enhanced sampling methods utilizing collective variables (CVs) are powerful tools for sampling conformational ensembles. Due to high intrinsic dimensions, efficiently generating conformational ensembles for complex systems requires enhanced sampling on high-dimensional free energy surfaces. While methods like temperature-accelerated molecular dynamics (TAMD) can adopt many CVs in a simulati…

    Submitted 17 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  49. arXiv:2312.08009  [pdf, other]

    cs.CV

    Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix

    Authors: Kewei Wang, Yizheng Wu, Zhiyu Pan, Xingyi Li, Ke Xian, Zhe Wang, Zhiguo Cao, Guosheng Lin

    Abstract: Class-agnostic motion prediction methods aim to comprehend motion within open-world scenarios, holding significance for autonomous driving systems. However, training a high-performance model in a fully-supervised manner always requires substantial amounts of manually annotated data, which can be both expensive and time-consuming to obtain. To address this challenge, our study explores the potentia…

    Submitted 14 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: This paper is accepted by AAAI2024

  50. arXiv:2312.04820  [pdf, other]

    cs.CV cs.AI

    Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting

    Authors: Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu, Guosheng Lin

    Abstract: We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that…

    Submitted 7 December, 2023; originally announced December 2023.