Showing 1–50 of 104 results for author: Yin, F

Searching in archive cs.
  1. arXiv:2407.01598  [pdf]

    cs.LG cs.AI

    Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast

    Authors: Yifan Hu, Fukang Yin, Weimin Zhang, Kaijun Ren, Junqiang Song, Kefeng Deng, Di Zhang

    Abstract: Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods find it difficult to learn small-scale dynamics. In this paper, we reveal that the universal mechanism for these instabilities is not only related to spectral bias but also to distortions brought by proce…

    Submitted 25 June, 2024; originally announced July 2024.

  2. arXiv:2407.00219  [pdf, other]

    cs.CL cs.AI

    Evaluating Human Alignment and Model Faithfulness of LLM Rationale

    Authors: Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng

    Abstract: We study how well large language models (LLMs) explain their generations with rationales -- a set of tokens extracted from the input texts that reflect the decision process of LLMs. We examine LLM rationales extracted with two methods: 1) attribution-based methods that use attention or gradients to locate important tokens, and 2) prompting-based methods that guide LLMs to extract rationales using…

    Submitted 28 June, 2024; originally announced July 2024.

  3. arXiv:2406.13692  [pdf, other]

    cs.CL

    Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

    Authors: Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

    Abstract: Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decodin…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.02721  [pdf, other]

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation pro…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 41 pages, 12 figures, 41 tables; Website: https://llm-self-control.github.io/

  5. arXiv:2406.01563  [pdf, other]

    cs.CL

    LoFiT: Localized Fine-tuning on LLM Representations

    Authors: Fangcong Yin, Xi Ye, Greg Durrett

    Abstract: Recent work in interpretability shows that large language models (LLMs) can be adapted for new tasks in a learning-free way: it is possible to intervene on LLM representations to elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning…

    Submitted 3 June, 2024; originally announced June 2024.

  6. arXiv:2405.20853  [pdf, other]

    cs.CV

    MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

    Authors: Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

    Abstract: The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation…

    Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  7. arXiv:2405.19716  [pdf, other]

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  8. arXiv:2405.08194  [pdf, other]

    cs.IT

    Distributionally Robust Degree Optimization for BATS Codes

    Authors: Hoover H. F. Yin, Jie Wang, Sherman S. M. Chow

    Abstract: Batched sparse (BATS) code is a network coding solution for multi-hop wireless networks with packet loss. Achieving a close-to-optimal rate relies on an optimal degree distribution. Technical challenges arise from the sensitivity of this distribution to the often empirically obtained rank distribution at the destination node. Specifically, if the empirical distribution overestimates the channel, B…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 8 pages, accepted by 2024 IEEE International Symposium on Information Theory

  9. arXiv:2404.01700  [pdf, other]

    cs.CV

    MotionChain: Conversational Motion Controllers via Multimodal Prompts

    Authors: Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan

    Abstract: Recent advancements in language models have demonstrated their adeptness in conducting multi-turn dialogues and retaining conversational context. However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models. By integrating multi-turn conversations in controlling continuous virtual human movements, generative human motion models can…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures

  10. arXiv:2404.01697  [pdf, other]

    stat.ML cs.LG

    Preventing Model Collapse in Gaussian Process Latent Variable Models

    Authors: Ying Li, Zhidi Lin, Feng Yin, Michael Minyi Zhang

    Abstract: Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the unde…

    Submitted 18 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: International Conference on Machine Learning (ICML), 2024

  11. arXiv:2403.10123  [pdf, other]

    cs.LG

    Regularization-Based Efficient Continual Learning in Deep State-Space Models

    Authors: Yuanhang Zhang, Zhidi Lin, Yiyong Sun, Feng Yin, Carsten Fritsche

    Abstract: Deep state-space models (DSSMs) have gained popularity in recent years due to their potent modeling capacity for dynamic systems. However, existing DSSM works are limited to single-task modeling, which requires retraining with historical task data upon revisiting a forepassed task. To address this limitation, we propose continual learning DSSMs (CLDSSMs), which are capable of adapting to evolving…

    Submitted 29 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 7 pages, 14 figures

  12. arXiv:2403.06457  [pdf, other]

    cs.CV

    Ensemble Quadratic Assignment Network for Graph Matching

    Authors: Haoru Tan, Chuang Wang, Sitong Wu, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu

    Abstract: Graph matching is a commonly used technique in computer vision and pattern recognition. Recent data-driven approaches have improved the graph matching accuracy remarkably, whereas some traditional algorithm-based methods are more robust to feature noises, outlier nodes, and global transformation (e.g., rotation). In this paper, we propose a graph neural network (GNN) based approach to combine the a…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCV in 2024

  13. arXiv:2403.05149  [pdf, other]

    physics.app-ph cs.AI

    Inverse Design of Photonic Crystal Surface Emitting Lasers is a Sequence Modeling Problem

    Authors: Ceyao Zhang, Renjie Li, Cheng Zhang, Zhaoyu Zhang, Feng Yin

    Abstract: Photonic Crystal Surface Emitting Lasers (PCSEL)'s inverse design demands expert knowledge in physics, materials science, and quantum mechanics which is prohibitively labor-intensive. Advanced AI technologies, especially reinforcement learning (RL), have emerged as a powerful tool to augment and accelerate this inverse design process. By modeling the inverse design of PCSEL as a sequential decisio…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI workshop AI2ASE (2024): https://ai-2-ase.github.io/papers/29%5cCameraReady%5cPIT__PSCEL_inverse_design_transformer.pdf

  14. On Defeating Graph Analysis of Anonymous Transactions

    Authors: Christoph Egger, Russell W. F. Lai, Viktoria Ronge, Ivy K. Y. Woo, Hoover H. F. Yin

    Abstract: In a ring-signature-based anonymous cryptocurrency, signers of a transaction are hidden among a set of potential signers, called a ring, whose size is much smaller than the number of all users. The ring-membership relations specified by the sets of transactions thus induce bipartite transaction graphs, whose distribution is in turn induced by the ring sampler underlying the cryptocurrency. Since…

    Submitted 28 February, 2024; originally announced February 2024.

    Journal ref: Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2022, Issue 3, Pages 538-557

  15. arXiv:2402.18048  [pdf, other]

    cs.CL

    Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension

    Authors: Fan Yin, Jayanth Srinivasa, Kai-Wei Chang

    Abstract: We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs), which serves as a crucial step in building trust between humans and LLMs. Although several approaches based on entropy or verbalized uncertainty have been proposed to calibrate model predictions, these methods are often intractable, sensitive to hyperparameters, and less reliable when ap…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: preprint, 9 pages, 5 figures

  16. arXiv:2402.11138  [pdf, other]

    cs.CL cs.AI cs.LG

    Contrastive Instruction Tuning

    Authors: Tianyi Lorena Yan, Fei Wang, James Y. Huang, Wenxuan Zhou, Fan Yin, Aram Galstyan, Wenpeng Yin, Muhao Chen

    Abstract: Instruction tuning has been used as a promising approach to improve the performance of large language models (LLMs) on unseen tasks. However, current LLMs exhibit limited robustness to unseen instructions, generating inconsistent outputs when the same instruction is phrased with slightly varied forms or language styles. This behavior indicates LLMs' lack of robustness to textual variations and gen…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  17. arXiv:2402.10104  [pdf, other]

    cs.AI cs.CL

    GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving

    Authors: Jiaxin Zhang, Zhongzhi Li, Mingliang Zhang, Fei Yin, Chenglin Liu, Yashar Moshfeghi

    Abstract: Recent advancements in large language models (LLMs) and multi-modal models (MMs) have demonstrated their remarkable capabilities in problem-solving. Yet, their proficiency in tackling geometry math problems, which necessitates an integrated understanding of both textual and visual information, has not been thoroughly evaluated. To address this gap, we introduce the GeoEval benchmark, a comprehensi…

    Submitted 17 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted in ACL 2024 Findings

  18. arXiv:2401.18018  [pdf, other]

    cs.LG cs.AI cs.CL

    On Prompt-Driven Safeguarding for Large Language Models

    Authors: Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

    Abstract: Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) against queries with harmful intents. However, the underlying working mechanisms of safety prompts have not been unraveled yet, restricting the possibility of automatically optimizing them to improve LLM safety. In this work, we investigate how LLMs' behavior (i.e., complying with or refu…

    Submitted 3 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: ICML 2024

  19. arXiv:2401.03428  [pdf, other]

    cs.AI cs.MA

    Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects

    Authors: Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang, Zekai Wang, Feng Yin, Junhua Zhao, Xiuqiang He

    Abstract: Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from recent progress in large language models (LLMs), LLM-based agents that use universal natural language as an interface exhibit robust generalization capabilities across various applications -- from ser…

    Submitted 7 January, 2024; originally announced January 2024.

  20. arXiv:2312.10763  [pdf, other]

    cs.CV

    M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

    Authors: Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

    Abstract: Recently, 3D understanding has become popular to facilitate autonomous agents to perform further decision-making. However, existing 3D datasets and methods are often limited to specific tasks. On the other hand, recent progress in Large Language Models (LLMs) and Multimodal Language Models (MLMs) have demonstrated exceptional general language and imagery tasking performance. Therefore, it is intere…

    Submitted 17 December, 2023; originally announced December 2023.

  21. arXiv:2312.05910  [pdf, other]

    cs.LG eess.SP stat.ML

    Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference

    Authors: Zhidi Lin, Yiyong Sun, Feng Yin, Alexandre Hoang Thiéry

    Abstract: The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, a…

    Submitted 14 January, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

    Comments: Gaussian process, state-space model, ensemble Kalman filter, online learning, variational inference. (19 pages, 10 figures)

  22. arXiv:2311.17618  [pdf, other]

    cs.CV

    ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

    Authors: Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen

    Abstract: The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored. By achieving instruction-based shape generations, versatile multimodal generative shape models can significantly benefi…

    Submitted 1 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  23. arXiv:2311.16856  [pdf, other]

    cs.LG eess.SP stat.ML

    Attentional Graph Neural Networks for Robust Massive Network Localization

    Authors: Wenzhong Yan, Juntao Wang, Feng Yin, Yang Tian, Abdelhak M. Zoubir

    Abstract: In recent years, Graph neural networks (GNNs) have emerged as a prominent tool for classification tasks in machine learning. However, their application in regression tasks remains underexplored. To tap the potential of GNNs in regression, this paper integrates GNNs with attention mechanism, a technique that revolutionized sequential learning tasks with its adaptability and robustness, to tackle a…

    Submitted 14 February, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  24. arXiv:2311.16476  [pdf, other]

    cs.CV cs.AI

    LANS: A Layout-Aware Neural Solver for Plane Geometry Problem

    Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

    Abstract: Geometry problem solving (GPS) is a challenging mathematical reasoning task requiring multi-modal understanding, fusion, and reasoning. Existing neural solvers take GPS as a vision-language task but are short in the representation of geometry diagrams that carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, integrated with two new modules:…

    Submitted 19 February, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  25. arXiv:2311.01773  [pdf, other]

    cs.CV

    PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation

    Authors: Yuhan Ding, Fukun Yin, Jiayuan Fan, Hui Li, Xin Chen, Wen Liu, Chongshan Lu, Gang Yu, Tao Chen

    Abstract: Recent advances in implicit neural representations have achieved impressive results by sampling and fusing individual points along sampling rays in the sampling space. However, due to the explosively growing sampling space, finely representing and synthesizing detailed textures remains a challenge for unbounded large-scale outdoor scenes. To alleviate the dilemma of using individual points to perc…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023

  26. arXiv:2311.00288  [pdf, other]

    cs.CL cs.AI

    Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

    Authors: Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng

    Abstract: Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. However, how to select new tasks to improve the performance and generalizability of IT models remains an open question. Training on all existing tasks is impractical due to prohibiting computation requirements, and randomly se…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Main

  27. arXiv:2310.14487  [pdf, other]

    cs.CV cs.AI

    VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations

    Authors: Yiying Yang, Wen Liu, Fukun Yin, Xin Chen, Gang Yu, Jiayuan Fan, Tao Chen

    Abstract: Recent advancements in implicit neural representations have contributed to high-fidelity surface reconstruction and photorealistic novel view synthesis. However, the computational complexity inherent in these methodologies presents a substantial impediment, constraining the attainable frame rates and resolutions in practical applications. In response to this predicament, we propose VQ-NeRF, an eff…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: Submitted to the 38th Annual AAAI Conference on Artificial Intelligence

  28. arXiv:2309.08201  [pdf, other]

    cs.LG eess.SP math.OC

    Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel

    Authors: Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis

    Abstract: Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture (GSM) kernel is tailored for m…

    Submitted 26 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

  29. arXiv:2309.01074  [pdf, other]

    cs.LG eess.SP eess.SY

    Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models

    Authors: Zhidi Lin, Juan Maroñas, Ying Li, Feng Yin, Sergios Theodoridis

    Abstract: The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states.…

    Submitted 3 September, 2023; originally announced September 2023.

  30. arXiv:2308.12866  [pdf, other]

    cs.CV

    ToonTalker: Cross-Domain Face Reenactment

    Authors: Yuan Gong, Yong Zhang, Xiaodong Cun, Fei Yin, Yanbo Fan, Xuan Wang, Baoyuan Wu, Yujiu Yang

    Abstract: We target cross-domain face reenactment in this paper, i.e., driving a cartoon image with the video of a real person and vice versa. Recently, many works have focused on one-shot talking face generation to drive a portrait with a real video, i.e., within-domain reenactment. Straightforwardly applying those methods to cross-domain animation will cause inaccurate expression transfer, blur effects, a…

    Submitted 24 August, 2023; originally announced August 2023.

  31. arXiv:2308.11339  [pdf, other]

    cs.AI cs.LG cs.MA

    ProAgent: Building Proactive Cooperative Agents with Large Language Models

    Authors: Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

    Abstract: Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems. Current approaches to developing cooperative agents rely primarily on learning-based methods, whose policy generalization depends heavily on the diversity of teammates they interact with during the training phase. Such reliance, however, constrains the agents' capacity for st…

    Submitted 11 January, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: v3 is the AAAI'24 camera ready version, which polished abstract and introduction based on the reviewers' comments, and enriched related works. 7 pages of main content, 2 pages of references, 2 figures and 1 table

  32. arXiv:2307.07434  [pdf, other]

    cs.CV eess.IV

    Combining multitemporal optical and SAR data for LAI imputation with BiLSTM network

    Authors: W. Zhao, F. Yin, H. Ma, Q. Wu, J. Gomez-Dans, P. Lewis

    Abstract: The Leaf Area Index (LAI) is vital for predicting winter wheat yield. Acquisition of crop conditions via Sentinel-2 remote sensing images can be hindered by persistent clouds, affecting yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery, and the ratio between its cross- and co-polarized channels (C-band) shows a high correlation with time series LAI over winter wheat re…

    Submitted 14 July, 2023; originally announced July 2023.

  33. arXiv:2307.03441  [pdf, other]

    cs.CV

    NOFA: NeRF-based One-shot Facial Avatar Reconstruction

    Authors: Wangbo Yu, Yanbo Fan, Yong Zhang, Xuan Wang, Fei Yin, Yunpeng Bai, Yan-Pei Cao, Ying Shan, Yang Wu, Zhongqian Sun, Baoyuan Wu

    Abstract: 3D facial avatar reconstruction has been a significant research topic in computer graphics and computer vision, where photo-realistic rendering and flexible controls over poses and expressions are necessary for many related applications. Recently, its performance has been greatly improved with the development of neural radiance fields (NeRF). However, most existing NeRF-based facial avatars focus…

    Submitted 7 July, 2023; originally announced July 2023.

  34. arXiv:2306.11839  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations

    Authors: Hammaad Adam, Fan Yin, Huibin Hu, Neil Tenenholtz, Lorin Crawford, Lester Mackey, Allison Koenecke

    Abstract: Randomized experiments often need to be stopped prematurely due to the treatment having an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity. In this paper, we study the early stopping of experiments for harm on heterogeneous populations. We first establish…

    Submitted 27 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (spotlight)

  35. arXiv:2306.02083  [pdf, other]

    cs.CV

    Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution

    Authors: Yiji Cheng, Fei Yin, Xiaoke Huang, Xintong Yu, Jiaxiang Liu, Shikun Feng, Yujiu Yang, Yansong Tang

    Abstract: Text-to-3D is an emerging task that allows users to create 3D content with infinite possibilities. Existing works tackle the problem by optimizing a 3D representation with guidance from pre-trained diffusion models. An apparent drawback is that they need to optimize from scratch for each prompt, which is computationally expensive and often yields poor visual fidelity. In this paper, we propose Dre…

    Submitted 3 June, 2023; originally announced June 2023.

  36. arXiv:2306.01150  [pdf, other]

    cs.CL cs.AI

    Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

    Authors: Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

    Abstract: Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks. However, it remains unclear whether models truly understand task definitions and whether the human-written definitions are optimal. In this paper, we systematically study the role of task definitions in instruction learning. We first conduct an ablation analysis informed…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ACL 2023, camera-ready; 10 pages

  37. arXiv:2305.19998  [pdf, other]

    cs.CL cs.LG

    Efficient Shapley Values Estimation by Amortization for Text Classification

    Authors: Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma, Bing Xiang

    Abstract: Despite the popularity of Shapley Values in explaining neural text classification models, computing them is prohibitive for large pretrained models due to a large number of model evaluations. In practice, Shapley Values are often estimated with a small number of stochastic model evaluations. However, we show that the estimated Shapley Values are sensitive to random seed choices -- the top-ranked f…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Camera Ready

  38. arXiv:2305.19713  [pdf, other]

    cs.CL cs.LG

    Red Teaming Language Model Detectors with Language Models

    Authors: Zhouxing Shi, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, Cho-Jui Hsieh

    Abstract: The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent works have proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We st…

    Submitted 19 October, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Preprint. Accepted by TACL

  39. arXiv:2305.16965  [pdf, other]

    cs.CV cs.LG eess.IV

    Accelerating Diffusion Models for Inverse Problems through Shortcut Sampling

    Authors: Gongye Liu, Haoze Sun, Jiayi Li, Fei Yin, Yujiu Yang

    Abstract: Diffusion models have recently demonstrated an impressive ability to address inverse problems in an unsupervised manner. While existing methods primarily focus on modifying the posterior sampling process, the potential of the forward process remains largely unexplored. In this work, we propose Shortcut Sampling for Diffusion (SSD), a novel approach for solving inverse problems in a zero-shot manner…

    Submitted 2 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: full version; IJCAI 2024 accepted (main track)

  40. arXiv:2305.14327  [pdf, other]

    cs.CL cs.AI

    Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation

    Authors: Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang

    Abstract: Instruction tuning has emerged to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses. Existing methods either manually annotate or employ LLM (e.g., GPT-series) to generate data for instruction tuning. However, they often overlook associating instructions with existing annotated datasets. In this paper, we propose Dynosaur, a dyna…

    Submitted 26 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023. Code and data are available at https://github.com/WadeYin9712/Dynosaur

  41. arXiv:2305.14216  [pdf, other]

    cs.LG

    Constrained Proximal Policy Optimization

    Authors: Chengbin Xuan, Feng Zhang, Faliang Yin, Hak-Keung Lam

    Abstract: The problem of constrained reinforcement learning (CRL) holds significant importance as it provides a framework for addressing critical safety satisfaction concerns in the field of reinforcement learning (RL). However, with the introduction of constraint satisfaction, the current CRL methods necessitate the utilization of second-order optimization or primal-dual frameworks with additional Lagrangi…

    Submitted 23 May, 2023; originally announced May 2023.

  42. arXiv:2303.10533  [pdf]

    q-bio.QM cs.CV

    A Radiomics-Incorporated Deep Ensemble Learning Model for Multi-Parametric MRI-based Glioma Segmentation

    Authors: Yang Chen, Zhenyu Yang, Jingtong Zhao, Justus Adamson, Yang Sheng, Fang-Fang Yin, Chunhao Wang

    Abstract: We developed a deep ensemble learning model with a radiomics spatial encoding execution for improved glioma segmentation accuracy using multi-parametric MRI (mp-MRI). This model was developed using 369 glioma patients with a 4-modality mp-MRI protocol: T1, contrast-enhanced T1 (T1-Ce), T2, and FLAIR. In each modality volume, a 3D sliding kernel was implemented across the brain to capture image het…

    Submitted 18 March, 2023; originally announced March 2023.

  43. arXiv:2303.03323  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning

    Authors: Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang

    Abstract: Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data. However, previous studies have revealed that such models are vulnerable to backdoor attacks. Specifically, when trained on backdoored examples, CLIP learns spurious correlations between the embedded backdoor trigger and the target label, aligning the…

    Submitted 17 July, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 22 pages. Accepted at ICCV 2023

  44. arXiv:2302.11097  [pdf, other]

    cs.AI cs.CV

    A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram

    Authors: Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

    Abstract: Geometry problem solving (GPS) is a high-level mathematical reasoning requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still be short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neura…

    Submitted 28 April, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: Accepted to IJCAI 2023

  45. arXiv:2301.08843  [pdf, other]

    cs.LG eess.SP

    Towards Flexibility and Interpretability of Gaussian Process State-Space Model

    Authors: Zhidi Lin, Feng Yin, Juan Maroñas

    Abstract: The Gaussian process state-space model (GPSSM) has garnered considerable attention over the past decade. However, the standard GP with a preliminary kernel, such as the squared exponential kernel or Matérn kernel, that is commonly used in GPSSM studies, limits the model's representation power and substantially restricts its applicability to complex scenarios. To address this issue, we propose a ne…

    Submitted 6 April, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

    Comments: preprint

  46. arXiv:2301.06782  [pdf, other]

    cs.CV

    A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction

    Authors: Chongshan Lu, Fukun Yin, Xin Chen, Tao Chen, Gang YU, Jiayuan Fan

    Abstract: Neural Radiance Fields (NeRF) has achieved impressive results in single object scene reconstruction and novel view synthesis, which have been demonstrated on many single modality and single object focused indoor scene datasets like DTU, BMVS, and NeRF Synthetic. However, the study of NeRF on large-scale outdoor scene reconstruction is still limited, as there is no unified outdoor scene dataset for…

    Submitted 17 January, 2023; originally announced January 2023.

  47. arXiv:2301.00364  [pdf, other]

    cs.LG cs.CR cs.CV

    Generalizable Black-Box Adversarial Attack with Meta Learning

    Authors: Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, Yujiu Yang

    Abstract: In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize t…

    Submitted 1 January, 2023; originally announced January 2023.

    Comments: T-PAMI 2022. Project Page is at https://github.com/SCLBD/MCG-Blackbox

  48. arXiv:2212.07608  [pdf, other]

    cs.LG eess.SP eess.SY

    Output-Dependent Gaussian Process State-Space Model

    Authors: Zhidi Lin, Lei Cheng, Feng Yin, Lexi Xu, Shuguang Cui

    Abstract: Gaussian process state-space model (GPSSM) is a fully probabilistic state-space model that has attracted much attention over the past decade. However, the outputs of the transition function in the existing GPSSMs are assumed to be independent, meaning that the GPSSMs cannot exploit the inductive biases between different outputs and lose certain model capacities. To address this issue, this paper p…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: 5 pages, 4 figures

  49. arXiv:2211.16927  [pdf, other]

    cs.CV

    3D GAN Inversion with Facial Symmetry Prior

    Authors: Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang

    Abstract: Recently, a surge of high-quality 3D-aware GANs have been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred as 3D GAN inversion. Although with the facial prior preserved in pre-trained 3D GANs, recons…

    Submitted 14 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Project Page is at https://feiiyin.github.io/SPI/

  50. arXiv:2211.14758  [pdf, other]

    cs.CV

    VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

    Authors: Kun Cheng, Xiaodong Cun, Yong Zhang, Menghan Xia, Fei Yin, Mingrui Zhu, Xuan Wang, Jue Wang, Nannan Wang

    Abstract: We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-r…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted by SIGGRAPH Asia 2022 Conference Proceedings. Project page: https://vinthony.github.io/video-retalking/