Showing 1–50 of 220 results for author: Wu, N

Searching in archive cs.
  1. arXiv:2406.18995  [pdf, other]

    cs.LG cs.AI

    FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity

    Authors: Zhaobin Sun, Nannan Wu, Junjie Shi, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, constrained by the level…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI 2024

  2. arXiv:2406.15658  [pdf, other]

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  3. arXiv:2406.14434  [pdf, other]

    cs.CL

    Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies

    Authors: Weihao Liu, Ning Wu, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang

    Abstract: In the era of large language models (LLMs), building multilingual large language models (MLLMs) that can serve users worldwide holds great significance. However, existing research seldom focuses on the truthfulness of MLLMs. Meanwhile, contemporary multilingual aligning technologies struggle to balance massive languages and often exhibit serious truthfulness gaps across different languages, especi…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 15 pages

  4. arXiv:2405.19758  [pdf, other]

    cs.RO

    InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning

    Authors: Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, Yuke Zhu

    Abstract: Learning abstract state representations and knowledge is crucial for long-horizon robot planning. We present InterPreT, an LLM-powered framework for robots to learn symbolic predicates from language feedback of human non-experts during embodied interaction. The learned predicates provide relational abstractions of the environment state, facilitating the learning of symbolic operators that capture…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: RSS 2024; https://interpret-robot.github.io

  5. arXiv:2405.18816  [pdf, other]

    cs.CV cs.LG

    Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching

    Authors: Yasi Zhang, Peiyu Yu, Yaxuan Zhu, Yingshan Chang, Feng Gao, Ying Nian Wu, Oscar Leong

    Abstract: Generative models based on flow matching have attracted significant attention for their simplicity and superior performance in high-resolution image synthesis. By leveraging the instantaneous change-of-variables formula, one can directly compute image likelihoods from a learned flow, making them enticing candidates as priors for downstream tasks such as inverse problems. In particular, a natural a…

    Submitted 29 May, 2024; originally announced May 2024.

  6. arXiv:2405.18515  [pdf, other]

    cs.LG

    Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication

    Authors: Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, Chenfanfu Jiang

    Abstract: Existing diffusion-based text-to-3D generation methods primarily focus on producing visually realistic shapes and appearances, often neglecting the physical constraints necessary for downstream tasks. Generated models frequently fail to maintain balance when placed in physics-based simulations or 3D printed. This balance is crucial for satisfying user design intentions in interactive gaming, embod…

    Submitted 28 May, 2024; originally announced May 2024.

  7. arXiv:2405.16865  [pdf, other]

    q-bio.NC cs.LG stat.ML

    An Investigation of Conformal Isometry Hypothesis for Grid Cells

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: This paper investigates the conformal isometry hypothesis as a potential explanation for the emergence of hexagonal periodic patterns in the response maps of grid cells. The hypothesis posits that the activities of the population of grid cells form a high-dimensional vector in the neural space, representing the agent's self-position in 2D physical space. As the agent moves in the 2D physical space…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.19192

  8. arXiv:2405.16852  [pdf, other]

    cs.LG cs.AI stat.ML

    EM Distillation for One-step Diffusion Models

    Authors: Sirui Xie, Zhisheng Xiao, Diederik P Kingma, Tingbo Hou, Ying Nian Wu, Kevin Patrick Murphy, Tim Salimans, Ben Poole, Ruiqi Gao

    Abstract: While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Disti…

    Submitted 27 May, 2024; originally announced May 2024.

  9. arXiv:2405.16730  [pdf, other]

    cs.LG cs.AI stat.AP

    Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space

    Authors: Peiyu Yu, Dinghuai Zhang, Hengzhi He, Xiaojian Ma, Ruiyao Miao, Yifan Lu, Yasi Zhang, Deqian Kong, Ruiqi Gao, Jianwen Xie, Guang Cheng, Ying Nian Wu

    Abstract: Offline Black-Box Optimization (BBO) aims at optimizing a black-box function using the knowledge from a pre-collected offline dataset of function values and corresponding input designs. However, the high-dimensional and highly-multimodal input design space of black-box function pose inherent challenges for most existing methods that model and operate directly upon input designs. These issues inclu…

    Submitted 26 May, 2024; originally announced May 2024.

  10. arXiv:2405.16127  [pdf, other]

    cs.IR

    Finetuning Large Language Model for Personalized Ranking

    Authors: Zhuoxi Bai, Ning Wu, Fengyu Cai, Xinyi Zhu, Yun Xiong

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various domains, motivating researchers to investigate their potential use in recommendation systems. However, directly applying LLMs to recommendation tasks has proven challenging due to the significant disparity between the data used for pre-training LLMs and the specific requirements of recommendation tasks. In this st…

    Submitted 20 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  11. arXiv:2405.14018  [pdf, other]

    cs.CR cs.LG stat.AP

    Watermarking Generative Tabular Data

    Authors: Hengzhi He, Peiyu Yu, Junpeng Ren, Ying Nian Wu, Guang Cheng

    Abstract: In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected, while faithfully preserving the data fidelity, and also demonstrates appealing robustness against additive noise attack. The general idea is to achieve the watermarking through a strategic embedding based…

    Submitted 22 May, 2024; originally announced May 2024.

  12. arXiv:2405.10570  [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  13. arXiv:2405.10422  [pdf, other]

    cs.NI

    A First Look at Immersive Telepresence on Apple Vision Pro

    Authors: Ruizhi Cheng, Nan Wu, Matteo Varvello, Eugene Chai, Songqing Chen, Bo Han

    Abstract: Due to the widespread adoption of "work-from-home" policies, videoconferencing applications (e.g., Zoom) have become indispensable for remote communication. However, these systems lack immersiveness, leading to the so-called "Zoom fatigue" and degrading communication efficiency. The recent debut of Apple Vision Pro, a mixed reality headset that supports "spatial persona", aims to offer an immersiv…

    Submitted 16 May, 2024; originally announced May 2024.

  14. arXiv:2404.17805  [pdf, other]

    cs.LG cs.CV

    From Optimization to Generalization: Fair Federated Learning against Quality Shift via Inter-Client Sharpness Matching

    Authors: Nannan Wu, Zhuo Kuang, Zengqiang Yan, Li Yu

    Abstract: Due to escalating privacy concerns, federated learning has been recognized as a vital approach for training deep neural networks with decentralized medical data. In practice, it is challenging to ensure consistent imaging quality across various institutions, often attributed to equipment malfunctions affecting a minority of clients. This imbalance in image quality can cause the federated model to…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: This paper is accepted at IJCAI'24 (Main Track)

  15. arXiv:2404.07389  [pdf, other]

    cs.CV

    Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models

    Authors: Yasi Zhang, Peiyu Yu, Ying Nian Wu

    Abstract: Text-to-image diffusion models have shown great success in generating high-quality text-guided images. Yet, these models may still fail to semantically align generated images with the provided text prompts, leading to problems like incorrect attribute binding and/or catastrophic object neglect. Given the pervasive object-oriented structure underlying text prompts, we introduce a novel object-condi…

    Submitted 10 April, 2024; originally announced April 2024.

  16. arXiv:2403.17448  [pdf, other]

    cs.RO

    Adaptive Line-Of-Sight guidance law based on vector fields path following for underactuated unmanned surface vehicle

    Authors: Jie Qi, Ronghua Wanga, Nailong Wu

    Abstract: The focus of this paper is to develop a methodology that enables an unmanned surface vehicle (USV) to efficiently track a planned path. The introduction of a vector field-based adaptive line of-sight guidance law (VFALOS) for accurate trajectory tracking and minimizing the overshoot response time during USV tracking of curved paths improves the overall line-of-sight (LOS) guidance method. These im…

    Submitted 5 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  17. arXiv:2403.11552  [pdf, other]

    cs.RO cs.AI

    LLM3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning

    Authors: Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu

    Abstract: Conventional Task and Motion Planning (TAMP) approaches rely on manually crafted interfaces connecting symbolic task planning with continuous motion generation. These domain-specific and labor-intensive modules are limited in addressing emerging tasks in real-world settings. Here, we present LLM^3, a novel Large Language Model (LLM)-based TAMP framework featuring a domain-independent interface. Sp…

    Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS 2024. Codes available: https://github.com/AssassinWS/LLM-TAMP

  18. arXiv:2403.00782  [pdf, other]

    q-fin.ST cs.AI cs.CL

    Ploutos: Towards interpretable stock movement prediction with financial large language model

    Authors: Hanshuang Tong, Jun Li, Ning Wu, Ming Gong, Dongmei Zhang, Qi Zhang

    Abstract: Recent advancements in large language models (LLMs) have opened new pathways for many domains. However, the full potential of LLMs in financial investments remains largely untapped. There are two main challenges for typical deep learning-based methods for quantitative finance. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional…

    Submitted 18 February, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  19. arXiv:2402.18507  [pdf, other]

    cs.CV

    Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR Images

    Authors: Jiarui Xing, Nian Wu, Kenneth Bilchick, Frederick Epstein, Miaomiao Zhang

    Abstract: This paper presents a multimodal deep learning framework that utilizes advanced image techniques to improve the performance of clinical analysis heavily dependent on routinely acquired standard images. More specifically, we develop a joint learning network that for the first time leverages the accuracy and reproducibility of myocardial strains obtained from Displacement Encoding with Stimulated Ec…

    Submitted 28 February, 2024; originally announced February 2024.

  20. arXiv:2402.17179  [pdf, other]

    cs.LG q-bio.BM

    Dual-Space Optimization: Improved Molecule Sequence Design by Latent Prompt Transformer

    Authors: Deqian Kong, Yuhao Huang, Jianwen Xie, Edouardo Honig, Ming Xu, Shuanghong Xue, Pei Lin, Sanping Zhou, Sheng Zhong, Nanning Zheng, Ying Nian Wu

    Abstract: Designing molecules with desirable properties, such as drug-likeliness and high binding affinities towards protein targets, is a challenging problem. In this paper, we propose the Dual-Space Optimization (DSO) method that integrates latent space sampling and data space selection to solve this problem. DSO iteratively updates a latent space generative model and a synthetic dataset in an optimizatio…

    Submitted 26 February, 2024; originally announced February 2024.

  21. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled but the image reconstruction poses a great challenge of high-dimensional processing. This challenge leads to necessitate extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  22. arXiv:2402.13598  [pdf, other]

    cs.CL cs.AI cs.LG

    User-LLM: Efficient LLM Contextualization with User Embeddings

    Authors: Lin Ning, Luyang Liu, Jiaxing Wu, Neo Wu, Devora Berlowitz, Sushant Prakash, Bradley Green, Shawn O'Banion, Jun Xie

    Abstract: Large language models (LLMs) have revolutionized natural language processing. However, effectively incorporating complex and potentially noisy user interaction data remains a challenge. To address this, we propose User-LLM, a novel framework that leverages user embeddings to contextualize LLMs. These embeddings, distilled from diverse user interactions using self-supervised pretraining, capture la…

    Submitted 21 February, 2024; originally announced February 2024.

  23. arXiv:2402.08075  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Efficient and Scalable Fine-Tune of Language Models for Genome Understanding

    Authors: Huixin Zhan, Ying Nian Wu, Zijun Zhang

    Abstract: Although DNA foundation models have advanced the understanding of genomes, they still face significant challenges in the limited scale and diversity of genomic data. This limitation starkly contrasts with the success of natural language foundation models, which thrive on substantially larger scales. Furthermore, genome understanding involves numerous downstream genome annotation tasks with inheren…

    Submitted 12 February, 2024; originally announced February 2024.

  24. arXiv:2402.04647  [pdf, other]

    cs.LG

    Latent Plan Transformer: Planning as Latent Variable Inference

    Authors: Deqian Kong, Dehong Xu, Minglu Zhao, Bo Pang, Jianwen Xie, Andrew Lizarraga, Yuhao Huang, Sirui Xie, Ying Nian Wu

    Abstract: In tasks aiming for long-term returns, planning becomes essential. We study generative modeling for planning with datasets repurposed from offline reinforcement learning. Specifically, we identify temporal consistency in the absence of step-wise rewards as one key technical challenge. We introduce the Latent Plan Transformer (LPT), a novel model that leverages a latent space to connect a Transform…

    Submitted 28 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  25. arXiv:2402.03103  [pdf, ps, other]

    cs.PL cs.LO math.CT

    Scoped Effects as Parameterized Algebraic Theories

    Authors: Cristina Matache, Sam Lindley, Sean Moss, Sam Staton, Nicolas Wu, Zhixuan Yang

    Abstract: Notions of computation can be modelled by monads. Algebraic effects offer a characterization of monads in terms of algebraic operations and equational axioms, where operations are basic programming features, such as reading or updating the state, and axioms specify observably equivalent expressions. However, many useful programming features depend on additional mechanisms such as delimited scopes…

    Submitted 20 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Extended version of the ESOP 2024 paper with the same title

  26. arXiv:2401.09742  [pdf, other]

    cs.CV

    Image Translation as Diffusion Visual Programmers

    Authors: Cheng Han, James C. Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu

    Abstract: We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facil…

    Submitted 30 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 25 pages, 20 figures

  27. arXiv:2312.17016  [pdf, other]

    cs.CV cs.AI

    On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications

    Authors: Chenjiao Tan, Qian Cao, Yiwei Li, Jielu Zhang, Xiao Yang, Huaqin Zhao, Zihao Wu, Zhengliang Liu, Hao Yang, Nemin Wu, Tao Tang, Xinyue Ye, Lilong Chai, Ninghao Liu, Changying Li, Lan Mu, Tianming Liu, Gengchen Mai

    Abstract: The advent of large language models (LLMs) has heightened interest in their potential for multimodal applications that integrate language and vision. This paper explores the capabilities of GPT-4V in the realms of geography, environmental science, agriculture, and urban planning by evaluating its performance across a variety of tasks. Data sources comprise satellite imagery, aerial photos, ground-…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 110 Pages; 61 Figures

    ACM Class: I.2.7; I.2.10; I.4.6; I.4.8; J.2

  28. arXiv:2312.12838  [pdf, other]

    cs.LG cs.CV

    FedA3I: Annotation Quality-Aware Aggregation for Federated Medical Image Segmentation against Heterogeneous Annotation Noise

    Authors: Nannan Wu, Zhaobin Sun, Zengqiang Yan, Li Yu

    Abstract: Federated learning (FL) has emerged as a promising paradigm for training segmentation models on decentralized medical data, owing to its privacy-preserving property. However, existing research overlooks the prevalent annotation noise encountered in real-world medical datasets, which limits the performance ceilings of FL. In this paper, we, for the first time, identify and tackle this problem. For…

    Submitted 18 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI'24

  29. arXiv:2312.04333  [pdf, other]

    cs.CL

    Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

    Authors: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li

    Abstract: This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing diffe…

    Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 15 pages

  30. arXiv:2311.14281  [pdf, ps, other]

    cs.CV

    Multi-modal Instance Refinement for Cross-domain Action Recognition

    Authors: Yuan Qing, Naixing Wu, Shaohua Wan, Lixin Duan

    Abstract: Unsupervised cross-domain action recognition aims at adapting the model trained on an existing labeled source domain to a new unlabeled target domain. Most existing methods solve the task by directly aligning the feature distributions of source and target domains. However, this would cause negative transfer during domain adaptation due to some negative training samples in both domains. In the sour…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted by PRCV 2023

  31. arXiv:2311.06212  [pdf, other]

    stat.ML cs.LG stat.AP

    Differentiable VQ-VAE's for Robust White Matter Streamline Encodings

    Authors: Andrew Lizarraga, Brandon Taraku, Edouardo Honig, Ying Nian Wu, Shantanu H. Joshi

    Abstract: Given the complex geometry of white matter streamlines, Autoencoders have been proposed as a dimension-reduction tool to simplify the analysis streamlines in a low-dimensional latent spaces. However, despite these recent successes, the majority of encoder architectures only perform dimension reduction on single streamlines as opposed to a full bundle of streamlines. This is a severe limitation of…

    Submitted 18 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: 5 pages, 4 figures, 1 table

  32. arXiv:2310.20246  [pdf, other]

    cs.CL cs.AI

    Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

    Authors: Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Yangqiu Song, Dongmei Zhang, Jia Li

    Abstract: Existing research predominantly focuses on developing powerful language learning models (LLMs) for mathematical reasoning within monolingual languages, with few explorations in preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multil…

    Submitted 28 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Work in Progress

  33. arXiv:2310.19192  [pdf, other]

    q-bio.NC cs.LG stat.ML

    Emergence of Grid-like Representations by Training Recurrent Networks with Conformal Normalization

    Authors: Dehong Xu, Ruiqi Gao, Wen-Hao Zhang, Xue-Xin Wei, Ying Nian Wu

    Abstract: Grid cells in the entorhinal cortex of mammalian brains exhibit striking hexagon grid firing patterns in their response maps as the animal (e.g., a rat) navigates in a 2D open environment. In this paper, we study the emergence of the hexagon grid patterns of grid cells based on a general recurrent neural network (RNN) model that captures the navigation process. The responses of grid cells collecti…

    Submitted 19 February, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

  34. arXiv:2310.09604  [pdf, other]

    cs.LG cs.CV

    Learning Hierarchical Features with Joint Latent Space Energy-Based Prior

    Authors: Jiali Cui, Ying Nian Wu, Tian Han

    Abstract: This paper studies the fundamental problem of multi-layer generator models in learning hierarchical representations. The multi-layer generator model that consists of multiple layers of latent variables organized in a top-down architecture tends to learn multiple levels of data abstraction. However, such multi-layer latent variables are typically parameterized to be Gaussian, which can be less info…

    Submitted 14 October, 2023; originally announced October 2023.

  35. arXiv:2310.03325  [pdf, other]

    cs.AI cs.CV cs.LG

    Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning

    Authors: Yilue Qian, Peiyu Yu, Ying Nian Wu, Yao Su, Wei Wang, Lifeng Fan

    Abstract: Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable…

    Submitted 27 March, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  36. arXiv:2310.03253  [pdf, other]

    cs.LG q-bio.BM stat.ML

    Molecule Design by Latent Prompt Transformer

    Authors: Deqian Kong, Yuhao Huang, Jianwen Xie, Ying Nian Wu

    Abstract: This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation…

    Submitted 5 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  37. arXiv:2310.03218  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC

    Authors: Peiyu Yu, Yaxuan Zhu, Sirui Xie, Xiaojian Ma, Ruiqi Gao, Song-Chun Zhu, Ying Nian Wu

    Abstract: Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in the field of generative modeling due to its flexibility in the formulation and strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progres…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  38. Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

    Authors: Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze

    Abstract: Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but typically allocate tile size in a given buffer for the worst-case data occupancy. This severely limits the utilization of availa…

    Submitted 26 June, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: 17 pages, 13 figures, in MICRO 2023

    Journal ref: 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23), 2023

  39. RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation

    Authors: Ning Wu, Ming Gong, Linjun Shou, Jian Pei, Daxin Jiang

    Abstract: Online recommender systems (RS) aim to match user needs with the vast amount of resources available on various platforms. A key challenge is to model user preferences accurately under the condition of data sparsity. To address this challenge, some methods have leveraged external user behavior data from multiple platforms to enrich user representation. However, all of these methods require a consis…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: CIKM 2023 ADS

  40. arXiv:2309.09017  [pdf, other]

    cs.RO

    Triple Regression for Camera Agnostic Sim2Real Robot Grasping and Manipulation Tasks

    Authors: Yuanhong Zeng, Yizhou Zhao, Ying Nian Wu

    Abstract: Sim2Real (Simulation to Reality) techniques have gained prominence in robotic manipulation and motion planning due to their ability to enhance success rates by enabling agents to test and evaluate various policies and trajectories. In this paper, we investigate the advantages of integrating Sim2Real into robotic frameworks. We introduce the Triple Regression Sim2Real framework, which constructs a…

    Submitted 16 September, 2023; originally announced September 2023.

  41. Zero-shot information extraction from radiological reports using ChatGPT

    Authors: Danqing Hu, Bing Liu, Xiaofeng Zhu, Xudong Lu, Nan Wu

    Abstract: Electronic health records contain an enormous amount of valuable information, but many are recorded in free text. Information extraction is the strategy to transform the sequence of characters into structured data, which can be employed for secondary analysis. However, the traditional information extraction components, such as named entity recognition and relation extraction, require annotated dat…

    Submitted 6 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  42. arXiv:2308.12228  [pdf, other]

    cs.RO

    Electromagnets Under the Table: an Unobtrusive Magnetic Navigation System for Microsurgery

    Authors: Adam Schonewille, Changyan He, Cameron Forbrigger, Nancy Wu, James Drake, Thomas Looi, Eric Diller

    Abstract: Miniature magnetic tools have the potential to enable minimally invasive surgical techniques to be applied to space-restricted surgical procedures in areas such as neurosurgery. However, typical magnetic navigation systems, which create the magnetic fields to drive such tools, either cannot generate large enough fields, or surround the patient in a way that obstructs surgeon access to the patient.…

    Submitted 23 August, 2023; originally announced August 2023.

  43. arXiv:2307.11345  [pdf, other]

    cs.IT eess.SP

    Sensing Aided Covert Communications: Turning Interference into Allies

    Authors: Xinyi Wang, Zesong Fei, Peng Liu, J. Andrew Zhang, Qingqing Wu, Nan Wu

    Abstract: In this paper, we investigate the realization of covert communication in a general radar-communication cooperation system, which includes integrated sensing and communications as a special example. We explore the possibility of utilizing the sensing ability of radar to track and jam the aerial adversary target attempting to detect the transmission. Based on the echoes from the target, the extended…

    Submitted 3 January, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

    Comments: 13 pages, 12 figures, submitted to IEEE journals for potential publication

  44. arXiv:2307.07862  [pdf, other]

    cs.RO eess.SY

    Sim2Plan: Robot Motion Planning via Message Passing between Simulation and Reality

    Authors: Yizhou Zhao, Yuanhong Zeng, Qian Long, Ying Nian Wu, Song-Chun Zhu

    Abstract: Simulation-to-real is the task of training and developing machine learning models and deploying them in real settings with minimal additional training. This approach is becoming increasingly popular in fields such as robotics. However, there is often a gap between the simulated environment and the real world, and machine learning models trained in simulation may not perform as well in the real wor…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Published as a conference paper at FTC 2023

  45. arXiv:2307.04047  [pdf, other]

    cs.CV

    Threshold-Consistent Margin Loss for Open-World Deep Metric Learning

    Authors: Qin Zhang, Linghan Xu, Qingming Tang, Jun Fang, Ying Nian Wu, Joe Tighe, Yifan Xing

    Abstract: Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures across test classes and data distributions. When combined with the common practice of using a fixed threshold to declare a match, this gives rise to significant performance variations in terms of false accept rate (FAR) and false reject rate…

    Submitted 12 March, 2024; v1 submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted to ICLR'24

  46. arXiv:2306.14902  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting

    Authors: Deqian Kong, Bo Pang, Tian Han, Ying Nian Wu

    Abstract: Generation of molecules with desired chemical and biological properties such as high drug-likeness, high binding affinity to target proteins, is critical for drug discovery. In this paper, we propose a probabilistic generative model to capture the joint distribution of molecules and their properties. Our model assumes an energy-based model (EBM) in the latent space. Conditional on the latent vecto…

    Submitted 8 June, 2023; originally announced June 2023.

    Journal ref: 39th Conference on Uncertainty in Artificial Intelligence 2023

  47. arXiv:2306.06323  [pdf, other]

    cs.CV cs.LG

    Learning Joint Latent Space EBM Prior Model for Multi-layer Generator

    Authors: Jiali Cui, Ying Nian Wu, Tian Han

    Abstract: This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming…

    Submitted 11 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  48. arXiv:2306.01153  [pdf, other]

    cs.CL

    Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

    Authors: Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu

    Abstract: The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a human-like, trustworthy dialogue system. Common strategies either adopt a two-step paradigm, which optimizes knowledge selection and response generation separately, and may overlook the inherent correlation between these two tasks, or leverage conditional variational method to j…

    Submitted 5 August, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2023

  49. HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

    Authors: Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: Due to complex interactions among various deep neural network (DNN) optimization techniques, modern DNNs can have weights and activations that are dense or sparse with diverse sparsity degrees. To offer a good trade-off between accuracy and hardware performance, an ideal DNN accelerator should have high flexibility to efficiently translate DNN sparsity into reductions in energy and/or latency with…

    Submitted 1 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to MICRO23

  50. arXiv:2305.12039  [pdf, other]

    cs.CV

    Learning for Transductive Threshold Calibration in Open-World Recognition

    Authors: Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing, Stefano Soatto

    Abstract: In deep metric learning for visual recognition, the calibration of distance thresholds is crucial for achieving desired model performance in the true positive rates (TPR) or true negative rates (TNR). However, calibrating this threshold presents challenges in open-world scenarios, where the test classes can be entirely disjoint from those encountered during training. We define the problem of findi…

    Submitted 22 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.