Showing 1–50 of 662 results for author: Feng, J

Searching in archive cs.
  1. arXiv:2407.01303  [pdf, other]

    cs.RO

    RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields

    Authors: Haochen Jiang, Yueming Xu, Kejie Li, Jianfeng Feng, Li Zhang

    Abstract: Leveraging neural implicit representation to conduct dense RGB-D SLAM has been studied in recent years. However, this approach relies on a static environment assumption and does not work robustly within a dynamic environment due to the inconsistent observation of geometry and photometry. To address the challenges presented in dynamic environments, we propose a novel dynamic SLAM framework with neu…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: IEEE RAL 2024

  2. arXiv:2407.00603  [pdf, other]

    cs.CV

    Hierarchical Memory for Long Video QA

    Authors: Yiqin Wang, Haoji Zhang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin

    Abstract: This paper describes our champion solution to the LOVEU Challenge @ CVPR'24, Track 1 (Long Video VQA). Processing long sequences of visual tokens is computationally expensive and memory-intensive, making long video question-answering a challenging task. The key is to compress visual tokens effectively, reducing memory footprint and decoding latency, while preserving the essential information for a…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2407.00451  [pdf, other]

    cs.RO

    Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation

    Authors: Hang Li, Qian Feng, Zhi Zheng, Jianxiang Feng, Alois Knoll

    Abstract: Learning from demonstrations faces challenges in generalizing beyond the training data and is fragile even to slight visual variations. To tackle this problem, we introduce Lan-o3dp, a language guided object centric diffusion policy that takes 3d representation of task relevant objects as conditional input and can be guided by cost function for safety constraints at inference time. Lan-o3dp enable…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 11 pages, 8 figures

  4. arXiv:2406.19501  [pdf, other]

    cs.CL cs.LG

    Monitoring Latent World States in Language Models with Propositional Probes

    Authors: Jiahai Feng, Stuart Russell, Jacob Steinhardt

    Abstract: Language models are susceptible to bias, sycophancy, backdoors, and other tendencies that lead to unfaithful responses to the input context. Interpreting internal states of language models could help monitor and correct unfaithful behavior. We hypothesize that language models represent their input contexts in a latent world model, and seek to extract this latent world state from the activations. W…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.18067  [pdf, other]

    cs.CL eess.AS

    Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: The diverse nature of dialects presents challenges for models trained on specific linguistic patterns, rendering them susceptible to errors when confronted with unseen or out-of-distribution (OOD) data. This study introduces a novel margin-enhanced joint energy model (MEJEM) tailored specifically for OOD detection in dialects. By integrating a generative model and the energy margin loss, our appro…

    Submitted 26 June, 2024; originally announced June 2024.

  6. arXiv:2406.18065  [pdf, other]

    eess.AS cs.SD

    On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

    Authors: Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

    Abstract: For speech classification tasks, deep learning models often achieve high accuracy but exhibit shortcomings in calibration, manifesting as classifiers exhibiting overconfidence. The significance of calibration lies in its critical role in guaranteeing the reliability of decision-making within deep learning systems. This study explores the effectiveness of Energy-Based Models in calibrating confiden…

    Submitted 26 June, 2024; originally announced June 2024.

  7. arXiv:2406.16655  [pdf, other]

    cs.CL

    Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

    Authors: Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separated parts: knowledge retrieval and knowledge-free reasoning, and analyze the cross-lingual transferability of them. With adapted and const…

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.16066  [pdf, other]

    cs.CE

    Constructing Boundary-identical Microstructures by Guided Diffusion for Fast Multiscale Designs

    Authors: Jingxuan Feng, Lili Wang, Xiaoya Zhai, Kai Chen, Wenming Wu, Ligang Liu, Xiao-Ming Fu

    Abstract: We propose a novel method to construct large-scale boundary-identical microstructure datasets with high attribute coverage for highly efficient multiscale design. Central to our technique is using a deep generative model to generate microstructures under the two conditions, including the specified boundary and homogenized elastic tensor. We achieve the desired dataset by alternately adding microst…

    Submitted 23 June, 2024; originally announced June 2024.

  9. arXiv:2406.15873  [pdf, other]

    physics.comp-ph cs.LG physics.chem-ph

    NeuralSCF: Neural network self-consistent fields for density functional theory

    Authors: Feitong Song, Ji Feng

    Abstract: Kohn-Sham density functional theory (KS-DFT) has found widespread application in accurate electronic structure calculations. However, it can be computationally demanding especially for large-scale simulations, motivating recent efforts toward its machine-learning (ML) acceleration. We propose a neural network self-consistent fields (NeuralSCF) framework that establishes the Kohn-Sham density map a…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  10. arXiv:2406.15675  [pdf, other]

    eess.SY cs.AI cs.SC

    Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery

    Authors: Jie Feng, Haohan Zou, Yuanyuan Shi

    Abstract: We propose CoNSAL (Combining Neural networks and Symbolic regression for Analytical Lyapunov function) to construct analytical Lyapunov functions for nonlinear dynamic systems. This framework contains a neural Lyapunov function and a symbolic regression component, where symbolic regression is applied to distill the neural network to precise analytical forms. Our approach utilizes symbolic regressi…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Workshop paper, accepted by Workshop on Foundations of Reinforcement Learning and Control at the 41st International Conference on Machine Learning, Vienna, Austria

  11. arXiv:2406.14683  [pdf, other]

    cs.LG cs.CL

    TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models

    Authors: Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, Muhan Zhang

    Abstract: In this report, we present TAGLAS, an atlas of text-attributed graph (TAG) datasets and benchmarks. TAGs are graphs with node and edge features represented in text, which have recently gained wide applicability in training graph-language or graph foundation models. In TAGLAS, we collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs and tasks f…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  12. arXiv:2406.14635  [pdf, other]

    cs.AI cs.LG

    Harvesting Efficient On-Demand Order Pooling from Skilled Couriers: Enhancing Graph Representation Learning for Refining Real-time Many-to-One Assignments

    Authors: Yile Liang, Jiuxia Zhao, Donghui Li, Jie Feng, Chen Zhang, Xuetao Ding, Jinghua Hao, Renqing He

    Abstract: The recent past has witnessed a notable surge in on-demand food delivery (OFD) services, offering delivery fulfillment within dozens of minutes after an order is placed. In OFD, pooling multiple orders for simultaneous delivery in real-time order assignment is a pivotal efficiency source, which may in turn extend delivery time. Constructing high-quality order pooling to harmonize platform efficien…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted in KDD 2024 ADS Track

  13. arXiv:2406.14050  [pdf, other]

    cs.CV

    Gaze-directed Vision GNN for Mitigating Shortcut Learning in Medical Image

    Authors: Shaoxuan Wu, Xiao Zhang, Bin Wang, Zhuo Jin, Hansheng Li, Jun Feng

    Abstract: Deep neural networks have demonstrated remarkable performance in medical image analysis. However, their susceptibility to spurious correlations due to shortcut learning raises concerns about network interpretability and reliability. Furthermore, shortcut learning is exacerbated in medical contexts where disease indicators are often subtle and sparse. In this paper, we propose a novel gaze-directed V…

    Submitted 20 June, 2024; originally announced June 2024.

  14. arXiv:2406.13948  [pdf, other]

    cs.AI cs.CL cs.LG

    CityGPT: Empowering Urban Spatial Cognition of Large Language Models

    Authors: Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li

    Abstract: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of physical-world corpora and knowledge during training, they usually fail to solve many real-life tasks in the urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the ca…

    Submitted 19 June, 2024; originally announced June 2024.

  15. arXiv:2406.13945  [pdf, other]

    cs.AI cs.CL cs.LG

    CityBench: Evaluating the Capabilities of Large Language Model as World Model

    Authors: Jie Feng, Jun Zhang, Junbo Yan, Xin Zhang, Tianjian Ouyang, Tianhui Liu, Yuwei Du, Siqi Guo, Yong Li

    Abstract: Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations about the usability of LLMs, but a systematic and scalable evaluation benchmark is still…

    Submitted 19 June, 2024; originally announced June 2024.

  16. arXiv:2406.13358  [pdf, other]

    cs.CV eess.IV

    Multi-scale Restoration of Missing Data in Optical Time-series Images with Masked Spatial-Temporal Attention Network

    Authors: Zaiyan Zhang, Jining Yan, Yuanqi Liang, Jiaxin Feng, Haixu He, Wei Han

    Abstract: Due to factors such as thick cloud cover and sensor limitations, remote sensing images often suffer from significant missing data, resulting in incomplete time-series information. Existing methods for imputing missing values in remote sensing images do not fully exploit spatio-temporal auxiliary information, leading to limited accuracy in restoration. Therefore, this paper proposes a novel deep le…

    Submitted 19 June, 2024; originally announced June 2024.

  17. arXiv:2406.13268  [pdf, other]

    eess.AS cs.SD

    CEC: A Noisy Label Detection Method for Speaker Recognition

    Authors: Yao Shen, Yingying Gao, Yaqian Hao, Chenguang Hu, Fulin Zhang, Junlan Feng, Shilei Zhang

    Abstract: Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on two new statistical metrics: Continuous Inconsistent Counting (CIC) and Total Inconsistent Counting (TIC). These metrics are calculated through Cros…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: interspeech 2024

  18. arXiv:2406.11147  [pdf, other]

    cs.SE cs.AI

    Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG

    Authors: Xueying Du, Geng Zheng, Kaixin Wang, Jiayi Feng, Wentai Deng, Mingwei Liu, Bihuan Chen, Xin Peng, Tao Ma, Yiling Lou

    Abstract: Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose a novel LLM-based vulnerability detection technique Vul-RAG, which leverages knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerability for the given code in…

    Submitted 19 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  19. arXiv:2406.09444  [pdf, other]

    eess.AS cs.CL cs.SD

    GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model

    Authors: Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for memory and computing resources hinder their application on resource-restricted devices. Therefore, this paper introduces GenDistiller, a novel knowled…

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.13418

  20. arXiv:2406.09414  [pdf, other]

    cs.CV

    Depth Anything V2

    Authors: Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

    Abstract: This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of ou…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://depth-anything-v2.github.io

  21. arXiv:2406.08085  [pdf, other]

    cs.CV

    Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

    Authors: Haoji Zhang, Yiqin Wang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin

    Abstract: Benefiting from the advancements in large language models and cross-modal alignment, existing multi-modal video understanding methods have achieved prominent performance in offline scenarios. However, online video streams, as one of the most common media forms in the real world, have seldom received attention. Compared to offline videos, the 'dynamic' nature of online video streams poses challenges…

    Submitted 30 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  22. arXiv:2406.07949  [pdf, other]

    cs.CV

    Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection

    Authors: Jie Feng, Xiaojian Zhong, Di Li, Weisheng Dong, Ronghua Shang, Licheng Jiao

    Abstract: Band selection plays a crucial role in hyperspectral image classification by removing redundant and noisy bands and retaining discriminative ones. However, most existing deep learning-based methods are aimed at dealing with a specific band selection dataset, and need to retrain parameters for new datasets, which significantly limits their generalizability. To address this issue, a novel multi-teach…

    Submitted 12 June, 2024; originally announced June 2024.

  23. arXiv:2406.07801  [pdf, other]

    cs.CL cs.SD eess.AS

    PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

    Authors: Runyan Yang, Huibao Yang, Xiqing Zhang, Tiantian Ye, Ying Liu, Yingying Gao, Shilei Zhang, Chao Deng, Junlan Feng

    Abstract: Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works directly demonstrated that joint optimization of diverse tasks in multitask speech models has positive influence on the performance of individual tasks. In this paper we present a multitask speech model -- PolySpeech, which supports speech recognition, speech synthesis,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures

  24. arXiv:2406.03344  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

    Authors: Mehmet Hamza Erol, Arda Senocak, Jiu Feng, Joon Son Chung

    Abstract: Transformers have rapidly become the preferred choice for audio classification, surpassing methods based on CNNs. However, Audio Spectrogram Transformers (ASTs) exhibit quadratic scaling due to self-attention. The removal of this quadratic self-attention cost presents an appealing direction. Recently, state space models (SSMs), such as Mamba, have demonstrated potential in language and vision task…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/mhamzaerol/Audio-Mamba-AuM

  25. arXiv:2406.02804  [pdf, other]

    cs.AI cs.CL cs.LG

    $\texttt{ACCORD}$: Closing the Commonsense Measurability Gap

    Authors: François Roewer-Després, Jinyue Feng, Zining Zhu, Frank Rudzicz

    Abstract: We present $\texttt{ACCORD}$, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterfactuals. $\texttt{ACCORD}$ introduces formal elements to commonsense reasoning to explicitly control and quantify reasoning complexity beyond the typical 1 or 2 hops. Uniquely, $\texttt{ACCORD}$ can a…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: For leaderboard and dataset download, see https://www.codabench.org/competitions/3160/ For source code, see https://github.com/francois-rd/accord/

    ACM Class: I.2.0; I.2.7

  26. arXiv:2406.00430  [pdf, other]

    cs.RO cs.AI

    Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners

    Authors: Zhi Zheng, Qian Feng, Hang Li, Alois Knoll, Jianxiang Feng

    Abstract: Recently, Large Language Models (LLMs) have witnessed remarkable performance as zero-shot task planners for robotic manipulation tasks. However, the open-loop nature of previous works makes LLM-based planning error-prone and fragile. On the other hand, failure detection approaches for closed-loop planning are often limited by task-specific heuristics or following an unrealistic assumption that the…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted at ICRA 2024 Workshop on Back to the Future: Robot Learning Going Probabilistic. Website: https://sites.google.com/view/konwloop/home

  27. arXiv:2406.00121  [pdf, other]

    cs.CV

    Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations

    Authors: Tiancheng Shen, Jun Hao Liew, Long Mai, Lu Qi, Jiashi Feng, Jiaya Jia

    Abstract: Advances in text-based image generation and editing have revolutionized content creation, enabling users to create impressive content from imaginative text prompts. However, existing methods are not designed to work well with the oversimplified prompts that are often encountered in typical scenarios when users start their editing with only vague or abstract purposes in mind. Those scenarios demand…

    Submitted 31 May, 2024; originally announced June 2024.

  28. arXiv:2406.00036  [pdf, other]

    cs.CL cs.AI cs.LG

    EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling

    Authors: Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, Chengwei Pan

    Abstract: The integration of multimodal Electronic Health Records (EHR) data has notably advanced clinical predictive capabilities. However, current models that utilize clinical notes and multivariate time-series EHR data often lack the necessary medical context for precise clinical tasks. Previous methods using knowledge graphs (KGs) primarily focus on structured knowledge extraction. To address this, we p…

    Submitted 27 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.07016

  29. arXiv:2405.19119  [pdf, other]

    cs.LG

    Can Graph Learning Improve Task Planning?

    Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

    Abstract: Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, t…

    Submitted 29 May, 2024; originally announced May 2024.

  30. arXiv:2405.18428  [pdf, other]

    cs.CV cs.AI

    DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention

    Authors: Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang

    Abstract: Diffusion models with large-scale pre-training have achieved significant success in the field of visual content generation, particularly exemplified by Diffusion Transformers (DiT). However, DiT models have faced challenges with scalability and quadratic complexity efficiency. In this paper, we aim to leverage the long sequence modeling capability of Gated Linear Attention (GLA) Transformers, expa…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Code is released at https://github.com/hustvl/DiG

  31. arXiv:2405.15239  [pdf, other]

    cs.CV

    Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation

    Authors: Li Zhang, Yuankun Yang, Ziyang Xie, Zhiyuan Yuan, Jianfeng Feng, Xiatian Zhu, Yu-Gang Jiang

    Abstract: Understanding the hidden mechanisms behind human visual perception is a fundamental quest in neuroscience and underpins a wide variety of critical applications, e.g. clinical diagnosis. To that end, investigating the neural responses of human mind activities, such as functional Magnetic Resonance Imaging (fMRI), has been a significant research vehicle. However, analyzing fMRI signals is challe…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 25 pages, 16 figures, project page: https://brain-3d.github.io/

  32. arXiv:2405.13816  [pdf, other]

    cs.CL

    Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners

    Authors: Shimao Zhang, Changjiang Gao, Wenhao Zhu, Jiajun Chen, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Recently, Large Language Models (LLMs) have shown impressive language capabilities. While most of the existing LLMs have very unbalanced performance across different languages, multilingual alignment based on translation parallel data is an effective method to enhance the LLMs' multilingual capabilities. In this work, we discover and comprehensively investigate the spontaneous multilingual alignme…

    Submitted 18 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  33. arXiv:2405.13722  [pdf, other]

    cs.CV

    InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

    Authors: Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng

    Abstract: Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based image editing framework that achieves pixel-level control using Generative Adversarial Networks (GANs). A flurry of subsequent studies enhanced this framework's generality by leveraging large-scale diffusion models. However, these methods often suffer from inordinately long processing times (exceeding 1 minu…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Project page: https://instadrag.github.io/

  34. arXiv:2405.13084  [pdf, other]

    cs.CL cs.AI

    The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG)

    Authors: Yucheng Cai, Si Chen, Yi Huang, Junlan Feng, Zhijian Ou

    Abstract: The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG), Co-located with SLT 2024

    Submitted 21 May, 2024; originally announced May 2024.

  35. arXiv:2405.12661  [pdf, other]

    cs.CV

    EmoEdit: Evoking Emotions through Image Manipulation

    Authors: Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang

    Abstract: Affective Image Manipulation (AIM) seeks to modify user-provided images to evoke specific emotional responses. This task is inherently complex due to its twofold objective: significantly evoking the intended emotion, while preserving the original image composition. Existing AIM methods primarily adjust color and style, often failing to elicit precise and profound emotional shifts. Drawing on psych…

    Submitted 21 May, 2024; originally announced May 2024.

  36. arXiv:2405.10818  [pdf]

    cs.SI

    Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System

    Authors: Jiawei Feng, Mengsi Cai, Fangze Dai, Tianci Bu, Xiaoyu Zhang, Huijun Zheng, Xin Lu

    Abstract: In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.10428 by other authors

  37. arXiv:2405.07510  [pdf, other]

    cs.LG

    PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

    Authors: Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng

    Abstract: We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated par…

    Submitted 29 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  38. arXiv:2405.05925  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting

    Authors: Xiaohui Zhong, Lei Chen, Hao Li, Jie Feng, Bo Lu

    Abstract: Ensemble weather forecasting is essential for weather predictions and mitigating the impacts of extreme weather events. Constructing an ensemble prediction system (EPS) based on conventional numerical weather prediction (NWP) models is highly computationally expensive. Machine learning (ML) models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with signific…

    Submitted 9 May, 2024; originally announced May 2024.

  39. arXiv:2405.05526  [pdf, other]

    cs.RO

    Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview

    Authors: Yuhang Ming, Xingrui Yang, Weihan Wang, Zheng Chen, Jinglun Feng, Yifan Xing, Guofeng Zhang

    Abstract: Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception and understanding of the environment are pivotal, NeRF holds immense promise for improving performance. In this paper, we present a comprehensiv…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 32 pages, 5 figures, 8 tables

  40. arXiv:2405.03959  [pdf, other]

    cs.CV

    Joint Identity Verification and Pose Alignment for Partial Fingerprints

    Authors: Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou

    Abstract: Currently, portable electronic devices are becoming more and more popular. For lightweight considerations, their fingerprint recognition modules usually use limited-size sensors. However, partial fingerprints have few matchable features, especially when there are differences in finger pressing posture or image quality, which makes partial fingerprint verification challenging. Most existing methods…

    Submitted 21 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  41. arXiv:2405.01434  [pdf, other]

    cs.CV

    StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

    Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

    Abstract: For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent…

    Submitted 2 May, 2024; originally announced May 2024.

  42. arXiv:2405.01199  [pdf, other]

    cs.CV

    Latent Fingerprint Matching via Dense Minutia Descriptor

    Authors: Zhiyu Pan, Yongjie Duan, Xiongjun Guan, Jianjiang Feng, Jie Zhou

    Abstract: Latent fingerprint matching is a daunting task, primarily due to the poor quality of latent fingerprints. In this study, we propose a deep-learning based dense minutia descriptor (DMD) for latent fingerprint matching. A DMD is obtained by extracting the fingerprint patch aligned by its central minutia, capturing detailed minutia information and texture information. Our dense descriptor takes the f…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  43. arXiv:2405.01112  [pdf, other]

    cs.CV

    Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors

    Authors: Wenxuan Guo, Zhiyu Pan, Ziheng Xi, Alapati Tuerxun, Jianjiang Feng, Jie Zhou

    Abstract: Sports analysis and viewing play a pivotal role in the current sports domain, offering significant value not only to coaches and athletes but also to fans and the media. In recent years, the rapid development of virtual reality (VR) and augmented reality (AR) technologies has introduced a new platform for watching games. Visualization of sports competitions in VR/AR represents a revolutionary tec…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.06409

  44. arXiv:2405.00181  [pdf, other]

    cs.CV cs.AI

    Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

    Authors: Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao

    Abstract: Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on anomaly detection and localization, our focus is on more practicality, prompting us to raise the following crucial questions: "what anomaly occurred?"…

    Submitted 6 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted in CVPR2024, Codebase: https://github.com/fesvhtr/CUVA

  45. arXiv:2404.19334  [pdf, other]

    cs.CV

    Multi-Scale Heterogeneity-Aware Hypergraph Representation for Histopathology Whole Slide Images

    Authors: Minghao Han, Xukun Zhang, Dingkang Yang, Tao Liu, Haopeng Kuang, Jinghui Feng, Lihua Zhang

    Abstract: Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images. Existing deep learning approaches mainly adopt multiple instance learning or graph neural networks under weak supervision. Most of them are unable to uncover the diverse interactions between differen…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures, accepted by ICME2024

  46. Regression of Dense Distortion Field from a Single Fingerprint Image

    Authors: Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou

    Abstract: Skin distortion is a long-standing challenge in fingerprint matching, which causes false non-matches. Previous studies have shown that the recognition rate can be improved by estimating the distortion field from a distorted fingerprint and then rectifying it into a normal fingerprint. However, existing rectification methods are based on principal component representation of distortion fields, whic…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.17148

    Journal ref: IEEE Transactions on Information Forensics and Security, vol. 18, pp. 4377-4390, 2023

  47. arXiv:2404.17343  [pdf, other]

    cs.CL cs.FL

    A Bionic Natural Language Parser Equivalent to a Pushdown Automaton

    Authors: Zhenghao Wei, Kehua Lin, Jianlin Feng

    Abstract: Assembly Calculus (AC), proposed by Papadimitriou et al., aims to reproduce advanced cognitive functions through simulating neural activities, with several applications based on AC having been developed, including a natural language parser proposed by Mitropolsky et al. However, this parser lacks the ability to handle Kleene closures, preventing it from parsing all regular languages and rendering…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: to be published in IJCNN 2024

  48. arXiv:2404.17159  [pdf, other]

    cs.CV

    Phase-aggregated Dual-branch Network for Efficient Fingerprint Dense Registration

    Authors: Xiongjun Guan, Jianjiang Feng, Jie Zhou

    Abstract: Fingerprint dense registration aims to finely align fingerprint pairs at the pixel level, thereby reducing intra-class differences caused by distortion. Unfortunately, traditional methods exhibit subpar performance when dealing with low-quality fingerprints while suffering from slow inference speed. Although deep learning based approaches show significant improvement in these aspects, their reg…

    Submitted 26 April, 2024; originally announced April 2024.

  49. Pose-Specific 3D Fingerprint Unfolding

    Authors: Xiongjun Guan, Jianjiang Feng, Jie Zhou

    Abstract: In order to make 3D fingerprints compatible with traditional 2D flat fingerprints, a common practice is to unfold the 3D fingerprint into a 2D rolled fingerprint, which is then matched with the flat fingerprints by traditional 2D fingerprint recognition algorithms. The problem with this method is that there may be large elastic deformation between the unfolded rolled fingerprint and flat fingerpri…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: 15th Chinese Conference on Biometric Recognition (CCBR), Shanghai, China, 2021, pp. 185-194

  50. Direct Regression of Distortion Field from a Single Fingerprint Image

    Authors: Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou

    Abstract: Skin distortion is a long-standing challenge in fingerprint matching, which causes false non-matches. Previous studies have shown that the recognition rate can be improved by estimating the distortion field from a distorted fingerprint and then rectifying it into a normal fingerprint. However, existing rectification methods are based on principal component representation of distortion fields, whic…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: 2022 IEEE International Joint Conference on Biometrics (IJCB), Abu Dhabi, United Arab Emirates, 2022, pp. 1-8