Showing 1–50 of 248 results for author: Fang, H

Searching in archive cs.
  1. arXiv:2406.17458

    cs.CV

    Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration

    Authors: Sebastian Hafner, Heng Fang, Hossein Azizpour, Yifang Ban

    Abstract: Urbanization advances at unprecedented rates, resulting in negative effects on the environment and human well-being. Remote sensing has the potential to mitigate these effects by supporting sustainable development strategies with accurate information on urban growth. Deep learning-based methods have achieved promising urban change detection results from optical satellite image pairs using convolut…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Transactions on Geoscience and Remote Sensing, Code will be available at https://github.com/SebastianHafner/ContUrbanCD.git

  2. arXiv:2406.17005

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild (PVUW) Challenge focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation track, based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  3. arXiv:2406.11142

    cs.RO cs.CV

    Graspness Discovery in Clutters for Fast and Accurate Grasp Detection

    Authors: Chenxi Wang, Hao-Shu Fang, Minghao Gou, Hongjie Fang, Jin Gao, Cewu Lu

    Abstract: Efficient and robust grasp pose detection is vital for robotic manipulation. For general 6 DoF grasping, conventional methods treat all points in a scene equally and usually adopt uniform sampling to select grasp candidates. However, we discover that ignoring where to grasp greatly harms the speed and accuracy of current grasp pose detection methods. In this paper, we propose "graspness", a qualit…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICCV 2021

  4. arXiv:2406.07080

    cs.CL

    DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs

    Authors: Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

    Abstract: Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications. To improve the neural-symbolic reasoning capabilities of language agents powered by Large Language Models (LLMs) in KGQA, we propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL2024 findings

  5. arXiv:2406.06563

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…

    Submitted 2 June, 2024; originally announced June 2024.

  6. arXiv:2406.05704

    cs.CV

    Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation

    Authors: Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia

    Abstract: Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, negle…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  7. arXiv:2406.05491

    cs.CV cs.CR

    One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

    Authors: Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Shutao Xia, Ke Xu

    Abstract: Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effect and transferability, they all focus o…

    Submitted 8 June, 2024; originally announced June 2024.

  8. arXiv:2406.04842

    cs.CV

    3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation

    Authors: Feiyu Pan, Hao Fang, Xiankai Lu

    Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video, emphasizing modeling dense text-video relations. The current RVOS methods typically use independently pre-trained vision and language models as backbones, resulting in a significant domain gap between video and text. In cross-modal feature interaction, text features are only used a…

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.04214

    cs.CL

    ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    Authors: Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

    Abstract: Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies. This development underscores the urgent need for evaluating value orientations and understanding of LLMs to ensure their responsible integration into public-facing applications. This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientati…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  10. arXiv:2406.01436

    cs.CL

    Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

    Authors: Cheng-Hsun Hsueh, Paul Kuo-Ming Huang, Tzu-Han Lin, Che-Wei Liao, Hung-Chieh Fang, Chao-Wei Huang, Yun-Nung Chen

    Abstract: Knowledge editing is a rising technique for efficiently updating factual knowledge in Large Language Models (LLMs) with minimal alteration of parameters. However, recent studies have identified concerning side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2406.00605

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard…

    Submitted 1 June, 2024; originally announced June 2024.

  12. arXiv:2405.20725

    cs.AI cs.CV

    GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search

    Authors: Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

    Abstract: Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, rese…

    Submitted 31 May, 2024; originally announced May 2024.

  13. Learn to be Fair without Labels: a Distribution-based Learning Framework for Fair Ranking

    Authors: Fumian Chen, Hui Fang

    Abstract: Ranking algorithms as an essential component of retrieval systems have been constantly improved in previous studies, especially regarding relevance-based utilities. In recent years, more and more research attempts have been proposed regarding fairness in rankings due to increasing concerns about potential discrimination and the issue of echo chamber. These attempts include traditional score-based…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: ICTIR'23

  14. LEO Satellite Network Access in the Wild: Potentials, Experiences, and Challenges

    Authors: Sami Ma, Yi Ching Chou, Miao Zhang, Hao Fang, Haoyuan Zhao, Jiangchuan Liu, William I. Atlas

    Abstract: In the past three years, working with the Pacific Salmon Foundation and various First Nations groups, we have established Starlink-empowered wild salmon monitoring sites in remote Northern British Columbia, Canada. We report our experiences with the network services in these challenging environments, including deep woods and deep valleys, that lack infrastructural support with some close to Starli…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

    ACM Class: C.2.1

  15. arXiv:2405.03458

    cs.CV

    SSyncOA: Self-synchronizing Object-aligned Watermarking to Resist Cropping-paste Attacks

    Authors: Chengxin Zhao, Hefei Ling, Sijing Xie, Han Fang, Yaokun Fang, Nan Sun

    Abstract: Modern image processing tools have made it easy for attackers to crop the region or object of interest in images and paste it into other images. The challenge this cropping-paste attack poses to the watermarking technology is that it breaks the synchronization of the image watermark, introducing multiple superimposed desynchronization distortions, such as rotation, scaling, and translation. Howeve…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 5 figures (accepted by ICME 2024)

  16. arXiv:2404.16233

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  17. arXiv:2404.13263

    cs.CV

    FilterPrompt: Guiding Image Transfer in Diffusion Models

    Authors: Xi Wang, Yichen Peng, Heng Fang, Haoran Xie, Xi Yang, Chuntao Li

    Abstract: In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately. Previous research has predominantly concentrated on disentan…

    Submitted 12 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: Project Page: https://meaoxixi.github.io/FilterPrompt/

  18. arXiv:2404.12281

    cs.RO

    RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

    Authors: Chenxi Wang, Hongjie Fang, Hao-Shu Fang, Cewu Lu

    Abstract: Precise robot manipulations require rich spatial information in imitation learning. Image-based policies model object positions from fixed cameras, which are sensitive to camera view changes. Policies utilizing 3D point clouds usually predict keyframes rather than continuous actions, posing difficulty in dynamic and contact-rich scenarios. To utilize 3D perception efficiently, we present RISE, an…

    Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  19. arXiv:2404.12216

    cs.CV

    ProTA: Probabilistic Token Aggregation for Text-Video Retrieval

    Authors: Han Fang, Xianghao Zang, Chao Ban, Zerun Feng, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun

    Abstract: Text-video retrieval aims to find the most relevant cross-modal samples for a given query. Recent methods focus on modeling the whole spatial-temporal relations. However, since video clips contain more diverse content than captions, the model aligning these asymmetric video-text pairs has a high risk of retrieving many false positive results. In this paper, we propose Probabilistic Token Aggregati…

    Submitted 20 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  20. Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems

    Authors: Dayu Yang, Fumian Chen, Hui Fang

    Abstract: Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, the application of LLMs to CRS has exposed a notable discrepancy in behavior between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This behavior discrepancy can lead t…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)

  21. arXiv:2404.04956

    cs.CV cs.CR

    Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

    Authors: Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu

    Abstract: Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. However, existing methods often compromise the model performance or require additional training, which is undesirable for operators and users. To address this issue, we propose…

    Submitted 6 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: 17 pages, 11 figures, accepted by CVPR 2024

  22. arXiv:2404.02284

    cs.RO

    APEX: Ambidextrous Dual-Arm Robotic Manipulation Using Collision-Free Generative Diffusion Models

    Authors: Apan Dastider, Hao Fang, Mingjie Lin

    Abstract: Dexterous manipulation, particularly adept coordinating and grasping, constitutes a fundamental and indispensable capability for robots, facilitating the emulation of human-like behaviors. Integrating this capability into robots empowers them to supplement and even supplant humans in undertaking increasingly intricate tasks in both daily life and industrial settings. Unfortunately, contemporary me…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Under Review in IEEE IROS 2024

  23. arXiv:2404.00261

    cs.IR cs.AI

    A Simple Yet Effective Approach for Diversified Session-Based Recommendation

    Authors: Qing Yin, Hui Fang, Zhu Sun, Yew-Soon Ong

    Abstract: Session-based recommender systems (SBRSs) have become extremely popular in view of the core capability of capturing short-term and dynamic user preferences. However, most SBRSs primarily maximize recommendation accuracy but ignore user minor preferences, thus leading to filter bubbles in the long run. Only a handful of works, being devoted to improving diversity, depend on unique model designs and…

    Submitted 30 March, 2024; originally announced April 2024.

  24. arXiv:2403.19622

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  25. arXiv:2403.18462

    cs.IR

    Decoy Effect In Search Interaction: Understanding User Behavior and Measuring System Vulnerability

    Authors: Nuo Chen, Jiqun Liu, Hanpei Fang, Yuankai Luo, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: This study examines the decoy effect's underexplored influence on user search interactions and methods for measuring information retrieval (IR) systems' vulnerability to this effect. It explores how decoy results alter users' interactions on search engine result pages, focusing on metrics like click-through likelihood, browsing time, and perceived document usefulness. By analyzing user interaction…

    Submitted 27 March, 2024; originally announced March 2024.

  26. arXiv:2403.17632

    cs.AI cs.CY cs.LG

    Data-driven Energy Consumption Modelling for Electric Micromobility using an Open Dataset

    Authors: Yue Ding, Sen Yan, Maqsood Hussain Shah, Hongyuan Fang, Ji Li, Mingming Liu

    Abstract: The escalating challenges of traffic congestion and environmental degradation underscore the critical importance of embracing E-Mobility solutions in urban spaces. In particular, micro E-Mobility tools such as E-scooters and E-bikes, play a pivotal role in this transition, offering sustainable alternatives for urban commuters. However, the energy consumption patterns for these tools are a critical…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures, 4 tables. This manuscript has been accepted by the IEEE ITEC 2024

  27. arXiv:2403.16788

    cs.CV

    HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

    Authors: Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

    Abstract: Event-based semantic segmentation has gained popularity due to its capability to deal with scenarios under high-speed motion and extreme lighting conditions, which cannot be addressed by conventional RGB cameras. Since it is hard to annotate event data, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, this will inevitably introduce noise, and…

    Submitted 25 March, 2024; originally announced March 2024.

  28. arXiv:2403.08948

    eess.SY cs.GT

    Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning

    Authors: Jiajun Shen, Fengjun Li, Morteza Hashemi, Huazhen Fang

    Abstract: In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly c…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 8 pages

  29. arXiv:2403.04746

    cs.CL cs.AI cs.LG

    LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

    Authors: Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su

    Abstract: Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has be…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Code and data available at https://github.com/microsoft/simulated-trial-and-error

  30. arXiv:2402.14872

    cs.CL cs.AI cs.NE

    Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs

    Authors: Xiaoxia Li, Siyuan Liang, Jiyi Zhang, Han Fang, Aishan Liu, Ee-Chien Chang

    Abstract: Large Language Models (LLMs), used in creative writing, code generation, and translation, generate text based on input sequences but are vulnerable to jailbreak attacks, where crafted prompts induce harmful outputs. Most jailbreak prompt methods use a combination of jailbreak templates followed by questions to ask to create jailbreak prompts. However, existing jailbreak prompt designs generally su…

    Submitted 27 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  31. arXiv:2402.09270

    cs.CV

    Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement

    Authors: Huachen Fang, Jinjian Wu, Qibin Hou, Weisheng Dong, Guangming Shi

    Abstract: Previous deep learning-based event denoising methods mostly suffer from poor interpretability and difficulty in real-time processing due to their complex architecture designs. In this paper, we propose window-based event denoising, which simultaneously deals with a stack of events while existing element-based denoising focuses on one event each time. Besides, we give the theoretical analysis based…

    Submitted 14 February, 2024; originally announced February 2024.

  32. arXiv:2402.05819

    eess.AS cs.CL cs.LG

    Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

    Authors: Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

    Abstract: Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models are predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in the real-wo…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024 workshop on Self-supervision in Audio, Speech, and Beyond (SASB)

  33. arXiv:2402.04640

    cs.LG

    Domain Bridge: Generative model-based domain forensic for black-box models

    Authors: Jiyi Zhang, Han Fang, Ee-Chien Chang

    Abstract: In forensic investigations of machine learning models, techniques that determine a model's data domain play an essential role, with prior work relying on large-scale corpora like ImageNet to approximate the target model's domain. Although such methods are effective in finding broad domains, they often struggle in identifying finer-grained classes within those domains. In this paper, we introduce a…

    Submitted 7 February, 2024; originally announced February 2024.

  34. arXiv:2402.04013

    cs.CV

    Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses

    Authors: Hao Fang, Yixiang Qiu, Hongyao Yu, Wenbo Yu, Jiawei Kong, Baoli Chong, Bin Chen, Xuan Wang, Shu-Tao Xia

    Abstract: Model Inversion (MI) attacks aim to disclose private information about the training data by abusing access to the pre-trained models. These attacks enable adversaries to reconstruct high-fidelity data that closely aligns with the private training data, which has raised significant privacy concerns. Despite the rapid advances in the field, we lack a comprehensive overview of existing MI attacks and…

    Submitted 6 February, 2024; originally announced February 2024.

  35. arXiv:2402.03583

    cs.SI cs.AI cs.LG

    MQuinE: a cure for "Z-paradox" in knowledge graph embedding models

    Authors: Yang Liu, Huang Fang, Yunfeng Cai, Mingming Sun

    Abstract: Knowledge graph embedding (KGE) models achieved state-of-the-art results on many knowledge graph tasks including link prediction and information retrieval. Despite the superior performance of KGE models in practice, we discover a deficiency in the expressiveness of some popular existing KGE models called Z-paradox. Motivated by the existence of Z-paradox, we propose a new KGE model called \…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 18 pages, 1 figure

  36. arXiv:2402.02317

    cs.LG

    INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer

    Authors: Han Fang, Zhihao Song, Paul Weng, Yutong Ban

    Abstract: Recently, deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems. Meanwhile, most of the solvers suffer from generalizing to an unseen distribution or distributions with different scales. To address this issue, we propose a novel architecture, called Invariant Nested View Transformer (INViT), which is designed to enforce a nested design toget…

    Submitted 26 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted as poster of ICML-2024

  37. arXiv:2402.01259

    cs.NI cs.AI cs.LG cs.SI eess.SY

    Position Aware 60 GHz mmWave Beamforming for V2V Communications Utilizing Deep Learning

    Authors: Muhammad Baqer Mollah, Honggang Wang, Hua Fang

    Abstract: Beamforming techniques are considered as essential parts to compensate the severe path loss in millimeter-wave (mmWave) communications by adopting large antenna arrays and formulating narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over such narrow beams for efficient link configuration by traditional beam selection approaches, mainly relied on chan…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 2024 IEEE International Conference on Communications (ICC), Denver, CO, USA

  38. arXiv:2401.17868

    cs.CV cs.LG

    Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

    Authors: Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan

    Abstract: The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By i…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024 Conference

  39. arXiv:2401.17699

    cs.CV

    Unified Physical-Digital Face Attack Detection

    Authors: Hao Fang, Ajian Liu, Haocheng Yuan, Junze Zheng, Dingheng Zeng, Yanhong Liu, Jiankang Deng, Sergio Escalera, Xiaoming Liu, Jun Wan, Zhen Lei

    Abstract: Face Recognition (FR) systems can suffer from physical (i.e., print photo) and digital (i.e., DeepFake) attacks. However, previous related work rarely considers both situations at the same time. This implies the deployment of multiple models and thus more computational burden. The main reasons for this lack of an integrated model are caused by two factors: (1) The lack of a dataset including both…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures

  40. arXiv:2401.16122

    cs.CV cs.RO

    DeFlow: Decoder of Scene Flow Network in Autonomous Driving

    Authors: Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt

    Abstract: Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving. Many networks with large-scale point clouds as input use voxelization to create a pseudo-image for real-time running. However, the voxelization process often results in the loss of point-specific features. This gives rise to a challenge in…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 7 pages, 4 figures, Code check https://github.com/KTH-RPL/deflow, accepted by ICRA 2024

  41. arXiv:2312.16262

    cs.IR cs.AI

    Dynamic In-Context Learning from Nearest Neighbors for Bundle Generation

    Authors: Zhu Sun, Kaidong Feng, Jie Yang, Xinghua Qu, Hui Fang, Yew-Soon Ong, Wenyuan Liu

    Abstract: Product bundling has evolved into a crucial marketing strategy in e-commerce. However, current studies are limited to generating (1) fixed-size or single bundles, and most importantly, (2) bundles that do not reflect consistent user intents, thus being less intelligible or useful to users. This paper explores two interrelated tasks, i.e., personalized bundle generation and the underlying intent in…

    Submitted 26 December, 2023; originally announced December 2023.

  42. Session-Based Recommendation by Exploiting Substitutable and Complementary Relationships from Multi-behavior Data

    Authors: Huizi Wu, Cong Geng, Hui Fang

    Abstract: Session-based recommendation (SR) aims to dynamically recommend items to a user based on a sequence of the most recent user-item interactions. Most existing studies on SR adopt advanced deep learning methods. However, the majority only consider a special behavior type (e.g., click), while those few considering multi-typed behaviors fail to take full advantage of the relationships between product…

    Submitted 14 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 31 pages, 11 figures, accepted by Data Mining and Knowledge Discovery (2023)

  43. arXiv:2312.07378  [pdf, other]

    cs.CV

    X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer

    Authors: Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li

    Abstract: The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences. However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point clouds poses a difficulty in aligning temporal information within video sequences. To address these issues, we propose a novel cross-modal knowl…

    Submitted 12 December, 2023; originally announced December 2023.

  44. arXiv:2312.07371  [pdf, other]

    cs.LG cs.AI cs.CR physics.soc-ph

    Privacy-Aware Energy Consumption Modeling of Connected Battery Electric Vehicles using Federated Learning

    Authors: Sen Yan, Hongyuan Fang, Ji Li, Tomas Ward, Noel O'Connor, Mingming Liu

    Abstract: Battery Electric Vehicles (BEVs) are increasingly significant in modern cities due to their potential to reduce air pollution. Precise and real-time estimation of energy consumption for them is imperative for effective itinerary planning and optimizing vehicle systems, which can reduce driving range anxiety and decrease energy costs. As public awareness of data privacy increases, adopting approach…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: This paper is accepted by IEEE Transactions on Transportation Electrification (TTE) on December 4, 2023. (13 pages, 6 figures, and 6 tables)

  45. arXiv:2311.17953  [pdf, other]

    cs.CV

    Rethinking Image Editing Detection in the Era of Generative AI Revolution

    Authors: Zhihao Sun, Haipeng Fang, Xinying Zhao, Danding Wang, Juan Cao

    Abstract: The accelerated advancement of generative AI significantly enhances the viability and effectiveness of generative regional editing methods. This evolution renders image manipulation more accessible, thereby intensifying the risk of altering the conveyed information within original images and even propagating misinformation. Consequently, there exists a critical demand for robust capable of detec…

    Submitted 29 November, 2023; originally announced November 2023.

  46. arXiv:2311.11017  [pdf, other]

    cs.CV

    Improving Adversarial Transferability by Stable Diffusion

    Authors: Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, Ee-Chien Chang

    Abstract: Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving DNN predictions. While some attack methods excel in the white-box setting, they often struggle in the black-box scenario, particularly against models fortified with defense mechanisms. Various techniques have emerged to enhance the transferability of adversa…

    Submitted 18 November, 2023; originally announced November 2023.

  47. arXiv:2311.02346  [pdf]

    cs.RO

    A Comprehensive Dynamic Simulation Framework for Coupled Neuromusculoskeletal-Exoskeletal Systems

    Authors: Wei Jin, Jiaqi Liu, Qiwei Zhang, Xiaoxu Zhang, Qining Wang, Hongbin Fang, Jian Xu

    Abstract: The modeling and simulation of coupled neuromusculoskeletal-exoskeletal systems play a crucial role in human biomechanical analysis, as well as in the design and control of exoskeletons. However, conventional dynamic simulation frameworks have limitations due to their reliance on experimental data and their inability to capture comprehensive biomechanical signals and dynamic responses. To address…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 50 pages, 11 figures, 4 tables

  48. arXiv:2310.19341  [pdf, other]

    cs.CL cs.AI

    Skywork: A More Open Bilingual Foundation Model

    Authors: Tianwen Wei, Liang Zhao, Lichang Zhang, Bo Zhu, Lijie Wang, Haihua Yang, Biye Li, Cheng Cheng, Weiwei Lü, Rui Hu, Chenxia Li, Liu Yang, Xilin Luo, Xuejie Wu, Lunan Liu, Wenjun Cheng, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Lei Lin, Xiaokun Wang, Yutuan Ma, Chuanhai Dong, Yanqi Sun, Yifu Chen , et al. (5 additional authors not shown)

    Abstract: In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general purpose tr…

    Submitted 30 October, 2023; originally announced October 2023.

  49. arXiv:2310.14780  [pdf, other]

    cs.CV

    Dance Your Latents: Consistent Dance Generation through Spatial-temporal Subspace Attention Guided by Motion Flow

    Authors: Haipeng Fang, Zhihao Sun, Ziyao Huang, Fan Tang, Juan Cao, Sheng Tang

    Abstract: The advancement of generative AI has extended to the realm of Human Dance Generation, demonstrating superior generative capacities. However, current methods still exhibit deficiencies in achieving spatiotemporal consistency, resulting in artifacts like ghosting, flickering, and incoherent motions. In this paper, we present Dance-Your-Latents, a framework that makes latents dance coherently followi…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures

  50. arXiv:2310.09700  [pdf, other]

    cs.NI cs.SI eess.SY

    mmWave Enabled Connected Autonomous Vehicles: A Use Case with V2V Cooperative Perception

    Authors: Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

    Abstract: Connected and autonomous vehicles (CAVs) will revolutionize tomorrow's intelligent transportation systems, being considered promising to improve transportation safety, traffic efficiency, and mobility. In fact, envisioned use cases of CAVs demand very high throughput, lower latency, highly reliable communications, and precise positioning capabilities. The availability of a large spectrum at millim…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 8 pages

    Journal ref: IEEE Network, 2023