
Showing 1–50 of 305 results for author: Feng, C

  1. arXiv:2407.01155  [pdf, other]

    cs.LG

    CPT: Consistent Proxy Tuning for Black-box Optimization

    Authors: Yuanyang He, Zitong Huang, Xinxing Xu, Rick Siow Mong Goh, Salman Khan, Wangmeng Zuo, Yong Liu, Chun-Mei Feng

    Abstract: Black-box tuning has attracted recent attention due to that the structure or inner parameters of advanced proprietary models are not accessible. Proxy-tuning provides a test-time output adjustment for tuning black-box language models. It applies the difference of the output logits before and after tuning a smaller white-box "proxy" model to improve the black-box model. However, this technique serv…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 10 pages, 2 figures plus supplementary materials

  2. arXiv:2407.00577  [pdf, other]

    cs.RO

    FALCON: Fast Autonomous Aerial Exploration using Coverage Path Guidance

    Authors: Yichen Zhang, Xinyi Chen, Chen Feng, Boyu Zhou, Shaojie Shen

    Abstract: This paper introduces FALCON, a novel Fast Autonomous expLoration framework using COverage path guidaNce, which aims at setting a new performance benchmark in the field of autonomous aerial exploration. Despite recent advancements in the domain, existing exploration planners often suffer from inefficiencies such as frequent revisitations of previously explored regions. FALCON effectively harnesses…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.18977  [pdf, other]

    cs.RO cs.CL cs.CV

    RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation

    Authors: Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma

    Abstract: Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniVie…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.18470  [pdf, other]

    cs.IR cs.LG

    UniRec: A Dual Enhancement of Uniformity and Frequency in Sequential Recommendations

    Authors: Yang Liu, Yitong Wang, Chenyue Feng

    Abstract: Representation learning in sequential recommendation is critical for accurately modeling user interaction patterns and improving recommendation precision. However, existing approaches predominantly emphasize item-to-item transitions, often neglecting the time intervals between interactions, which are closely related to behavior pattern changes. Additionally, broader interaction attributes, such as…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 15 pages, 8 figures, for source code, see https://github.com/Linxi000/UniRec

    ACM Class: H.3.3; I.2.6

  5. arXiv:2406.17520  [pdf, other]

    cs.CV cs.RO

    Tell Me Where You Are: Multimodal LLMs Meet Place Recognition

    Authors: Zonglin Lyu, Juexiao Zhang, Mingxuan Lu, Yiming Li, Chen Feng

    Abstract: Large language models (LLMs) exhibit a variety of promising capabilities in robotics, including long-horizon planning and commonsense reasoning. However, their performance in place recognition is still underexplored. In this work, we introduce multimodal LLMs (MLLMs) to visual place recognition (VPR), where a robot must localize itself using visual observations. Our key design is to use vision-bas…

    Submitted 25 June, 2024; originally announced June 2024.

  6. arXiv:2406.13167  [pdf, other]

    cs.CL

    QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism

    Authors: Bo Wang, Heyan Huang, Yixin Cao, Jiahao Ying, Wei Tang, Chong Feng

    Abstract: While large language models (LLMs) have made notable advancements in natural language processing, they continue to struggle with processing extensive text. Memory mechanism offers a flexible solution for managing long contexts, utilizing techniques such as compression, summarization, and structuring to facilitate nuanced and efficient handling of large volumes of text. However, existing techniques…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.13007  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, **gyuan Xiao, et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  8. arXiv:2406.12712  [pdf, other]

    cs.CV

    Self-Localized Collaborative Perception

    Authors: Zhenyang Ni, Zixing Lei, Yifan Lu, Dingju Wang, Chen Feng, Yanfeng Wang, Siheng Chen

    Abstract: Collaborative perception has garnered considerable attention due to its capacity to address several inherent challenges in single-agent perception, including occlusion and out-of-range issues. However, existing collaborative perception systems heavily rely on precise localization systems to establish a consistent spatial coordinate system between agents. This reliance makes them susceptible to lar…

    Submitted 18 June, 2024; originally announced June 2024.

  9. arXiv:2406.11021  [pdf, other]

    cs.CV

    $α$-SSC: Uncertainty-Aware Camera-based 3D Semantic Scene Completion

    Authors: Sanbao Su, Nuo Chen, Felix Juefei-Xu, Chen Feng, Fei Miao

    Abstract: In the realm of autonomous vehicle (AV) perception, comprehending 3D scenes is paramount for tasks such as planning and mapping. Semantic scene completion (SSC) aims to infer scene geometry and semantics from limited observations. While camera-based SSC has gained popularity due to affordability and rich visual cues, existing methods often neglect the inherent uncertainty in models. To address thi…

    Submitted 21 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  10. arXiv:2406.09742  [pdf, other]

    cs.IR

    IFA: Interaction Fidelity Attention for Entire Lifelong Behaviour Sequence Modeling

    Authors: Wenhui Yu, Chao Feng, Yanze Zhang, Lantao Hu, Peng Jiang, Han Li

    Abstract: The lifelong user behavior sequence provides abundant information of user preference and gains impressive improvement in the recommendation task, however increases computational consumption significantly. To meet the severe latency requirement in online service, a short sub-sequence is sampled based on similarity to the target item. Unfortunately, items not in the sub-sequence are abandoned, leadi…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures

  11. arXiv:2406.09383  [pdf, other]

    cs.CV

    Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

    Authors: Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

    Abstract: Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capab…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2406.07572  [pdf, ps, other]

    cs.AI cs.CE cs.LG

    Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

    Authors: Tao Song, Yuwei Fan, Chenlong Feng, Keyu Song, Chao Liu, Dongxiang Jiang

    Abstract: This study explores the application of large language models (LLMs) with callable tools in energy and power engineering domain, focusing on gas path analysis of gas turbines. We developed a dual-agent tool-calling process to integrate expert knowledge, predefined tools, and LLM reasoning. We evaluated various LLMs, including LLama3, Qwen1.5 and GPT. Smaller models struggled with tool usage and par…

    Submitted 1 June, 2024; originally announced June 2024.

  13. arXiv:2406.07202  [pdf]

    cs.CV

    Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation

    Authors: Diwei Sheng, Giles Hamilton-Fletcher, Mahya Beheshti, Chen Feng, John-Ross Rizzo

    Abstract: Curbs serve as vital borders that delineate safe pedestrian zones from potential vehicular traffic hazards. Curbs also represent a primary spatial hazard during dynamic navigation with significant stumbling potential. Such vulnerabilities are particularly exacerbated for persons with blindness and low vision (PBLV). Accurate visual-based discrimination of curbs is paramount for assistive technolog…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 21 pages, 8 figures, submitted to Assistive Technology

  14. arXiv:2406.06644  [pdf, other]

    cs.LG cs.AI

    Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises

    Authors: Jianhua Pei, Cheng Feng, ** Wang, Hina Tabassum, Dongyuan Shi

    Abstract: Semantic communication (SemCom) has emerged as a new paradigm for 6G communication, with deep learning (DL) models being one of the key drives to shift from the accuracy of bit/symbol to the semantics and pragmatics of data. Nevertheless, DL-based SemCom systems often face performance bottlenecks due to overfitting, poor generalization, and sensitivity to outliers. Furthermore, the varying-fading…

    Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  15. arXiv:2406.00779  [pdf, other]

    cs.LG

    Differentiation of Multi-objective Data-driven Decision Pipeline

    Authors: Peng Li, Lixia Wu, Chaoqun Feng, Haoyuan Hu, Lei Fu, Jie** Ye

    Abstract: Real-world scenarios frequently involve multi-objective data-driven optimization problems, characterized by unknown problem coefficients and multiple conflicting objectives. Traditional two-stage methods independently apply a machine learning model to estimate problem coefficients, followed by invoking a solver to tackle the predicted optimization problem. The independent use of optimization solve…

    Submitted 2 June, 2024; originally announced June 2024.

  16. arXiv:2406.00212  [pdf, other]

    eess.IV cs.CV

    MVAD: A Multiple Visual Artifact Detector for Video Streaming

    Authors: Chen Feng, Duolikun Danier, Fan Zhang, David Bull

    Abstract: Visual artifacts are often introduced into streamed video content, due to prevailing conditions during content production and/or delivery. Since these can degrade the quality of the user's experience, it is important to automatically and accurately detect them in order to enable effective quality measurement and enhancement. Existing detection methods often focus on a single type of artifact and/o…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 9 pages

  17. arXiv:2405.20224  [pdf, other]

    cs.CV

    EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

    Authors: Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li Yuan, Yonghong Tian

    Abstract: 3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered in high-speed moving cameras or low-light e…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: https://drexubery.github.io/EvaGaussians/

  18. arXiv:2405.17187  [pdf, other]

    cs.CV cs.AI cs.RO

    Memorize What Matters: Emergent Scene Decomposition from Multitraverse

    Authors: Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez

    Abstract: Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitr…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://3d-gaussian-map**.github.io; Code and data: https://github.com/NVlabs/3DGM

  19. arXiv:2405.13629  [pdf, other]

    cs.LG

    Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow

    Authors: Chen-Hao Chao, Chien Feng, Wei-Fang Sun, Cheng-Kuang Lee, Simon See, Chun-Yi Lee

    Abstract: Existing Maximum-Entropy (MaxEnt) Reinforcement Learning (RL) methods for continuous action spaces are typically formulated based on actor-critic frameworks and optimized through alternating steps of policy evaluation and policy improvement. In the policy evaluation steps, the critic is updated to capture the soft Q-function. In the policy improvement steps, the actor is adjusted in accordance wit…

    Submitted 22 May, 2024; originally announced May 2024.

  20. arXiv:2405.11236  [pdf, other]

    cs.CV

    TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

    Authors: Chengcheng Feng, Mu He, Qiuyu Tian, Haojie Yin, Xiaofang Zhao, Hongwei Tang, Xingqiang Wei

    Abstract: As deep learning technology continues to advance, image generation models, especially models like Stable Diffusion, are finding increasingly widespread application in visual arts creation. However, these models often face challenges such as overfitting, lack of stability in generated results, and difficulties in accurately capturing the features desired by creators during the fine-tuning process.…

    Submitted 13 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  21. arXiv:2405.08621  [pdf, other]

    eess.IV cs.CV

    RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content

    Authors: Tianhao Peng, Chen Feng, Duolikun Danier, Fan Zhang, David Bull

    Abstract: With recent advances in deep learning, numerous algorithms have been developed to enhance video quality, reduce visual artefacts and improve perceptual quality. However, little research has been reported on the quality assessment of enhanced content - the evaluation of enhancement methods is often based on quality metrics that were designed for compression applications. In this paper, we propose a…

    Submitted 15 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures

  22. arXiv:2405.02965  [pdf, other]

    cs.AI cs.RO

    Robust Collaborative Perception without External Localization and Clock Devices

    Authors: Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Dingju Wang, Chen Feng, Siheng Chen, Yanfeng Wang

    Abstract: A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals could be vulnerable to noise and…

    Submitted 31 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages, accepted to ICRA 2024

  23. arXiv:2404.19696  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners

    Authors: Chun Feng, Joy Hsu, Weiyu Liu, Jiajun Wu

    Abstract: 3D visual grounding is a challenging task that often requires direct and dense supervision, notably the semantic label for each object in the scene. In this paper, we instead study the naturally supervised setting that learns from only 3D scene and QA pairs, where prior works underperform. We propose the Language-Regularized Concept Learner (LARC), which uses constraints from language as regulariz…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. The first two authors contributed equally

  24. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu **, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huan**g Yue, **gyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  25. arXiv:2404.19134  [pdf, other]

    cs.CV

    Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models

    Authors: Siyuan Xiang, Chin Tseng, Congcong Wen, Deshana Desai, Yifeng Kou, Binil Starly, Daniele Panozzo, Chen Feng

    Abstract: We introduce the first work on benchmarking and evaluating deep clustering algorithms on large-scale non-categorical 3D CAD models. We first propose a workflow to allow expert mechanical engineers to efficiently annotate 252,648 carefully sampled pairwise CAD model similarities, from a subset of the ABC dataset with 22,968 shapes. Using seven baseline deep clustering methods, we then investigate t…

    Submitted 29 April, 2024; originally announced April 2024.

  26. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  27. arXiv:2404.09571  [pdf, other]

    eess.IV cs.CV

    MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

    Authors: Yuxuan Jiang, Chen Feng, Fan Zhang, David Bull

    Abstract: Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from their high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training stra…

    Submitted 15 April, 2024; originally announced April 2024.

  28. arXiv:2404.04933  [pdf, other]

    cs.CV

    UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

    Authors: Yingsen Zeng, Yujie Zhong, Chengjian Feng, Lin Ma

    Abstract: Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos. Despite that they focus on different events, we observe they have a significant connection. For instance, most descriptions in MR involve multiple actions from TAD. In this paper, we aim to investigate th…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Tech report

  29. arXiv:2404.01672  [pdf, other]

    cs.IT eess.SP

    The Meta Distribution of the SIR in Joint Communication and Sensing Networks

    Authors: Kun Ma, Chenyuan Feng, Giovanni Geraci, Howard H. Yang

    Abstract: In this paper, we introduce a novel mathematical framework for assessing the performance of joint communication and sensing (JCAS) in wireless networks, employing stochastic geometry as an analytical tool. We focus on deriving the meta distribution of the signal-to-interference ratio (SIR) for JCAS networks. This approach enables a fine-grained quantification of individual user or radar performanc…

    Submitted 2 April, 2024; originally announced April 2024.

  30. arXiv:2404.00504  [pdf, other]

    cs.CV

    NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation

    Authors: Diwei Sheng, Anbang Yang, John-Ross Rizzo, Chen Feng

    Abstract: Visual Place Recognition (VPR) in indoor environments is beneficial to humans and robots for better localization and navigation. It is challenging due to appearance changes at various frequencies, and difficulties of obtaining ground truth metric trajectories for training and evaluation. This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled f…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 7 pages, 7 figures, published in 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  31. arXiv:2403.20085  [pdf, other]

    cs.RO

    OmniNxt: A Fully Open-source and Compact Aerial Robot with Omnidirectional Visual Perception

    Authors: Peize Liu, Chen Feng, Yang Xu, Yan Ning, Hao Xu, Shaojie Shen

    Abstract: Adopting omnidirectional Field of View (FoV) cameras in aerial robots vastly improves perception ability, significantly advancing aerial robotics's capabilities in inspection, reconstruction, and rescue tasks. However, such sensors also elevate system complexity, e.g., hardware design, and corresponding algorithm, which limits researchers from utilizing aerial robots with omnidirectional FoV in th…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS2024. Open source: https://github.com/HKUST-Aerial-Robotics/OmniNxt. Project page: https://hkust-aerial-robotics.github.io/OmniNxt/

  32. arXiv:2403.15872  [pdf, other]

    cs.CL

    RAAMove: A Corpus for Analyzing Moves in Research Article Abstracts

    Authors: Hongzheng Li, Ruo** Wang, Ge Shi, Xing Lv, Lei Lei, Chong Feng, Fang Liu, **kun Lin, Yangguang Mei, Lingnan Xu

    Abstract: Move structures have been studied in English for Specific Purposes (ESP) and English for Academic Purposes (EAP) for decades. However, there are few move annotation corpora for Research Article (RA) abstracts. In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts. The primary objective of RAAMove is to facilitate mov…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  33. arXiv:2403.14806  [pdf, other]

    cs.ET physics.app-ph physics.optics

    Photonic-Electronic Integrated Circuits for High-Performance Computing and AI Accelerator

    Authors: Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, Zhixing Jiang, Zhoufeng Ying, Jason Midkiff, Sourabh Jain, May H. Hlaing, David Z. Pan, Ray T. Chen

    Abstract: In recent decades, the demand for computational power has surged, particularly with the rapid expansion of artificial intelligence (AI). As we navigate the post-Moore's law era, the limitations of traditional electrical digital computing, including process bottlenecks and power consumption issue, are propelling the search for alternative computing paradigms. Among various emerging technologies, in…

    Submitted 21 March, 2024; originally announced March 2024.

  34. arXiv:2403.13588  [pdf, other]

    cs.SE cs.CL

    Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models

    Authors: Chengzhe Feng, Yanan Sun, Ke Li, Pan Zhou, Jiancheng Lv, Aojun Lu

    Abstract: As Pre-trained Language Models (PLMs), a popular approach for code intelligence, continue to grow in size, the computational cost of their usage has become prohibitively expensive. Prompt learning, a recent development in the field of natural language processing, emerges as a potential solution to address this challenge. In this paper, we investigate the effectiveness of prompt learning in code in…

    Submitted 20 March, 2024; originally announced March 2024.

  35. arXiv:2403.13171  [pdf, other]

    cs.CV

    LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

    Authors: **g Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng

    Abstract: Lithic Use-Wear Analysis (LUWA) using microscopic images is an underexplored vision-for-science research area. It seeks to distinguish the worked material, which is critical for understanding archaeological artifacts, material interactions, tool functionalities, and dental records. However, this challenging task goes beyond the well-studied image classification problem for common objects. It is af…

    Submitted 27 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR

  36. arXiv:2403.11681  [pdf, other]

    cs.RO cs.CV

    MASSTAR: A Multi-Modal and Large-Scale Scene Dataset with a Versatile Toolchain for Surface Prediction and Completion

    Authors: Guiyong Zheng, **qi Jiang, Chen Feng, Shaojie Shen, Boyu Zhou

    Abstract: Surface prediction and completion have been widely studied in various applications. Recently, research in surface completion has evolved from small objects to complex large-scale scenes. As a result, researchers have begun increasing the volume of data and leveraging a greater variety of data modalities including rendered RGB images, descriptive texts, depth images, etc, to enhance algorithm perfo…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS2024. Code: https://github.com/SYSU-STAR/MASSTAR. Project Page: https://github.com/SYSU-STAR/MASSTAR

  37. arXiv:2403.10988  [pdf, other]

    cs.CV cs.AI

    Boosting Flow-based Generative Super-Resolution Models via Learned Prior

    Authors: Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang, Hao-Wei Chen, Roy Tseng, Chien Feng, Chun-Yi Lee

    Abstract: Flow-based super-resolution (SR) models have demonstrated astonishing capabilities in generating high-quality images. However, these methods encounter several challenges during image generation, such as grid artifacts, exploding inverses, and suboptimal results due to a fixed sampling temperature. To overcome these issues, this work introduces a conditional learned prior to the inference phase of…

    Submitted 28 May, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024

  38. arXiv:2403.09140  [pdf, other]

    cs.CV

    Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

    Authors: Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

    Abstract: Recent works on text-to-3d generation show that using only 2D diffusion supervision for 3D generation tends to produce results with inconsistent appearances (e.g., faces on the back view) and inaccurate shapes (e.g., animals with extra legs). Existing methods mainly address this issue by retraining diffusion models with images rendered from 3D data to ensure multi-view consistency while struggling…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Project Page: https://stellarcheng.github.io/Sculpt3D/

  39. arXiv:2403.08161  [pdf, other]

    cs.CV cs.AI

    LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition

    Authors: Zhonglin Sun, Chen Feng, Ioannis Patras, Georgios Tzimiropoulos

    Abstract: In this work we focus on learning facial representations that can be adapted to train effective face recognition models, particularly in the absence of labels. Firstly, compared with existing labelled face datasets, a vastly larger magnitude of unlabeled faces exists in the real world. We explore the learning strategy of these unlabeled facial images through self-supervised pretraining to transfer…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted to CVPR 2024

  40. arXiv:2403.06512  [pdf, other]

    cs.CR cs.SE

    Asset-centric Threat Modeling for AI-based Systems

    Authors: Jan von der Assen, Jamo Sharif, Chao Feng, Christian Killer, Gérôme Bovet, Burkhard Stiller

    Abstract: Threat modeling is a popular method to securely develop systems by achieving awareness of potential areas of future damage caused by adversaries. However, threat modeling for systems relying on Artificial Intelligence is still not well explored. While conventional threat modeling methods and tools did not address AI-related threats, research on this amalgamation still lacks solutions capable of gu…

    Submitted 3 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  41. arXiv:2403.05428  [pdf, other]

    cs.MM

    Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition

    Authors: Bingbing Wang, Bin Liang, Chun-Mei Feng, Wangmeng Zuo, Zhixin Bai, Shijue Huang, Kam-Fai Wong, Xi Zeng, Ruifeng Xu

    Abstract: In real-world conversations, the diversity and ambiguity of stickers often lead to varied interpretations based on the context, necessitating the requirement for comprehensively understanding stickers and supporting multi-tagging. To address this challenge, we introduce StickerTAG, the first multi-tag sticker dataset comprising a collected tag set with 461 tags and 13,571 sticker-tag pairs, design…

    Submitted 16 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  42. arXiv:2403.05046  [pdf, other]

    cs.RO

    EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction

    Authors: Irving Fang, Yuzhong Chen, Yifan Wang, Jianghan Zhang, Qiushi Zhang, Jiali Xu, Xibo He, Weibo Gao, Hao Su, Yiming Li, Chen Feng

    Abstract: A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstrea…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 6 pages. Accepted at ICRA 2024

  43. arXiv:2403.04968  [pdf, other]

    cs.CV

    ActFormer: Scalable Collaborative Perception via Active Queries

    Authors: Suozhi Huang, Juexiao Zhang, Yiming Li, Chen Feng

    Abstract: Collaborative perception leverages rich visual observations from multiple robots to extend a single robot's perception ability beyond its field of view. Many prior works receive messages broadcast from all collaborators, leading to a scalability challenge when dealing with a large number of robots and sensors. In this work, we aim to address scalable camera-based collaborative perception…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted to ICRA 2024

  44. arXiv:2402.17583  [pdf, other]

    cs.SE cs.CL cs.LG

    FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems

    Authors: Junjie Huang, Jinyang Liu, Zhuangbin Chen, Zhihan Jiang, Yichen LI, Jiazhen Gu, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu

    Abstract: Postmortem analysis is essential in the management of incidents within cloud systems, which provides valuable insights to improve system's reliability and robustness. At CloudA, fault pattern profiling is performed during the postmortem phase, which involves the classification of incidents' faults into unique categories, referred to as fault pattern. By aggregating and analyzing these fault patter…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)

  45. Star-Searcher: A Complete and Efficient Aerial System for Autonomous Target Search in Complex Unknown Environments

    Authors: Yiming Luo, Zixuan Zhuang, Neng Pan, Chen Feng, Shaojie Shen, Fei Gao, Hui Cheng, Boyu Zhou

    Abstract: This paper tackles the challenge of autonomous target search using unmanned aerial vehicles (UAVs) in complex unknown environments. To fill the gap in systematic approaches for this task, we introduce Star-Searcher, an aerial system featuring specialized sensor suites, mapping, and planning modules to optimize searching. Path planning challenges due to increased inspection requirements are address…

    Submitted 21 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE RA-L. Code: https://github.com/SYSU-STAR/STAR-Searcher. Video: https://www.youtube.com/watch?v=08ll_oo_DtU

  46. arXiv:2402.11769  [pdf, other]

    eess.SY cs.GT math.OC

    Connection-Aware P2P Trading: Simultaneous Trading and Peer Selection

    Authors: Cheng Feng, Kedi Zheng, Lanqing Shan, Hani Alers, Lampros Stergioulas, Hongye Guo, Qixin Chen

    Abstract: Peer-to-peer (P2P) trading is seen as a viable solution to handle the growing number of distributed energy resources in distribution networks. However, when dealing with large-scale consumers, there are several challenges that must be addressed. One of these challenges is limited communication capabilities. Additionally, prosumers may have specific preferences when it comes to trading. Both can re…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE PES Transactions

  47. arXiv:2402.11690  [pdf, other]

    cs.CL cs.CV

    Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning

    Authors: Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang

    Abstract: Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 8 Pages, visual instruction tuning

  48. arXiv:2402.10686  [pdf, other]

    cs.IT cs.CR cs.LG eess.SP

    Uncertainty, Calibration, and Membership Inference Attacks: An Information-Theoretic Perspective

    Authors: Meiyi Zhu, Caili Guo, Chunyan Feng, Osvaldo Simeone

    Abstract: In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleato…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 27 pages, 13 figures

  49. arXiv:2402.07570  [pdf, other]

    cs.LG cs.AI

    Only the Curve Shape Matters: Training Foundation Models for Zero-Shot Multivariate Time Series Forecasting through Next Curve Shape Prediction

    Authors: Cheng Feng, Long Huang, Denis Krompass

    Abstract: We present General Time Transformer (GTT), an encoder-only style foundation model for zero-shot multivariate time series forecasting. GTT is pretrained on a large dataset of 200M high-quality time series samples spanning diverse domains. In our proposed framework, the task of multivariate time series forecasting is formulated as a channel-wise next curve shape prediction problem, where each time s…

    Submitted 18 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  50. arXiv:2402.06606  [pdf, other]

    cs.LG cs.AI cs.CR

    RQP-SGD: Differential Private Machine Learning through Noisy SGD and Randomized Quantization

    Authors: Ce Feng, Parv Venkitasubramaniam

    Abstract: The rise of IoT devices has prompted the demand for deploying machine learning at-the-edge with real-time, efficient, and secure data processing. In this context, implementing machine learning (ML) models with real-valued weight parameters can prove to be impractical particularly for large models, and there is a need to train models with quantized discrete weights. At the same time, these low-dime…

    Submitted 9 February, 2024; originally announced February 2024.

    Comments: This work is accepted by the 5th AAAI Workshop on Privacy-Preserving Artificial Intelligence