Showing 1–50 of 216 results for author: Yuan, W

Searching in archive cs.
  1. arXiv:2407.01755  [pdf, other]

    cs.RO

    An Intelligent Robotic System for Perceptive Pancake Batter Stirring and Precise Pouring

    Authors: Xinyuan Luo, Shengmiao Jin, Hung-Jui Huang, Wenzhen Yuan

    Abstract: Cooking robots have long been desired by the commercial market, while the technical challenge is still significant. A major difficulty comes from the demand of perceiving and handling liquid with different properties. This paper presents a robot system that mixes batter and makes pancakes out of it, where understanding and handling the viscous liquid is an essential component. The system integrate…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures, Accepted to IROS 2024

  2. arXiv:2406.18915  [pdf, other]

    cs.RO cs.CV

    Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

    Authors: Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

    Abstract: Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data. However, there is still an opportunity to improve the quality, quantity, and diversity of robot demonstration data. Although vision-language models have been shown to automatically generate demonstration data, their utility has been limited t…

    Submitted 27 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://robot-ma.github.io/

  3. arXiv:2406.17744  [pdf, other]

    cs.CL

    Following Length Constraints in Instructions

    Authors: Weizhe Yuan, Ilia Kulikov, Ping Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason Weston, Jing Xu

    Abstract: Aligned instruction following models can better fulfill user requests than their unaligned counterparts. However, it has been shown that there is a length bias in evaluation of such models, and that training algorithms tend to exploit this bias by learning longer responses. In this work we show how to train models that can be controlled at inference time with instructions containing desired length…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 13 pages

  4. arXiv:2406.14927  [pdf, other]

    cs.CV cs.RO

    Gaussian-Informed Continuum for Physical Property Identification and Simulation

    Authors: Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen

    Abstract: This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures

  5. arXiv:2406.10721  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

    Authors: Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

    Abstract: From rearranging objects on a table to putting groceries into shelves, robots must plan precise action points to perform tasks accurately and reliably. In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language. We introduce an automatic synthetic data generation pipeline that instruction-tunes VLMs…

    Submitted 15 June, 2024; originally announced June 2024.

  6. arXiv:2406.05387  [pdf, other]

    cs.IR

    PTF-FSR: A Parameter Transmission-Free Federated Sequential Recommender System

    Authors: Wei Yuan, Chaoqun Yang, Liang Qu, Quoc Viet Hung Nguyen, Guanhua Ye, Hongzhi Yin

    Abstract: Sequential recommender systems have made significant progress. Recently, due to increasing concerns about user data privacy, some researchers have implemented federated learning for sequential recommendation, a.k.a., Federated Sequential Recommender Systems (FedSeqRecs), in which a public sequential recommender model is shared and frequently transmitted between a central server and clients to achi…

    Submitted 8 June, 2024; originally announced June 2024.

  7. arXiv:2406.03249  [pdf, other]

    cs.LG

    Near-field Beamforming for Extremely Large-scale MIMO Based on Unsupervised Deep Learning

    Authors: Jiali Nie, Yuanhao Cui, Zhaohui Yang, Weijie Yuan, Xiaojun Jing

    Abstract: Extremely Large-scale Array (ELAA) is considered a frontier technology for future communication systems, pivotal in improving wireless systems' rate and spectral efficiency. However, as ELAA employs a multitude of antennas operating at higher frequencies, users are typically situated in the near-field region where the spherical wavefront propagates. This inevitably leads to a significant increase…

    Submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2406.01022  [pdf, other]

    cs.CR cs.IR

    Poisoning Attacks and Defenses in Recommender Systems: A Survey

    Authors: Zongwei Wang, Junliang Yu, Min Gao, Wei Yuan, Guanhua Ye, Shazia Sadiq, Hongzhi Yin

    Abstract: Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats…

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 8 figures

  9. arXiv:2405.08300  [pdf, other]

    cs.CV cs.SC

    Vector-Symbolic Architecture for Event-Based Optical Flow

    Authors: Hongzhi You, Yijun Cao, Wei Yuan, Fanjun Wang, Ning Qiao, Yongjie Li

    Abstract: From a perspective of feature matching, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduce an effective and robust high-dimensional (HD) feature descriptor for event frames, utilizing Vector Symbolic Architectures (VSA). The topological similarity among neighboring variab…

    Submitted 15 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  10. arXiv:2405.06983  [pdf, other]

    cs.NI

    ISAC-Assisted Wireless Rechargeable Sensor Networks with Multiple Mobile Charging Vehicles

    Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Adeel Ahmed

    Abstract: As IoT-based wireless sensor networks (WSNs) become more prevalent, the issue of energy shortages becomes more pressing. One potential solution is the use of wireless power transfer (WPT) technology, which is the key to building a new shape of wireless rechargeable sensor networks (WRSNs). However, efficient charging and scheduling are critical for WRSNs to function properly. Motivated by the fact…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted for publication in the Special Issue Q1'2024, "Integrating Sensing and Communication for Ubiquitous Internet of Things," IEEE Internet of Things Magazine

  11. arXiv:2404.19733  [pdf, other]

    cs.CL cs.AI

    Iterative Reasoning Preference Optimization

    Authors: Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston

    Abstract: Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoni…

    Submitted 25 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  12. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  13. arXiv:2404.11818  [pdf, other]

    cs.IR

    Automated Similarity Metric Generation for Recommendation

    Authors: Liang Qu, Yun Lin, Wei Yuan, Xiaojun Wan, Yuhui Shi, Hongzhi Yin

    Abstract: The embedding-based architecture has become the dominant approach in modern recommender systems, mapping users and items into a compact vector space. It then employs predefined similarity metrics, such as the inner product, to calculate similarity scores between user and item embeddings, thereby guiding the recommendation of items that align closely with a user's preferences. Given the critical ro…

    Submitted 17 April, 2024; originally announced April 2024.

  14. arXiv:2404.09790  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  15. arXiv:2404.08261  [pdf, other]

    cs.GT

    QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy

    Authors: Wenhao Yuan, Xuehe Wang

    Abstract: Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm, aiming to coordinate multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate zealously in FL model training procedures, while mitigating the privacy leakage risks dur…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: To be published in IJCNN 2024

  16. arXiv:2404.08188  [pdf, other]

    cs.IT eess.SP

    Fundamental Limits of Communication-Assisted Sensing in ISAC Systems

    Authors: Fuwang Dong, Fan Liu, Shihang Liu, Yifeng Xiong, Weijie Yuan, Yuanhao Cui

    Abstract: In this paper, we introduce a novel communication-assisted sensing (CAS) framework that explores the potential coordination gains offered by the integrated sensing and communication technique. The CAS system endows users with beyond-line-of-the-sight sensing capabilities, supported by a dual-functional base station that enables simultaneous sensing and communication. To delve into the system's fun…

    Submitted 23 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by ISIT. The updated version will be coming soon

  17. arXiv:2404.02514  [pdf, other]

    cs.CV

    Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

    Authors: Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes while suffering from blurry results, and fail to capture detailed structures caused by the inconsistency between 2D editings. Our critical insight is that low-frequency components of images are more multiview-consistent after editing compare…

    Submitted 3 April, 2024; originally announced April 2024.

  18. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  19. arXiv:2404.00269  [pdf, other]

    cs.CV

    IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images

    Authors: Yushuang Wu, Luyue Shi, Junhao Cai, Weihao Yuan, Lingteng Qiu, Zilong Dong, Liefeng Bo, Shuguang Cui, Xiaoguang Han

    Abstract: Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task, particularly with real-world data. Current state-of-the-art methods develop Transformer-based implicit field learning, necessitating an intensive learning paradigm that requires dense query-supervision uniformly sampled throughout the entire space. We propose a novel approach, IPoD, which harmonizes im…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  20. arXiv:2403.20107  [pdf, other]

    cs.IR

    Robust Federated Contrastive Recommender System against Model Poisoning Attack

    Authors: Wei Yuan, Chaoqun Yang, Liang Qu, Guanhua Ye, Quoc Viet Hung Nguyen, Hongzhi Yin

    Abstract: Federated Recommender Systems (FedRecs) have garnered increasing attention recently, thanks to their privacy-preserving benefits. However, the decentralized and open characteristics of current FedRecs present two dilemmas. First, the performance of FedRecs is compromised due to highly sparse on-device data for each client. Second, the system's robustness is undermined by the vulnerability to model…

    Submitted 29 March, 2024; originally announced March 2024.

  21. arXiv:2403.15559  [pdf, other]

    cs.CV cs.AI

    An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes Using Pre-Trained Text-to-Image Models

    Authors: Zhengyi Zhao, Chen Song, Xiaodong Gu, Yuan Dong, Qi Zuo, Weihao Yuan, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: A fundamental problem in the texturing of 3D meshes using pre-trained text-to-image models is to ensure multi-view consistency. State-of-the-art approaches typically use diffusion models to aggregate multi-view inputs, where common issues are the blurriness caused by the averaging operation in the aggregation step or inconsistencies in local features. This paper introduces an optimization framewor…

    Submitted 22 March, 2024; originally announced March 2024.

  22. arXiv:2403.14192  [pdf, ps, other]

    cs.IT eess.SP

    Fundamentals of Delay-Doppler Communications: Practical Implementation and Extensions to OTFS

    Authors: Shuangyang Li, Peter Jung, Weijie Yuan, Zhiqiang Wei, Jinhong Yuan, Baoming Bai, Giuseppe Caire

    Abstract: The recently proposed orthogonal time frequency space (OTFS) modulation, which is a typical Delay-Doppler (DD) communication scheme, has attracted significant attention thanks to its appealing performance over doubly-selective channels. In this paper, we present the fundamentals of general DD communications from the viewpoint of the Zak transform. We start our study by constructing DD domain basis…

    Submitted 21 March, 2024; originally announced March 2024.

  23. arXiv:2403.12396  [pdf, other]

    cs.CV cs.RO

    OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

    Authors: Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

    Abstract: This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for…

    Submitted 18 March, 2024; originally announced March 2024.

  24. arXiv:2403.12010  [pdf, other]

    cs.CV cs.AI cs.GR

    VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

    Authors: Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

    Abstract: Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training,…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Project page: aigc3d.github.io/VideoMV/

  25. arXiv:2403.10823  [pdf, other]

    cs.CV cs.AI

    VisionCLIP: An Med-AIGC based Ethical Language-Image Foundation Model for Generalizable Retina Image Analysis

    Authors: Hao Wei, Bowen Liu, Minqing Zhang, Peilun Shi, Wu Yuan

    Abstract: Generalist foundation model has ushered in newfound capabilities in medical domain. However, the contradiction between the growing demand for high-quality annotated data with patient privacy continues to intensify. The utilization of medical artificial intelligence generated content (Med-AIGC) as an inexhaustible resource repository arises as a potential solution to address the aforementioned chal…

    Submitted 16 March, 2024; originally announced March 2024.

  26. arXiv:2403.08147  [pdf, other]

    cs.LG q-bio.BM

    Representing Molecules as Random Walks Over Interpretable Grammars

    Authors: Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik

    Abstract: Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representin…

    Submitted 2 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  27. arXiv:2403.07151  [pdf, other]

    cs.LG cs.AI cs.CR

    Don't Forget What I did?: Assessing Client Contributions in Federated Learning

    Authors: Bishwamittra Ghosh, Debabrota Basu, Fu Huazhu, Wang Yuan, Renuga Kanagavelu, Jiang Jin Peng, Liu Yong, Goh Siow Mong Rick, Wei Qingsong

    Abstract: Federated Learning (FL) is a collaborative machine learning (ML) approach, where multiple clients participate in training an ML model without exposing the private data. Fair and accurate assessment of client contributions is an important problem in FL to facilitate incentive allocation and encouraging diverse clients to participate in a unified model training. Existing methods for assessing client…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Under submission

  28. arXiv:2403.04638  [pdf, other]

    cs.RO

    Scalable, Simulation-Guided Compliant Tactile Finger Design

    Authors: Yuxiang Ma, Arpit Agarwal, Sandra Q. Liu, Wenzhen Yuan, Edward H. Adelson

    Abstract: Compliant grippers enable robots to work with humans in unstructured environments. In general, these grippers can improve with tactile sensing to estimate the state of objects around them to precisely manipulate objects. However, co-designing compliant structures with high-resolution tactile sensing is a challenging task. We propose a simulation framework for the end-to-end forward design of GelSi…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Yuxiang Ma, Arpit Agarwal, and Sandra Q. Liu contributed equally to this work. Project video: https://youtu.be/CnTUTA5cfMw . 7 pages, 11 figures, 2024 IEEE International Conference on Soft Robotics (RoboSoft)

  29. arXiv:2403.01069  [pdf, other]

    cs.CL

    LLMCRIT: Teaching Large Language Models to Use Criteria

    Authors: Weizhe Yuan, Pengfei Liu, Matthias Gallé

    Abstract: Humans follow criteria when they execute tasks, and these criteria are directly used to assess the quality of task completion. Therefore, having models learn to use criteria to provide feedback can help humans or models to perform tasks better. However, existing research in this field tends to consider only a limited set of criteria or quality assessment aspects. To fill this gap, we propose a gen…

    Submitted 4 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: ACL 2024 findings

  30. arXiv:2402.10873  [pdf, ps, other]

    cs.NI eess.SP

    Probabilistic On-Demand Charging Scheduling for ISAC-Assisted WRSNs with Multiple Mobile Charging Vehicles

    Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Rabiu Sale Zakariyya, Adeel Ahmed

    Abstract: The internet of things (IoT) based wireless sensor networks (WSNs) face an energy shortage challenge that could be overcome by the novel wireless power transfer (WPT) technology. The combination of WSNs and WPT is known as wireless rechargeable sensor networks (WRSNs), with the charging efficiency and charging scheduling being the primary concerns. Therefore, this paper proposes a probabilistic on…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted for publication at the IEEE Global Communications Conference (GLOBECOM) 2023

  31. arXiv:2401.17630  [pdf, other]

    cs.IR

    Towards Personalized Privacy: User-Governed Data Contribution for Federated Recommendation

    Authors: Liang Qu, Wei Yuan, Ruiqi Zheng, Lizhen Cui, Yuhui Shi, Hongzhi Yin

    Abstract: Federated recommender systems (FedRecs) have gained significant attention for their potential to protect user's privacy by keeping user privacy data locally and only communicating model parameters/gradients to the server. Nevertheless, the currently existing architecture of FedRecs assumes that all users have the same 0-privacy budget, i.e., they do not upload any data to the server, thus overlook…

    Submitted 31 January, 2024; originally announced January 2024.

  32. arXiv:2401.14257  [pdf, other]

    cs.CV cs.AI

    Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

    Authors: Minglin Chen, Weihao Yuan, Yukun Wang, Zhe Sheng, Yisheng He, Zilong Dong, Liefeng Bo, Yulan Guo

    Abstract: Recently, text-to-3D approaches have achieved high-fidelity 3D content generation using text description. However, the generated objects are stochastic and lack fine-grained control. Sketches provide a cheap approach to introduce such fine-grained control. Nevertheless, it is challenging to achieve flexible control from these sketches due to their abstraction and ambiguity. In this paper, we prese…

    Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 11 pages, 9 figures

  33. arXiv:2401.11441  [pdf, other]

    cs.IR

    On-Device Recommender Systems: A Comprehensive Survey

    Authors: Hongzhi Yin, Liang Qu, Tong Chen, Wei Yuan, Ruiqi Zheng, Jing Long, Xin Xia, Yuhui Shi, Chengqi Zhang

    Abstract: Recommender systems have been widely deployed in various real-world applications to help users identify content of interest from massive amounts of information. Traditional recommender systems work by collecting user-item interaction data in a cloud-based data center and training a centralized model to perform the recommendation service. However, such cloud-based recommender systems (CloudRSs) ine…

    Submitted 15 February, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

  34. arXiv:2401.10020  [pdf, other]

    cs.CL cs.AI

    Self-Rewarding Language Models

    Authors: Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho, Xian Li, Sainbayar Sukhbaatar, Jing Xu, Jason Weston

    Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewardi…

    Submitted 8 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  35. arXiv:2401.04518  [pdf, other]

    cs.CL cs.AI

    The Critique of Critique

    Authors: Shichao Sun, Junlong Li, Weizhe Yuan, Ruifeng Yuan, Wenjie Li, Pengfei Liu

    Abstract: Critique, as a natural language description for assessing the quality of model-generated content, has played a vital role in the training, evaluation, and refinement of LLMs. However, a systematic method to evaluate the quality of critique is lacking. In this paper, we pioneer the critique of critique, termed MetaCritique, which builds specific quantification criteria. To achieve a reliable evalua…

    Submitted 1 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to Findings of ACL 2024

  36. arXiv:2401.03514  [pdf, other]

    cs.CL

    ROIC-DM: Robust Text Inference and Classification via Diffusion Model

    Authors: Shilong Yuan, Wei Yuan, Hongzhi Yin, Tieke He

    Abstract: While language models have made many milestones in text inference and classification tasks, they remain susceptible to adversarial attacks that can lead to unforeseen outcomes. Existing works alleviate this problem by equipping language models with defense patches. However, these defense strategies often rely on impractical assumptions or entail substantial sacrifices in model performance. Consequ…

    Submitted 9 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: under review

  37. arXiv:2312.15826  [pdf, other]

    cs.IR

    Adversarial Item Promotion on Visually-Aware Recommender Systems by Guided Diffusion

    Authors: Lijian Chen, Wei Yuan, Tong Chen, Guanhua Ye, Quoc Viet Hung Nguyen, Hongzhi Yin

    Abstract: Visually-aware recommender systems have found widespread application in domains where visual elements significantly contribute to the inference of users' potential preferences. While the incorporation of visual information holds the promise of enhancing recommendation accuracy and alleviating the cold-start problem, it is essential to point out that the inclusion of item images may introduce subst…

    Submitted 22 May, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted by TOIS 2024

  38. arXiv:2312.11562  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    A Survey of Reasoning with Foundation Models

    Authors: Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, et al. (9 additional authors not shown)

    Abstract: Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring…

    Submitted 25 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: 20 Figures, 160 Pages, 750+ References, Project Page https://github.com/reasoning-survey/Awesome-Reasoning-Foundation-Models

  39. arXiv:2312.08592  [pdf, other]

    cs.CV

    Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis

    Authors: Frank P. -W. Lo, Jianing Qiu, Zeyu Wang, Junhong Chen, Bo Xiao, Wu Yuan, Stamatia Giannarou, Gary Frost, Benny Lo

    Abstract: Conventional approaches to dietary assessment are primarily grounded in self-reporting methods or structured interviews conducted under the supervision of dietitians. These methods, however, are often subjective, potentially inaccurate, and time-intensive. Although artificial intelligence (AI)-based solutions have been devised to automate the dietary assessment process, these prior AI methodologie…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 10 pages

  40. arXiv:2312.05757  [pdf, ps, other]

    cs.LG cs.AI cs.DL cs.SI stat.ME

    Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph

    Authors: Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu

    Abstract: Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 28 pages, 10 figures, 6 tables, accepted by Information Processing & Management

    Journal ref: Information Processing & Management, 60 (2024) 1-21

  41. arXiv:2312.00798  [pdf, other]

    cs.AI

    A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?

    Authors: Qiaozhu Mei, Yutong Xie, Walter Yuan, Matthew O. Jackson

    Abstract: We administer a Turing Test to AI Chatbots. We examine how Chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, etc., as well as how they respond to a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statis…

    Submitted 1 January, 2024; v1 submitted 19 November, 2023; originally announced December 2023.

    MSC Class: 91

    ACM Class: D.0; J.4; K.4

  42. arXiv:2312.00299  [pdf, other]

    stat.AP cs.CV

    QIENet: Quantitative irradiance estimation network using recurrent neural network based on satellite remote sensing data

    Authors: Longfeng Nie, Yuntian Chen, Dongxiao Zhang, Xinyue Liu, Wentian Yuan

    Abstract: Global horizontal irradiance (GHI) plays a vital role in estimating solar energy resources, which are used to generate sustainable green energy. In order to estimate GHI with high spatial resolution, a quantitative irradiance estimation network, named QIENet, is proposed. Specifically, the temporal and spatial characteristics of remote sensing data of the satellite Himawari-8 are extracted and fus…

    Submitted 30 November, 2023; originally announced December 2023.

  43. arXiv:2311.16918  [pdf, other]

    cs.CV cs.AI

    RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D

    Authors: Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han

    Abstract: Lifting 2D diffusion for 3D generation is a challenging problem due to the lack of geometric prior and the complex entanglement of materials and lighting in natural images. Existing methods have shown promise by first creating the geometry through score-distillation sampling (SDS) applied to rendered surface normals, followed by appearance modeling. However, relying on a 2D RGB diffusion model to…

    Submitted 24 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project Page: https://aigc3d.github.io/richdreamer/

  44. arXiv:2311.14968  [pdf, other]

    cs.IR

    Hide Your Model: A Parameter Transmission-free Federated Recommender System

    Authors: Wei Yuan, Chaoqun Yang, Liang Qu, Quoc Viet Hung Nguyen, Jianxin Li, Hongzhi Yin

    Abstract: With the growing concerns regarding user data privacy, Federated Recommender System (FedRec) has garnered significant attention recently due to its privacy-preserving capabilities. Existing FedRecs generally adhere to a learning protocol in which a central server shares a global recommendation model with clients, and participants achieve collaborative learning by frequently communicating the model…

    Submitted 12 February, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: Accepted by ICDE2024

  45. arXiv:2311.12185  [pdf, other]

    cs.RO

    Kitchen Artist: Precise Control of Liquid Dispensing for Gourmet Plating

    Authors: Hung-Jui Huang, **gyi Xiang, Wenzhen Yuan

    Abstract: Manipulating liquid is widely required for many tasks, especially in cooking. A common way to address this is extruding viscous liquid from a squeeze bottle. In this work, our goal is to create a sauce plating robot, which requires precise control of the thickness of squeezed liquids on a surface. Different liquids demand different manipulation policies. We command the robot to tilt the container…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Submitted to ICRA 2024

  46. arXiv:2311.11056  [pdf, other]

    cs.RO cs.LG cs.SE

    Choose Your Simulator Wisely: A Review on Open-source Simulators for Autonomous Driving

    Authors: Yueyuan Li, Wei Yuan, Songan Zhang, Weihao Yan, Qiyuan Shen, Chunxiang Wang, Ming Yang

    Abstract: Simulators play a crucial role in autonomous driving, offering significant time, cost, and labor savings. Over the past few years, the number of simulators for autonomous driving has grown substantially. However, there is a growing concern about the validity of algorithms developed and evaluated in simulators, indicating a need for a thorough analysis of the development status of the simulators.…

    Submitted 26 December, 2023; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: 18 pages, 5 figures, 8 tables

  47. arXiv:2311.10331  [pdf, other]

    eess.IV cs.CV

    Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTA

    Authors: Hao Wei, Peilun Shi, Guitao Bai, Minqing Zhang, Shuangle Li, Wu Yuan

    Abstract: Ultra-wide optical coherence tomography angiography (UW-OCTA) is an emerging imaging technique that offers significant advantages over traditional OCTA by providing an exceptionally wide scanning range of up to $24 \times 20\,\mathrm{mm}^{2}$, covering both the anterior and posterior regions of the retina. However, the currently accessible UW-OCTA datasets suffer from limited comprehensive hierarchical informati…

    Submitted 17 November, 2023; originally announced November 2023.

  48. arXiv:2311.08190  [pdf, other]

    eess.IV cs.CV cs.LG

    SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation

    Authors: Yinuo Wang, Kai Chen, Weimin Yuan, Cai Meng, XiangZhi Bai

    Abstract: Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently continued raising awareness within medical image segmentation. Despite the impressive capabilities of SAM on natural scenes, it struggles with performance decline when confronted with medical images, especially those involving blurry boundaries and highly irregular regions of low contrast. In t…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, 2 tables

  49. arXiv:2311.08075  [pdf, ps, other]

    eess.IV cs.CV cs.HC

    GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy

    Authors: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu

    Abstract: Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 12 pages, 10 figures

  50. arXiv:2311.00982  [pdf, ps, other]

    cs.IT

    The c-differential properties of a class of power functions

    Authors: Huan Zhou, Xiaoni Du, Wen** Yuan, Xingbin Qiao

    Abstract: Power functions with low $c$-differential uniformity have been widely studied not only because of their strong resistance to multiplicative differential attacks, but also low implementation cost in hardware. Furthermore, the $c$-differential spectrum of a function gives a more precise characterization of its $c$-differential properties. Let $f(x)=x^{\frac{p^n+3}{2}}$ be a power function over the f…

    Submitted 1 November, 2023; originally announced November 2023.