
Showing 1–50 of 176 results for author: Pang, J

Searching in archive cs.
  1. arXiv:2406.18380

    cs.LG

    KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning

    Authors: Roman Bresson, Giannis Nikolentzos, George Panagopoulos, Michail Chatzianastasis, Jun Pang, Michalis Vazirgiannis

    Abstract: In recent years, Graph Neural Networks (GNNs) have become the de facto tool for learning node and graph representations. Most GNNs typically consist of a sequence of neighborhood aggregation (a.k.a., message passing) layers. Within each of these layers, the representation of each node is updated from an aggregation and transformation of its neighbours' representations at the previous layer. The upp…

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.14841

    cs.CR cs.DB cs.LG

    TabularMark: Watermarking Tabular Datasets for Machine Learning

    Authors: Yihao Zheng, Haocheng Xia, Junyuan Pang, Jinfei Liu, Kui Ren, Lingyang Chu, Yang Cao, Li Xiong

    Abstract: Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.14558

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2406.13243

    cs.IT

    Abelian Group Codes for Classical and Classical-Quantum Channels: One-shot and Asymptotic Rate Bounds

    Authors: James Chin-Jen Pang, Sandeep Pradhan, Hessam Mahdavifar

    Abstract: We study the problem of transmission of information over classical and classical-quantum channels in the one-shot regime where the underlying codes are constrained to be group codes. In the achievability part, we introduce a new input probability distribution that incorporates the encoding homomorphism and the underlying channel law. Using a random coding argument, we characterize the performance…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 41 pages

  5. arXiv:2406.09401

    cs.CV cs.AI cs.RO

    MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

    Authors: Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

    Abstract: With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Follow-up of EmbodiedScan. A multi-modal 3D dataset with the most-ever comprehensive language annotations for 3D-LLMs. Project page: https://tai-wang.github.io/mmscan/

  6. arXiv:2406.08001

    cs.CV cs.LG

    Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

    Authors: Jiaxin Deng, Junbiao Pang, Baochang Zhang

    Abstract: Sharpness-Aware Minimization (SAM) has emerged as a promising approach for effectively reducing the generalization error. However, SAM incurs twice the computational cost compared to the base optimizer (e.g., SGD). We propose Asymptotic Unbiased Sampling with respect to iterations to accelerate SAM (AUSAM), which maintains the model's generalization capacity while significantly enhancing computational…

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2405.21070

    cs.CV cs.CL cs.LG

    Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

    Authors: Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi

    Abstract: Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to stud…

    Submitted 14 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  8. arXiv:2405.11809

    cs.CV cs.AI

    Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices

    Authors: Baiyu Pan, Jichao Jiao, Jianxing Pang, Jun Cheng

    Abstract: In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off betw…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: International Conference on Robotics and Automation (ICRA) 2024

  9. arXiv:2405.10625

    cs.CL cs.AI cs.LG q-bio.BM

    Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction

    Authors: Jiayun Pang, Ivan Vulić

    Abstract: Transformer-based encoder-decoder models have demonstrated impressive results in chemical reaction prediction tasks. However, these models typically rely on pretraining using tens of millions of unlabelled molecules, which can be time-consuming and GPU-intensive. One of the central questions we aim to answer in this work is: Can FlanT5 and ByT5, the encoder-decoder models pretrained solely on langu…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Preprint

  10. arXiv:2405.10370

    cs.CV

    Grounded 3D-LLM with Referent Tokens

    Authors: Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Ruiyuan Lyu, Runsen Xu, Dahua Lin, Jiangmiao Pang

    Abstract: Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D LMMs) to consolidate various 3D vision tasks within a unified generative framework. The model uses scene referent tokens as special noun phrases to ref…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Preprint

  11. arXiv:2405.08458

    cs.CV

    Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

    Authors: ** Wang, Bingfeng Zhang, Jian Pang, Honglong Chen, Weifeng Liu

    Abstract: Few-shot segmentation remains challenging due to the limitations of its labeling information for unseen classes. Most previous approaches rely on extracting high-level feature maps from the frozen visual encoder to compute the pixel-wise similarity as a key prior guidance for the decoder. However, such a prior representation suffers from coarse granularity and poor generalization to new classes si…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024; The camera-ready version

  12. arXiv:2405.00378

    cs.CV

    Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation

    Authors: Hanyang Chi, Jian Pang, Bingfeng Zhang, Weifeng Liu

    Abstract: Consistency learning is a central strategy to tackle unlabeled data in semi-supervised medical image segmentation (SSMIS), which enforces the model to produce consistent predictions under the perturbation. However, most current approaches solely focus on utilizing a specific single perturbation, which can only cope with limited cases, while employing multiple perturbations simultaneously is hard t…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  13. arXiv:2404.14405

    cs.RO

    Learning H-Infinity Locomotion Control

    Authors: Junfeng Long, Wenye Yu, Quanyi Li, Zirui Wang, Dahua Lin, Jiangmiao Pang

    Abstract: Stable locomotion in precipitous environments is an essential task for quadruped robots, requiring the ability to resist various external disturbances. Recent neural policies enhance robustness against disturbances by learning to resist external forces sampled from a fixed distribution in the simulated environment. However, the force generation process doesn't consider the robot's current state, m…

    Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Project Page: https://junfeng-long.github.io/HINF/

  14. arXiv:2404.12702

    cs.CV

    Modeling Multi-Granularity Context Information Flow for Pavement Crack Detection

    Authors: Junbiao Pang, Baocheng Xiong, Jiaqi Wu

    Abstract: Crack detection has become an indispensable, interesting yet challenging task in the computer vision community. Specifically, pavement cracks have a highly complex spatial structure, a low-contrast background and a weak spatial continuity, posing a significant challenge to an effective crack detection method. In this paper, we address these problems from a view that utilizes contexts of the cracks…

    Submitted 19 April, 2024; originally announced April 2024.

  15. arXiv:2404.11844

    cs.CY

    Finding A Taxi with Illegal Driver Substitution Activity via Behavior Modelings

    Authors: Junbiao Pang, Muhammad Ayub Sabir, Zhuyun Wang, Anjing Hu, Xue Yang, Haitao Yu, Qingming Huang

    Abstract: In our urban life, Illegal Driver Substitution (IDS) activity for a taxi is a grave unlawful activity in the taxi industry, possibly causing severe traffic accidents and painful social repercussions. Currently, the IDS activity is manually supervised by law enforcers, i.e., law enforcers empirically choose a taxi and inspect it. The pressing problem of this scheme is the dilemma between the limite…

    Submitted 17 April, 2024; originally announced April 2024.

  16. arXiv:2404.10985

    cs.CV stat.ML

    Pixel-Wise Symbol Spotting via Progressive Points Location for Parsing CAD Images

    Authors: Junbiao Pang, Zailin Dong, Jiaxin Deng, Mengyuan Zhu, Yunwei Zhang

    Abstract: Parsing Computer-Aided Design (CAD) drawings is a fundamental step for CAD revision, semantic-based management, and the generation of 3D prototypes in both the architecture and engineering industries. Labeling symbols from a CAD drawing is a challenging yet notorious task from a practical point of view. In this work, we propose to label and spot symbols from CAD images that are converted from CAD…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 10 pages, 10 figures, 6 tables

  17. arXiv:2404.09248

    cs.LG cs.AI cs.CL

    Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

    Authors: Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu

    Abstract: Reinforcement learning (RL) trains agents to accomplish complex tasks through environmental interaction data, but its capacity is also limited by the scope of the available data. To obtain a knowledgeable agent, a promising approach is to leverage the knowledge from large language models (LLMs). Despite previous studies combining LLMs with RL, seamless integration of the two components remains cha…

    Submitted 14 April, 2024; originally announced April 2024.

  18. arXiv:2404.00409

    cs.CV cs.GR

    3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting

    Authors: Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi

    Abstract: In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. Firs…

    Submitted 30 March, 2024; originally announced April 2024.

  19. arXiv:2403.19289

    cs.LG cs.AI stat.ME

    Uplift Modeling Under Limited Supervision

    Authors: George Panagopoulos, Daniele Malitesta, Fragkiskos D. Malliaros, Jun Pang

    Abstract: Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from…

    Submitted 7 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  20. arXiv:2403.18407

    cs.CV cs.AI

    A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification

    Authors: Jiaqi Wu, Junbiao Pang, Baochang Zhang, Qingming Huang

    Abstract: Semi-supervised learning (SSL) is a practical challenge in computer vision. Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, obtain the State Of The Art (SOTA) performances in SSL. These approaches employ a threshold-to-pseudo-label (T2L) process to generate PLs by truncating the confidence scores of unlabeled data predicted by the self-training method. However, self-trained models typical…

    Submitted 27 March, 2024; originally announced March 2024.

  21. arXiv:2403.18259

    cs.RO

    RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation

    Authors: Yang Tian, Jiyao Zhang, Guowei Huang, Bin Wang, ** Wang, Jiangmiao Pang, Hao Dong

    Abstract: Estimating robot pose and joint angles is significant in advanced robotics, enabling applications like robot collaboration and online hand-eye calibration. However, the introduction of unknown joint angles makes prediction more complex than simple robot pose estimation, due to its higher dimensionality. Previous methods either regress 3D keypoints directly or utilise a render&compare strategy. These…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024

  22. arXiv:2403.17367

    cs.RO

    RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment

    Authors: Guoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe Xu

    Abstract: Combining the mobility of legged robots with the manipulation skills of arms has the potential to significantly expand the operational range and enhance the capabilities of robotic systems in performing various mobile manipulation tasks. Existing approaches are confined to imprecise six degrees of freedom (DoF) manipulation and possess a limited arm workspace. In this paper, we propose a novel fra…

    Submitted 13 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  23. arXiv:2403.08821

    cs.CV cs.LG

    Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization

    Authors: Jiaxin Deng, Junbiao Pang, Baochang Zhang, Tian Wang

    Abstract: Sharpness-aware Minimization (SAM) has been proposed recently to improve model generalization ability. However, SAM calculates the gradient twice in each optimization step, thereby doubling the computation costs compared to stochastic gradient descent (SGD). In this paper, we propose a simple yet efficient sampling method to significantly accelerate SAM. Concretely, we discover that the gradient o…

    Submitted 24 February, 2024; originally announced March 2024.

  24. arXiv:2402.18445

    cs.NI

    HyperFedNet: Communication-Efficient Personalized Federated Learning Via Hypernetwork

    Authors: Xingyun Chen, Yan Huang, Zhenzhen Xie, Junjie Pang

    Abstract: In response to the challenges posed by non-independent and identically distributed (non-IID) data and the escalating threat of privacy attacks in Federated Learning (FL), we introduce HyperFedNet (HFN), a novel architecture that incorporates hypernetworks to revolutionize parameter aggregation and transmission in FL. Traditional FL approaches, characterized by the transmission of extensive paramet…

    Submitted 2 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  25. arXiv:2402.16174

    cs.CV cs.AI cs.RO

    GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

    Authors: Xiao Chen, Quanyi Li, Tai Wang, Tianfan Xue, Jiangmiao Pang

    Abstract: While recent advances in neural radiance field enable realistic digitization for large-scale scenes, the image-capturing process is still time-consuming and labor-intensive. Previous works attempt to automate this process using the Next-Best-View (NBV) policy for active 3D reconstruction. However, the existing NBV policies heavily rely on hand-crafted criteria, limited action space, or per-scene o…

    Submitted 15 June, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project page: http://gennbv.github.io/

  26. arXiv:2402.15895

    cs.CV

    Multi-Object Tracking by Hierarchical Visual Representations

    Authors: Jinkun Cao, Jiangmiao Pang, Kris Kitani

    Abstract: We propose a new visual hierarchical representation paradigm for multi-object tracking. It is more effective to discriminate between objects by attending to objects' compositional visual regions and contrasting with the background contextual information instead of sticking to only the semantic visual cue such as bounding boxes. This compositional-semantic-contextual hierarchy is flexible to be int…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 6 pages, 3 figures, 10 tables, accepted by ICRA 2024

  27. arXiv:2402.13497

    cs.CV

    Push Quantization-Aware Training Toward Full Precision Performances via Consistency Regularization

    Authors: Junbiao Pang, Tianyang Cai, Baochang Zhang, Jiaqi Wu, Ye Tao

    Abstract: Existing Quantization-Aware Training (QAT) methods intensively depend on the complete labeled dataset or knowledge distillation to guarantee the performances toward Full Precision (FP) accuracies. However, empirical results show that QAT still has inferior results compared to its FP counterpart. One question is how to push QAT toward or even surpass FP performances. In this paper, we address this…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures

  28. arXiv:2402.12789

    cs.LG cs.AI

    Fairness Without Harm: An Influence-Guided Active Sampling Approach

    Authors: Jinlong Pang, Jialu Wang, Zhaowei Zhu, Yuanshun Yao, Chen Qian, Yang Liu

    Abstract: The pursuit of fairness in machine learning (ML), ensuring that the models do not exhibit biases toward protected demographic groups, typically results in a compromise scenario. This compromise can be explained by a Pareto frontier where given certain resources (e.g., data), reducing the fairness violations often comes at the cost of lowering the model accuracy. In this work, we aim to train model…

    Submitted 31 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  29. arXiv:2402.12238

    cs.CV

    Mixed Gaussian Flow for Diverse Trajectory Prediction

    Authors: Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, Jiangmiao Pang

    Abstract: Existing trajectory prediction studies intensively leverage generative models. Normalizing flow is one of the genres with the advantage of being invertible to derive the probability density of predicted trajectories. However, mapping from a standard Gaussian by a flow-based model hurts the capacity to capture complicated patterns of trajectories, ignoring the under-represented motion intentions in…

    Submitted 19 February, 2024; originally announced February 2024.

  30. arXiv:2402.07616

    cs.CL cs.AI

    Anchor-based Large Language Models

    Authors: Jianhui Pang, Fanghua Ye, Derek Fai Wong, Xin He, Wanshun Chen, Longyue Wang

    Abstract: Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading t…

    Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: The paper has been accepted by the ACL 2024 conference. Work was done when Jianhui Pang and Fanghua Ye were interning at Tencent AI Lab

  31. arXiv:2402.07243

    cs.CV eess.IV

    PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression

    Authors: Jiahao Pang, Kevin Bui, Dong Tian

    Abstract: The universality of the point cloud format enables many 3D applications, making the compression of point clouds a critical phase in practice. Sampled as discrete 3D points, a point cloud approximates 2D surface(s) embedded in 3D with a finite bit-depth. However, the point distribution of a practical point cloud changes drastically as its bit-depth increases, requiring different methodologies for e…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: Accepted at 3DV 2024

  32. arXiv:2402.03719

    cs.CL cs.AI

    Empowering Language Models with Active Inquiry for Deeper Understanding

    Authors: Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu

    Abstract: The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language. However, LLMs often misinterpret user queries because of their uncertain intention, leading to less helpful responses. In natural human interactions, clarification is sought through targeted questioning to uncover obscure information. Thus, in this pap…

    Submitted 6 February, 2024; originally announced February 2024.

  33. arXiv:2401.12794

    cs.CL

    Benchmarking LLMs via Uncertainty Quantification

    Authors: Fanghua Ye, Mingming Yang, Jianhui Pang, Longyue Wang, Derek F. Wong, Emine Yilmaz, Shuming Shi, Zhaopeng Tu

    Abstract: The proliferation of open-source Large Language Models (LLMs) from various institutions has highlighted the urgent need for comprehensive evaluation methods. However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect -- uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking…

    Submitted 25 April, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 25 pages, preprint

  34. arXiv:2401.08350

    cs.CL

    Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models

    Authors: Jianhui Pang, Fanghua Ye, Longyue Wang, Dian Yu, Derek F. Wong, Shuming Shi, Zhaopeng Tu

    Abstract: The evolution of Neural Machine Translation (NMT) has been significantly influenced by six core challenges (Koehn and Knowles, 2017), which have acted as benchmarks for progress in this field. This study revisits these challenges, offering insights into their ongoing relevance in the context of advanced Large Language Models (LLMs): domain mismatch, amount of parallel data, rare word prediction, t…

    Submitted 17 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 17 pages. Longyue Wang is the Corresponding Author

  35. arXiv:2312.16170

    cs.CV cs.AI cs.RO

    EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

    Authors: Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, Xihui Liu, Cewu Lu, Dahua Lin, Jiangmiao Pang

    Abstract: In the realm of computer vision and robotics, embodied agents are expected to explore their environment and carry out human instructions. This necessitates the ability to fully understand 3D scenes given their first-person observations and contextualize them into language for interaction. However, traditional research focuses more on scene-level input and output setups from a global view. To addre…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: A multi-modal, ego-centric 3D perception dataset and benchmark for holistic 3D scene understanding. Project page: http://tai-wang.github.io/embodiedscan

  36. arXiv:2312.11460

    cs.RO cs.AI cs.CV cs.LG eess.SY

    Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

    Authors: Junfeng Long, Zirui Wang, Quanyi Li, Jiawei Gao, Liu Cao, Jiangmiao Pang

    Abstract: Robust locomotion control depends on accurate state estimations. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain frictions and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introdu…

    Submitted 1 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Use 1 hour to train a quadruped robot capable of traversing any terrain under any disturbances in the open world, Project Page: https://github.com/OpenRobotLab/HIMLoco

  37. arXiv:2312.00335

    cs.CV

    Learning Anatomically Consistent Embedding for Chest Radiography

    Authors: Ziyu Zhou, Haozhe Luo, Jiaxuan Pang, Xiaowei Ding, Michael Gotway, Jianming Liang

    Abstract: Self-supervised learning (SSL) approaches have recently shown substantial success in learning visual representations from unannotated images. Compared with photographic images, medical images acquired with the same imaging protocol exhibit high consistency in anatomy. To exploit this anatomical consistency, this paper introduces a novel SSL approach, called PEAC (patch embedding of anatomical cons…

    Submitted 11 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: BMVC 2023, oral

  38. arXiv:2311.01782

    cs.CV

    Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression

    Authors: Jiaqi Wu, Junbiao Pang, Qingming Huang

    Abstract: Both semi-supervised classification and regression are practically challenging tasks for computer vision. However, semi-supervised classification methods are barely applied to regression tasks, because the threshold-to-pseudo-label (T2L) process in classification uses confidence to determine the quality of labels. This is successful for classification tasks but inefficient for regression tasks. In na…

    Submitted 3 November, 2023; originally announced November 2023.

  39. arXiv:2311.01770

    cs.CV cs.AI

    Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation

    Authors: Jiaqi Wu, Junbiao Pang, Qingming Huang

    Abstract: Semi-supervised pose estimation is a practically challenging task for computer vision. Although numerous excellent semi-supervised classification methods have emerged, these methods typically use confidence to evaluate the quality of pseudo-labels, which is difficult to achieve in pose estimation tasks. For example, in pose estimation, confidence represents only the possibility that a position of…

    Submitted 3 November, 2023; originally announced November 2023.

  40. Foundation Ark: Accruing and Reusing Knowledge for Superior and Robust Performance

    Authors: DongAo Ma, Jiaxuan Pang, Michael B. Gotway, Jianming Liang

    Abstract: Deep learning nowadays offers expert-level and sometimes even super-expert-level performance, but achieving such performance demands massive annotated data for training (e.g., Google's proprietary CXR Foundation Model (CXR-FM) was trained on 821,544 labeled and mostly private chest X-rays (CXRs)). Numerous datasets are publicly available in medical imaging but individually small and heterogeneous…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: Best Paper Award Runner-Up at Medical Image Computing and Computer Assisted Intervention (MICCAI) 2023

  41. arXiv:2310.05107

    cs.CV

    OV-PARTS: Towards Open-Vocabulary Part Segmentation

    Authors: Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang

    Abstract: Segmenting and recognizing diverse object parts is a crucial ability in applications spanning various computer vision and robotic tasks. While significant progress has been made in object-level Open-Vocabulary Semantic Segmentation (OVSS), i.e., segmenting objects with arbitrary text, the corresponding part-level research poses additional challenges. Firstly, part segmentation inherently involves…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS Dataset and Benchmark Track 2023

  42. arXiv:2310.02614

    cs.AI cs.MA

    On Quantified Observability Analysis in Multiagent Systems

    Authors: Chunyan Mu, Jun Pang

    Abstract: In multiagent systems (MASs), agents' observation upon system behaviours may improve the overall team performance, but may also leak sensitive information to an observer. A quantified observability analysis can thus be useful to assist decision-making in MASs by operators seeking to optimise the relationship between performance effectiveness and information exposure through observations in practic…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 8 pages

  43. arXiv:2310.01994  [pdf, other]

    cs.CV

    Understanding Masked Autoencoders From a Local Contrastive Perspective

    Authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

    Abstract: Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we fir…

    Submitted 8 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  44. arXiv:2309.07918  [pdf, other]

    cs.CV

    Unified Human-Scene Interaction via Prompted Chain-of-Contacts

    Authors: Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang

    Abstract: Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality. Despite advancements in motion quality and physical plausibility, two pivotal factors, versatile interaction control and the development of a user-friendly interface, require further exploration before the practical application of HSI. This paper presents a unified HSI framework, UniHSI, which suppor…

    Submitted 19 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: A unified Human-Scene Interaction framework that supports versatile interactions through language commands. Project URL: https://xizaoqu.github.io/unihsi/ . Code: https://github.com/OpenRobotLab/UniHSI

  45. arXiv:2308.16911  [pdf, other]

    cs.CV cs.AI cs.CL

    PointLLM: Empowering Large Language Models to Understand Point Clouds

    Authors: Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

    Abstract: The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, enabling LLMs to understand point clouds and offering a new avenue beyond 2D visual data. PointLLM understands colored object point clouds with hu…

    Submitted 1 December, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: 28 pages. Empowering large language models with 3D point cloud understanding, accompanied by a novel dataset and carefully designed benchmarks. Project page: https://runsenxu.com/projects/PointLLM

  46. arXiv:2308.15413  [pdf, other]

    cs.CV eess.IV

    WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

    Authors: Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian

    Abstract: There have been recent efforts to learn more meaningful representations via fixed length codewords from mesh data, since a mesh serves as a complete model of underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific t…

    Submitted 29 August, 2023; originally announced August 2023.

  47. arXiv:2308.07635  [pdf, other]

    cs.CL cs.AI

    LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation

    Authors: Xiaoming Shi, Jie Xu, Jinru Ding, Jiali Pang, Sichen Liu, Shuqing Luo, Xingwei Peng, Lu Lu, Haihong Yang, Mingtao Hu, Tong Ruan, Shaoting Zhang

    Abstract: There is an increasing interest in developing LLMs for medical diagnosis to improve diagnosis efficiency. Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, leading to the inability to evaluate the quality and potential risks of medical LLMs, further hindering the application of LLMs in medical treatment scenarios. Besides, current evaluatio…

    Submitted 15 August, 2023; originally announced August 2023.

  48. arXiv:2306.05233  [pdf, other]

    cs.CR cs.CV cs.LG

    Ownership Protection of Generative Adversarial Networks

    Authors: Hailong Hu, Jun Pang

    Abstract: Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. Therefore, it is critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process, and they are not robust to emerging model extraction attacks. In this paper, w…

    Submitted 8 June, 2023; originally announced June 2023.

  49. arXiv:2306.05208  [pdf, other]

    cs.CR cs.CV cs.LG

    PriSampler: Mitigating Property Inference of Diffusion Models

    Authors: Hailong Hu, Jun Pang

    Abstract: Diffusion models have been remarkably successful in data synthesis. However, when these models are applied to sensitive datasets, such as banking and human face data, they might bring up severe privacy concerns. This work systematically presents the first privacy study about property inference attacks against diffusion models, where adversaries aim to extract sensitive global properties of its tra…

    Submitted 29 April, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

  50. arXiv:2305.14483  [pdf, other]

    cs.CL cs.LG

    Language Model Self-improvement by Reinforcement Learning Contemplation

    Authors: Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu

    Abstract: Large Language Models (LLMs) have exhibited remarkable performance across various natural language processing (NLP) tasks. However, fine-tuning these models often necessitates substantial supervision, which can be expensive and time-consuming to obtain. This paper introduces a novel unsupervised method called Language Model Self-Improvement by Reinforcement Learning Contemplation (SIRLC) that impro…

    Submitted 23 May, 2023; originally announced May 2023.