Drug Supply Chain Optimization for Adaptive Clinical Trials

Authors: Jincheng Pang, Hong Yan, Zoe Hua

Abstract: With increasing interest in adaptive clinical trial designs, challenges are present to drug supply chain management which may offset the benefit of adaptive designs. Thus, it is necessary to develop an optimization tool to facilitate the decision making and analysis of drug supply chain planning. The challenges include the uncertainty of maximum drug supply needed, the shifting of supply requirement, and rapid availability of new supply at decision points. In this paper, statistical simulations are designed to optimize the pre-study medication supply strategy and monitor ongoing drug supply using real-time data collected with the progress of study. Particle swarm algorithm is applied when performing optimization, where feature extraction is implemented to reduce dimensionality and save computational cost.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.05107 [pdf, other]

OV-PARTS: Towards Open-Vocabulary Part Segmentation

Authors: Meng Wei, Xiaoyu Yue, Wenwei Zhang, Shu Kong, Xihui Liu, Jiangmiao Pang

Abstract: Segmenting and recognizing diverse object parts is a crucial ability in applications spanning various computer vision and robotic tasks. While significant progress has been made in object-level Open-Vocabulary Semantic Segmentation (OVSS), i.e., segmenting objects with arbitrary text, the corresponding part-level research poses additional challenges. Firstly, part segmentation inherently involves intricate boundaries, while limited annotated data compounds the challenge. Secondly, part segmentation introduces an open granularity challenge due to the diverse and often ambiguous definitions of parts in the open world. Furthermore, the large-scale vision and language models, which play a key role in the open vocabulary setting, struggle to recognize parts as effectively as objects. To comprehensively investigate and tackle these challenges, we propose an Open-Vocabulary Part Segmentation (OV-PARTS) benchmark. OV-PARTS includes refined versions of two publicly available datasets: Pascal-Part-116 and ADE20K-Part-234. And it covers three specific tasks: Generalized Zero-Shot Part Segmentation, Cross-Dataset Part Segmentation, and Few-Shot Part Segmentation, providing insights into analogical reasoning, open granularity and few-shot adapting abilities of models. Moreover, we analyze and adapt two prevailing paradigms of existing object-level OVSS methods for OV-PARTS. Extensive experimental analysis is conducted to inspire future research in leveraging foundational models for OV-PARTS.
The code and dataset are available at https://github.com/OpenRobotLab/OV_PARTS.

Submitted 8 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS Dataset and Benchmark Track 2023

arXiv:2310.04153 [pdf, other]

Fair coins tend to land on the same side they started: Evidence from 350,757 flips

Authors: František Bartoš, Alexandra Sarafoglou, Henrik R. Godmann, Amir Sahrani, David Klein Leunk, Pierre Y. Gui, David Voss, Kaleem Ullah, Malte J. Zoubek, Franziska Nippold, Frederik Aust, Felipe F. Vieira, Chris-Gabriel Islam, Anton J. Zoubek, Sara Shabani, Jonas Petter, Ingeborg B. Roos, Adam Finnemann, Aaron B. Lob, Madlen F. Hoffstadt, Jason Nak, Jill de Ron, Koen Derks, Karoline Huth, Sjoerd Terpstra, et al. (25 additional authors not shown)

Abstract: Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on the same side it started -- DHM estimated the probability of a same-side outcome to be about 51%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\text{Pr}(\text{same side}) = 0.508$, 95% credible interval (CI) [$0.506$, $0.509$], $\text{BF}_{\text{same-side bias}} = 2359$. Furthermore, the data revealed considerable between-people variation in the degree of this same-side bias. Our data also confirmed the generic prediction that when people flip an ordinary coin -- with the initial side-up randomly determined -- it is equally likely to land heads or tails: $\text{Pr}(\text{heads}) = 0.500$, 95% CI [$0.498$, $0.502$], $\text{BF}_{\text{heads-tails bias}} = 0.182$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Additional exploratory analyses revealed that the within-people same-side bias decreased as more coins were flipped, an effect that is consistent with the possibility that practice makes people flip coins in a less wobbly fashion.
Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started. Our data provide compelling statistical support for the DHM physics model of coin tossing.

Submitted 2 June, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.02614 [pdf, ps, other]

On Quantified Observability Analysis in Multiagent Systems

Authors: Chunyan Mu, Jun Pang

Abstract: In multiagent systems (MASs), agents' observation upon system behaviours may improve the overall team performance, but may also leak sensitive information to an observer. A quantified observability analysis can thus be useful to assist decision-making in MASs by operators seeking to optimise the relationship between performance effectiveness and information exposure through observations in practice. This paper presents a novel approach to quantitatively analysing the observability properties in MASs. The concept of opacity is applied to formally express the characterisation of observability in MASs modelled as partially observable multiagent systems. We propose a temporal logic oPATL to reason about agents' observability with quantitative goals, which capture the probability of information transparency of system behaviours to an observer, and develop verification techniques for quantitatively analysing such properties. We implement the approach as an extension of the PRISM model checker, and illustrate its applicability via several examples.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 8 pages

arXiv:2310.01994 [pdf, other]

Understanding Masked Autoencoders From a Local Contrastive Perspective

Authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang

Abstract: Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we first propose a local perspective to explicitly extract a local contrastive form from MAE's reconstructive objective at the patch level. And then we introduce a new empirical framework, called Local Contrastive MAE (LC-MAE), to analyze both reconstructive and contrastive aspects of MAE. LC-MAE reveals that MAE learns invariance to random masking and ensures distribution consistency between the learned token embeddings and the original images. Furthermore, we dissect the contribution of the decoder and random masking to MAE's success, revealing both the decoder's learning mechanism and the dual role of random masking as data augmentation and effective receptive field restriction. Our experimental analysis sheds light on the intricacies of MAE and summarizes some useful design methodologies, which can inspire more powerful visual self-supervised methods.

Submitted 8 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.07918 [pdf, other]

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

Authors: Zeqi Xiao, Tai Wang, Jingbo Wang, Jinkun Cao, Wenwei Zhang, Bo Dai, Dahua Lin, Jiangmiao Pang

Abstract: Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality. Despite advancements in motion quality and physical plausibility, two pivotal factors, versatile interaction control and the development of a user-friendly interface, require further exploration before the practical application of HSI. This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands. This framework is built upon the definition of interaction as Chain of Contacts (CoC): steps of human joint-object part pairs, which is inspired by the strong correlation between interaction types and human-object contact regions. Based on the definition, UniHSI constitutes a Large Language Model (LLM) Planner to translate language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution. To facilitate training and evaluation, we collect a new dataset named ScenePlan that encompasses thousands of task plans generated by LLMs based on diverse scenarios. Comprehensive experiments demonstrate the effectiveness of our framework in versatile task execution and generalizability to real scanned scenes. The project page is at https://github.com/OpenRobotLab/UniHSI .

Submitted 19 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: A unified Human-Scene Interaction framework that supports versatile interactions through language commands. Project URL: https://xizaoqu.github.io/unihsi/ . Code: https://github.com/OpenRobotLab/UniHSI

arXiv:2308.16911 [pdf, other]

PointLLM: Empowering Large Language Models to Understand Point Clouds

Authors: Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

Abstract: The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, enabling LLMs to understand point clouds and offering a new avenue beyond 2D visual data. PointLLM understands colored object point clouds with human instructions and generates contextually appropriate responses, illustrating its grasp of point clouds and common sense. Specifically, it leverages a point cloud encoder with a powerful LLM to effectively fuse geometric, appearance, and linguistic information. We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs to enable a two-stage training strategy: aligning latent spaces and subsequently instruction-tuning the unified model. To rigorously evaluate the perceptual and generalization capabilities of PointLLM, we establish two benchmarks: Generative 3D Object Classification and 3D Object Captioning, assessed through three different methods, including human evaluation, GPT-4/ChatGPT evaluation, and traditional metrics. Experimental results reveal PointLLM's superior performance over existing 2D and 3D baselines, with a notable achievement in human-evaluated object captioning tasks where it surpasses human annotators in over 50% of the samples. Codes, datasets, and benchmarks are available at https://github.com/OpenRobotLab/PointLLM .

Submitted 1 December, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: 28 pages. Empowering large language models with 3D point cloud understanding, accompanied by a novel dataset and carefully designed benchmarks. Project page: https://runsenxu.com/projects/PointLLM

arXiv:2308.15413 [pdf, other]

WrappingNet: Mesh Autoencoder via Deep Sphere Deformation

Authors: Eric Lei, Muhammad Asad Lodhi, Jiahao Pang, Junghyun Ahn, Dong Tian

Abstract: There have been recent efforts to learn more meaningful representations via fixed length codewords from mesh data, since a mesh serves as a complete model of underlying 3D shape compared to a point cloud. However, the mesh connectivity presents new difficulties when constructing a deep learning pipeline for meshes. Previous mesh unsupervised learning approaches typically assume category-specific templates, e.g., human face/body templates. It restricts the learned latent codes to only be meaningful for objects in a specific category, so the learned latent spaces are unable to be used across different types of objects. In this work, we present WrappingNet, the first mesh autoencoder enabling general mesh unsupervised learning over heterogeneous objects. It introduces a novel base graph in the bottleneck dedicated to representing mesh connectivity, which is shown to facilitate learning a shared latent space representing object shape. The superiority of WrappingNet mesh learning is further demonstrated via improved reconstruction quality and competitive classification compared to point cloud learning, as well as latent interpolation between meshes of different categories.

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.07635 [pdf, other]

LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation

Authors: Xiaoming Shi, Jie Xu, Jinru Ding, Jiali Pang, Sichen Liu, Shuqing Luo, Xingwei Peng, Lu Lu, Haihong Yang, Mingtao Hu, Tong Ruan, Shaoting Zhang

Abstract: There is an increasing interest in developing LLMs for medical diagnosis to improve diagnosis efficiency. Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, leading to the inability to evaluate the quality and potential risks of medical LLMs, further hindering the application of LLMs in medical treatment scenarios. Besides, current evaluations heavily rely on labor-intensive interactions with LLMs to obtain diagnostic dialogues and human evaluation on the quality of diagnosis dialogue. To tackle the lack of unified and comprehensive evaluation criterion, we first initially establish an evaluation criterion, termed LLM-specific Mini-CEX to assess the diagnostic capabilities of LLMs effectively, based on original Mini-CEX. To address the labor-intensive interaction problem, we develop a patient simulator to engage in automatic conversations with LLMs, and utilize ChatGPT for evaluating diagnosis dialogues automatically. Experimental results show that the LLM-specific Mini-CEX is adequate and necessary to evaluate medical diagnosis dialogue. Besides, ChatGPT can replace manual evaluation on the metrics of humanistic qualities and provides reproducible and automated comparisons between different LLMs.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2306.05233 [pdf, other]

Ownership Protection of Generative Adversarial Networks

Authors: Hailong Hu, Jun Pang

Abstract: Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. Therefore, it is critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process, and they are not robust to emerging model extraction attacks. In this paper, we propose a new ownership protection method based on the common characteristics of a target model and its stolen models. Our method can be directly applicable to all well-trained GANs as it does not require retraining target models. Extensive experimental results show that our new method can achieve the best protection performance, compared to the state-of-the-art methods. Finally, we demonstrate the effectiveness of our method with respect to the number of generations of model extraction attacks, the number of generated samples, different datasets, as well as adaptive attacks.

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.05208 [pdf, other]

PriSampler: Mitigating Property Inference of Diffusion Models

Authors: Hailong Hu, Jun Pang

Abstract: Diffusion models have been remarkably successful in data synthesis. However, when these models are applied to sensitive datasets, such as banking and human face data, they might bring up severe privacy concerns. This work systematically presents the first privacy study about property inference attacks against diffusion models, where adversaries aim to extract sensitive global properties of its training set from a diffusion model. Specifically, we focus on the most practical attack scenario: adversaries are restricted to accessing only synthetic data. Under this realistic scenario, we conduct a comprehensive evaluation of property inference attacks on various diffusion models trained on diverse data types, including tabular and image datasets. A broad range of evaluations reveals that diffusion models and their samplers are universally vulnerable to property inference attacks. In response, we propose a new model-agnostic plug-in method PriSampler to mitigate the risks of the property inference of diffusion models. PriSampler can be directly applied to well-trained diffusion models and support both stochastic and deterministic sampling. Extensive experiments illustrate the effectiveness of our defense, and it can lead adversaries to infer the proportion of properties as close as predefined values that model owners wish. Notably, PriSampler also shows its significantly superior performance to diffusion models trained with differential privacy on both model utility and defense performance.
This work will elevate the awareness of preventing property inference attacks and encourage privacy-preserving synthetic data release.

Submitted 29 April, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2305.14798 [pdf, ps, other]

The Minimization of Piecewise Functions: Pseudo Stationarity

Authors: Ying Cui, Junyi Liu, Jong-Shi Pang

Abstract: There are many significant applied contexts that require the solution of discontinuous optimization problems in finite dimensions. Yet these problems are very difficult, both computationally and analytically. With the functions being discontinuous and a minimizer (local or global) of the problems, even if it exists, being impossible to verifiably compute, a foremost question is what kind of ''stationary solutions'' one can expect to obtain; these solutions provide promising candidates for minimizers; i.e., their defining conditions are necessary for optimality. Motivated by recent results on sparse optimization, we introduce in this paper such a kind of solution, termed ''pseudo B- (for Bouligand) stationary solution'', for a broad class of discontinuous piecewise continuous optimization problems with objective and constraint defined by indicator functions of the positive real axis composite with functions that are possibly nonsmooth. We present two approaches for computing such a solution. One approach is based on lifting the problem to a higher dimension via the epigraphical formulation of the indicator functions; this requires the addition of some auxiliary variables. The other approach is based on certain continuous (albeit not necessarily differentiable) piecewise approximations of the indicator functions and the convergence to a pseudo B-stationary solution of the original problem is established. The conditions for convergence are discussed and illustrated by an example.

Submitted 24 May, 2023; originally announced May 2023.

MSC Class: 90C26; 90C30

arXiv:2305.14483 [pdf, other]

Language Model Self-improvement by Reinforcement Learning Contemplation

Authors: Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu

Abstract: Large Language Models (LLMs) have exhibited remarkable performance across various natural language processing (NLP) tasks. However, fine-tuning these models often necessitates substantial supervision, which can be expensive and time-consuming to obtain. This paper introduces a novel unsupervised method called Language Model Self-Improvement by Reinforcement Learning Contemplation (SIRLC) that improves LLMs without reliance on external labels. Our approach is grounded in the observation that it is simpler for language models to assess text quality than to generate text. Building on this insight, SIRLC assigns LLMs dual roles as both student and teacher. As a student, the LLM generates answers to unlabeled questions, while as a teacher, it evaluates the generated text and assigns scores accordingly. The model parameters are updated using reinforcement learning to maximize the evaluation score. We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation. Our experiments show that SIRLC effectively improves LLM performance without external supervision, resulting in a 5.6% increase in answering accuracy for reasoning tasks and a rise in BERTScore from 0.82 to 0.86 for translation tasks. Furthermore, SIRLC can be applied to models of different sizes, showcasing its broad applicability.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.12211 [pdf, other]

Coordinate-Update Algorithms can Efficiently Detect Infeasible Optimization Problems

Authors: Jinhee Paeng, Jisun Park, Ernest K. Ryu

Abstract: Coordinate update/descent algorithms are widely used in large-scale optimization due to their low per-iteration cost and scalability, but their behavior on infeasible or misspecified problems has not been much studied compared to the algorithms that use full updates. For coordinate-update methods to be as widely adopted to the extent so that they can be used as engines of general-purpose solvers, it is necessary to also understand their behavior under pathological problem instances. In this work, we show that the normalized iterates of randomized coordinate-update fixed-point iterations (RC-FPI) converge to the infimal displacement vector and use this result to design an efficient infeasibility detection method. We then extend the analysis to the setup where the coordinates are defined by non-orthonormal basis using the Friedrichs angle and then apply the machinery to decentralized optimization problems.

Submitted 19 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

arXiv:2305.07340 [pdf, other]

MedGPTEval: A Dataset and Benchmark to Evaluate Responses of Large Language Models in Medicine

Authors: Jie Xu, Lu Lu, Sen Yang, Bilin Liang, Xinwei Peng, Jiali Pang, Jinru Ding, Xiaoming Shi, Lingrui Yang, Huan Song, Kang Li, Xin Sun, Shaoting Zhang

Abstract: METHODS: First, a set of evaluation criteria is designed based on a comprehensive literature review. Second, existing candidate criteria are optimized for using a Delphi method by five experts in medicine and engineering. Third, three clinical experts design a set of medical datasets to interact with LLMs. Finally, benchmarking experiments are conducted on the datasets. The responses generated by chatbots based on LLMs are recorded for blind evaluations by five licensed medical experts. RESULTS: The obtained evaluation criteria cover medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with sixteen detailed indicators. The medical datasets include twenty-seven medical dialogues and seven case reports in Chinese. Three chatbots are evaluated, ChatGPT by OpenAI, ERNIE Bot by Baidu Inc., and Doctor PuJiang (Dr. PJ) by Shanghai Artificial Intelligence Laboratory. Experimental results show that Dr. PJ outperforms ChatGPT and ERNIE Bot in both multiple-turn medical dialogue and case report scenarios.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2305.02520 [pdf]

doi 10.1103/PhysRevB.108.134430

Electric-field-induced formation and annihilation of skyrmions in two-dimensional magnet

Authors: Jingman Pang, Hongjia Wang, Yufei Tang, Yun Zhang, Laurent Bellaiche

Abstract: Electric manipulation of skyrmions in 2D magnetic materials has garnered significant attention due to the potential in energy-efficient spintronic devices. In this work, using first-principles calculations and Monte Carlo simulations, we report the electric-field-tunable magnetic skyrmions in MnIn2Te4 monolayer. By adjusting the magnetic parameters, including the Heisenberg exchange interaction, DMI, and MAE, through applying an electric field, the formation or annihilation of skyrmions can be achieved. Our work suggests a platform for experimental realization of the electric-field-tunable magnetic skyrmions in 2D magnets.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2304.14485 [pdf]

Inter-sphere consistency-based method for camera-projector pair calibration

Authors: Zhaoshuai Qi, Jingqi Pang, Yifeng Hao, Yanning Zhang

Abstract: We construct constraints from consistency between estimated parameters from different spheres, termed inter-sphere consistency. It facilitates more flexible calibration using only two spheres, which has been considered a challenging and not well addressed ill-posed problem.

Submitted 10 March, 2023; originally announced April 2023.

Comments: 3 pages, 1 figure

arXiv:2304.09854 [pdf, other]

Transformer-Based Visual Segmentation: A Survey

Authors: Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy

Abstract: Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research.
The project page can be found at https://github.com/lxtGH/Awesome-Segmentation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.

Submitted 20 December, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: Work in progress. Github: https://github.com/lxtGH/Awesome-Segmentation-With-Transformer

arXiv:2303.16628 [pdf, other]

DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking

Authors: Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang

Abstract: Recent multi-camera 3D object detectors usually leverage temporal information to construct multi-view stereo that alleviates the ill-posed depth estimation. However, they typically assume all the objects are static and directly aggregate features across frames. This work begins with a theoretical and empirical analysis to reveal that ignoring the motion of moving objects can result in serious localization bias. Therefore, we propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem. In contrast to previous global Bird-Eye-View (BEV) methods, DORT extracts object-wise local volumes for motion estimation that also alleviates the heavy computational burden. By iteratively refining the estimated object motion and location, the preceding features can be precisely aggregated to the current frame to mitigate the aforementioned adverse effects. The simple framework has two significant appealing properties. It is flexible and practical that can be plugged into most camera-based 3D object detectors. As there are predictions of object motion in the loop, it can easily track objects across frames according to their nearest center distances. Without bells and whistles, DORT outperforms all the previous methods on the nuScenes detection and tracking benchmarks with 62.5% NDS and 57.6% AMOTA, respectively. The source code will be released.

Submitted 18 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

arXiv:2303.13510 [pdf, other]

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

Authors: Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin

Abstract: This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset. Inspired by the scene-voxel-point hierarchy in downstream 3D object detectors, we design masking and reconstruction strategies accounting for voxel distributions in the scene and local point distributions within the voxel. We employ a Reversed-Furthest-Voxel-Sampling strategy to address the uneven distribution of LiDAR points and propose MV-JAR, which combines two techniques for modeling the aforementioned distributions, resulting in superior performance. Our experiments reveal limitations in previous data-efficient experiments, which uniformly sample fine-tuning splits with varying data proportions from each LiDAR sequence, leading to similar data diversity across splits. To address this, we propose a new benchmark that samples scene sequences for diverse fine-tuning splits, ensuring adequate model convergence and providing a more accurate evaluation of pre-training methods. Experiments on our Waymo benchmark and the KITTI dataset demonstrate that MV-JAR consistently and significantly improves 3D detection performance across various data scales, achieving up to a 6.3% increase in mAPH compared to training from scratch. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR .

Submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023 with a carefully designed benchmark on Waymo. Codes and the benchmark will be available at https://github.com/SmartBot-PJLab/MV-JAR

arXiv:2303.13509 [pdf, other]

Position-Guided Point Cloud Panoptic Segmentation Transformer

Authors: Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang

Abstract: DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. By diving into the details, we observe that instances in the sparse point clouds are relatively small to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, which are rare in the image domain. Considering instances in 3D are more featured by their positional information, we emphasize their roles during the modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on SemanticKITTI and nuScenes benchmark, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former .

Submitted 23 March, 2023; originally announced March 2023.

Comments: Project page: https://github.com/SmartBot-PJLab/P3Former

arXiv:2303.12782 [pdf, other]

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

Authors: Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy

Abstract: Video segmentation aims to segment and track every pixel in diverse scenarios accurately. In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks. To enhance the modeling of cross-tube relationships, we propose an effective way to perform tube-level linking via attention along the queries. In addition, we introduce temporal contrastive learning to instance-wise discriminative features for tube-level association. Our approach offers flexibility and efficiency for both short and long video inputs, as the length of each subclip can be varied according to the needs of datasets or scenarios. Tube-Link outperforms existing specialized architectures by a significant margin on five video segmentation datasets. Specifically, it achieves almost 13% relative improvements on VIPSeg and 4% improvements on KITTI-STEP over the strong baseline Video K-Net. When using a ResNet50 backbone on Youtube-VIS-2019 and 2021, Tube-Link boosts IDOL by 3% and 4%, respectively.

Submitted 21 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: ICCV-2023, Project page: https://github.com/lxtGH/Tube-Link (fix typos and errors, update the results)

arXiv:2303.12776 [pdf, other]

Dense Distinct Query for End-to-End Object Detection

Authors: Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, Kai Chen

Abstract: One-to-one label assignment in object detection has successfully obviated the need for non-maximum suppression (NMS) as postprocessing and makes the pipeline end-to-end. However, it triggers a new dilemma as the widely used sparse queries cannot guarantee a high recall, while dense queries inevitably bring more similar queries and encounter optimization difficulties. As both sparse and dense queries are problematic, then what are the expected queries in end-to-end object detection? This paper shows that the solution should be Dense Distinct Queries (DDQ). Concretely, we first lay dense queries like traditional detectors and then select distinct ones for one-to-one assignments. DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors including FCN, R-CNN, and DETRs. Most impressively, DDQ-DETR achieves 52.1 AP on MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting. DDQ also shares the benefit of end-to-end detectors in crowded scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers to consider the complementarity between traditional methods and end-to-end detectors. The source code can be found at https://github.com/jshilong/DDQ.

Submitted 5 July, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR2023. Code has been released at https://github.com/jshilong/DDQ

arXiv:2303.09819 [pdf, ps, other]

doi 10.1016/j.nima.2023.168391

Muon radiography experiments on the subway overburden structure detection

Authors: Xin Mao, Zhiwei Li, Shuning Dong, **gtai Li, Jianming Zhang, Jie Pang, Ya** Cheng, Bin Liao, ** Ouyang, Ran Han

Abstract: Muon radiography is an innovative and non-destructive technique for internal density structure imaging, based on measuring the attenuation of cosmic-ray muons after they penetrate the target. Due to the strong penetration ability of muons, the detection range of muon radiography can reach the order of hundreds of meters or even kilometers. Using a portable muon detector composed of plastic scintillators and silicon photomultipliers, we performed a short-duration (1h) flux scanning experiment of the overburden above the platform and tunnel of the Xiaoying West Road subway station under construction. With the observation direction facing up, the detector is placed on the north side of the track and moved eastward from the platform section inside the station to the tunnel section. The scanning length is 264m and a total of 21 locations are observed. By comparing the observed and predicted values of the muon survival ratio at different locations, the experiment accurately detects the jump in thickness at the interface of the platform section and tunnel section. Furthermore, unknown anomalies caused by random placed light brick piles and side passage mouth above the observation locations are detected and confirmed later. This experiment verifies the feasibility of using natural muons to quickly detect abnormal structures of the overburden of tunnel, and shows that muon radiography has broad application prospects in tunnel safety and other similar aspects.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 30 pages, 10 figures

arXiv:2303.09511 [pdf, other]

Capacity-achieving Polar-based Codes with Sparsity Constraints on the Generator Matrices

Authors: James Chin-Jen Pang, Hessam Mahdavifar, S. Sandeep Pradhan

Abstract: In this paper, we leverage polar codes and the well-established channel polarization to design capacity-achieving codes with a certain constraint on the weights of all the columns in the generator matrix (GM) while having a low-complexity decoding algorithm. We first show that given a binary-input memoryless symmetric (BMS) channel $W$ and a constant $s \in (0, 1]$, there exists a polarization kernel such that the corresponding polar code is capacity-achieving with the \textit{rate of polarization} $s/2$, and the GM column weights being bounded from above by $N^s$. To improve the sparsity versus error rate trade-off, we devise a column-splitting algorithm and two coding schemes for BEC and then for general BMS channels. The \textit{polar-based} codes generated by the two schemes inherit several fundamental properties of polar codes with the original $2 \times 2$ kernel including the decay in error probability, decoding complexity, and the capacity-achieving property. Furthermore, they demonstrate the additional property that their GM column weights are bounded from above sublinearly in $N$, while the original polar codes have some column weights that are linear in $N$. In particular, for any BEC and $β<0.5$, the existence of a sequence of capacity-achieving polar-based codes where all the GM column weights are bounded from above by $N^λ$ with $λ\approx 0.585$, and with the error probability bounded by $O(2^{-N^β} )$ under a decoder with complexity $O(N\log N)$, is shown.
The existence of similar capacity-achieving polar-based codes with the same decoding complexity is shown for any BMS channel and $β<0.5$ with $λ\approx 0.631$.

Submitted 16 March, 2023; originally announced March 2023.

Comments: 31 pages, single column. arXiv admin note: substantial text overlap with arXiv:2012.13977

arXiv:2303.03055 [pdf]

Low-discrepancy Sampling in the Expanded Dimensional Space: An Acceleration Technique for Particle Swarm Optimization

Authors: Feng Wu, Yuelin Zhao, Jianhua Pang, Jun Yan, Wanxie Zhong

Abstract: Compared with random sampling, low-discrepancy sampling is more effective in covering the search space. However, the existing research cannot definitely state whether the impact of a low-discrepancy sample on particle swarm optimization (PSO) is positive or negative. Using Niederreiter's theorem, this study completes an error analysis of PSO, which reveals that the error bound of PSO at each iteration depends on the dispersion of the sample set in an expanded dimensional space. Based on this error analysis, an acceleration technique for PSO-type algorithms is proposed with low-discrepancy sampling in the expanded dimensional space. The acceleration technique can generate a low-discrepancy sample set with a smaller dispersion, compared with a random sampling, in the expanded dimensional space; it also reduces the error at each iteration, and hence improves the convergence speed. The acceleration technique is combined with the standard PSO and the comprehensive learning particle swarm optimization, and the performance of the improved algorithm is compared with the original algorithm. The experimental results show that the two improved algorithms have significantly faster convergence speed under the same accuracy requirement.

Submitted 2 July, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 29 pages, 0 figures

ACM Class: F.2.2

arXiv:2303.01012 [pdf, other]

Exploring Unconfirmed Transactions for Effective Bitcoin Address Clustering

Authors: Kai Wang, Maike Tong, Changhao Wu, Jun Pang, Chen Chen, Xiapu Luo, Weili Han

Abstract: The development of clustering heuristics has demonstrated that Bitcoin is not completely anonymous. Currently, existing clustering heuristics only consider confirmed transactions recorded in the Bitcoin blockchain. However, unconfirmed transactions in the mempool have yet to be utilized to improve the performance of the clustering heuristics. In this paper, we bridge this gap by combining unconfirmed and confirmed transactions for clustering Bitcoin addresses effectively. First, we present a data collection system for capturing unconfirmed transactions. Two case studies are performed to show the presence of user behaviors in unconfirmed transactions not present in confirmed transactions. Next, we apply the state-of-the-art clustering heuristics to unconfirmed transactions, and the clustering results can reduce the number of entities after applying, for example, the co-spend heuristics in confirmed transactions by 2.3%. Finally, we propose three novel clustering heuristics to capture specific behavior patterns in unconfirmed transactions, which further reduce the number of entities after the application of the co-spend heuristics by 9.8%. Our results demonstrate the utility of unconfirmed transactions in address clustering and further shed light on the limitations of anonymity in cryptocurrencies. To the best of our knowledge, this paper is the first to apply the unconfirmed transactions in Bitcoin to cluster addresses.

Submitted 3 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 15 pages, 13 figures, 4 tables. typos corrected

arXiv:2302.14638 [pdf, other]

doi 10.1109/TASLP.2023.3235194

SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Authors: Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Abstract: Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, Transformer has achieved remarkable success in the natural language processing field and has demonstrated its adaptation to speech. However, previous works on Transformer in the speech field have not incorporated the properties of speech, leaving the full potential of Transformer unexplored. In this paper, we consider the characteristics of speech and propose a general structure-based framework, called SpeechFormer++, for paralinguistic speech processing. More concretely, following the component relationship in the speech signal, we design a unit encoder to model the intra- and inter-unit information (i.e., frames, phones, and words) efficiently. According to the hierarchical relationship, we utilize merging blocks to generate features at different granularities, which is consistent with the structural pattern in the speech signal. Moreover, a word encoder is introduced to integrate word-grained features into each unit encoder, which effectively balances fine-grained and coarse-grained information. SpeechFormer++ is evaluated on the speech emotion recognition (IEMOCAP & MELD), depression classification (DAIC-WOZ) and Alzheimer's disease detection (Pitt) tasks. The results show that SpeechFormer++ outperforms the standard Transformer while greatly reducing the computational cost. Furthermore, it delivers superior results compared to the state-of-the-art approaches.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 14 pages, 7 figures, 14 tables, TASLP 2023 paper

arXiv:2302.13729 [pdf, other]

DST: Deformable Speech Transformer for Emotion Recognition

Authors: Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Abstract: Enabled by multi-head self-attention, Transformer has exhibited remarkable results in speech emotion recognition (SER). Compared to the original full attention mechanism, window-based attention is more effective in learning fine-grained features while greatly reducing model redundancy. However, emotional cues are present in a multi-granularity manner such that the pre-defined fixed window can severely degrade the model flexibility. In addition, it is difficult to obtain the optimal window settings manually. In this paper, we propose a Deformable Speech Transformer, named DST, for SER task. DST determines the usage of window sizes conditioned on input speech via a light-weight decision network. Meanwhile, data-dependent offsets derived from acoustic features are utilized to adjust the positions of the attention windows, allowing DST to adaptively discover and attend to the valuable information embedded in the speech. Extensive experiments on IEMOCAP and MELD demonstrate the superiority of DST.

Submitted 27 February, 2023; originally announced February 2023.

Comments: 5 pages, 4 figures, 2 tables, accepted by ICASSP 2023

arXiv:2302.09368 [pdf, other]

Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation

Authors: Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Yang Yu

Abstract: Natural Language-conditioned reinforcement learning (RL) enables the agents to follow human instructions. Previous approaches generally implemented language-conditioned RL by providing human instructions in natural language (NL) and training a following policy. In this outside-in approach, the policy needs to comprehend the NL and manage the task simultaneously. However, the unbounded NL examples often bring much extra complexity for solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique. The TL is used in RL to achieve highly efficient and effective policy training. Besides, a translator is trained to translate NL into TL. We implement this scheme as TALAR (TAsk Language with predicAte Representation) that learns multiple predicates to model object relationships as the TL. Experiments indicate that TALAR not only better comprehends NL instructions but also leads to a better instruction-following policy that improves 13.4% success rate and adapts to unseen expressions of NL instruction. The TL can also be an effective task abstraction, naturally compatible with hierarchical RL.

Submitted 18 February, 2023; originally announced February 2023.

arXiv:2301.09956 [pdf, other]

Membership Inference of Diffusion Models

Authors: Hailong Hu, Jun Pang

Abstract: Recent years have witnessed the tremendous success of diffusion models in data synthesis. However, when diffusion models are applied to sensitive data, they also give rise to severe privacy concerns. In this paper, we systematically present the first study about membership inference attacks against diffusion models, which aims to infer whether a sample was used to train the model. Two attack methods are proposed, namely loss-based and likelihood-based attacks. Our attack methods are evaluated on several state-of-the-art diffusion models, over different datasets in relation to privacy-sensitive data. Extensive experimental evaluations show that our attacks can achieve remarkable performance. Furthermore, we exhaustively investigate various factors which can affect attack performance. Finally, we also evaluate the performance of our attack methods on diffusion models trained with differential privacy.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2301.07296 [pdf, other]

Three-body coupled channel framework for two-neutron halo nuclei

Authors: Jin-Yi Pang, Li-Tan Li, Feng-Kun Guo, Jia-Jun Wu

Abstract: We study the Borromean nuclei formed by a core nucleus and two neutrons in a nonrelativistic effective field theory formalism considering both neutron-neutron and neutron-core interactions. We provide formulae of the charge and matter radii, and successfully reproduce the universal relation proposed by Hongo and Son based on the approximation of an infinite neutron-neutron scattering length and neglecting the neutron-core scattering. Once the realistic finite neutron-neutron and neutron-core scattering lengths are used, the charge and matter radii are influenced by the neutron-core channel in a growingly relevant manner. We obtain a relation among the binding energy of the three-body Borromean system, the ratio between charge and matter radii, and the ratio between the neutron-neutron and core-neutron scattering lengths. We find that the two-neutron separation energy for $^{22}$C needs to be $\lesssim 2$ keV in order to be consistent with the experimental constraints of the matter radius of $^{22}$C and the $^{20}{\rm C}\,n$ $S$-wave scattering length.

Submitted 17 January, 2023; originally announced January 2023.

Comments: 26 pages, 14 figures

arXiv:2301.05629 [pdf, other]

Electromagnetic-Compliant Channel Modeling and Performance Evaluation for Holographic MIMO

Authors: Tengjiao Wang, Wei Han, Zhimeng Zhong, Jiyong Pang, Guohua Zhou, Shaobo Wang, Qiang Li

Abstract: Recently, the concept of holographic multiple-input multiple-output (MIMO) is emerging as one of the promising technologies beyond massive MIMO. Many challenges need to be addressed to bring this novel idea into practice, including electromagnetic (EM)-compliant channel modeling and accurate performance evaluation. In this paper, an EM-compliant channel model is proposed for the holographic MIMO systems, which is able to model both the characteristics of the propagation channel and the non-ideal factors caused by mutual coupling at the transceivers, including the antenna pattern distortion and the decrease of antenna efficiency. Based on the proposed channel model, a more realistic performance evaluation is conducted to show the performance of the holographic MIMO system in both the single-user and the multi-user scenarios. Key challenges and future research directions are further provided based on the theoretical analyses and numerical results.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 6 pages, 4 figures, to be published in IEEE GLOBECOM 2022

arXiv:2212.13098 [pdf]

E-beam-enhanced solid-state mechanical amorphization of alpha-quartz: Reducing deformation barrier via localized excess electrons as mobile anions

Authors: Sung-Gyu Kang, Wonseok Jeong, Hwangsun Kim, Jeongin Paeng, Seungwu Han, Heung Nam Han, In-Suk Choi

Abstract: Under hydrostatic pressure, alpha-quartz undergoes solid-state mechanical amorphization wherein the interpenetration of SiO4 tetrahedra occurs and the material loses crystallinity. This phase transformation requires a high hydrostatic pressure of 14 GPa because the repulsive forces resulting from the ionic nature of the Si-O bonds prevent the severe distortion of the atomic configuration. Herein, we experimentally and computationally demonstrate that e-beam irradiation changes the nature of the interatomic bonds in alpha-quartz and enhances the solid-state mechanical amorphization at nanoscale. Specifically, during in situ uniaxial compression, a larger permanent deformation occurs in alpha-quartz micropillars compressed during e-beam irradiation than in those without e-beam irradiation. Microstructural analysis reveals that the large permanent deformation under e-beam irradiation originates from the enhanced mechanical amorphization of alpha-quartz and the subsequent viscoplastic deformation of the amorphized region. Further, atomic-scale simulations suggest that the delocalized excess electrons introduced by e-beam irradiation move to highly distorted atomic configurations and alleviate the repulsive force, thus reducing the barrier to the solid-state mechanical amorphization. These findings deepen our understanding of electron-matter interactions and can be extended to new glass forming and processing technologies at nano- and microscale.

Submitted 26 December, 2022; originally announced December 2022.

Comments: 24 pages, 6 figures

arXiv:2211.14569 [pdf, other]

Online Optimization in Power Systems with High Penetration of Renewable Generation: Advances and Prospects

Authors: Zhaojian Wang, Wei Wei, John Zhen Fu Pang, Feng Liu, Bo Yang, ** Guan, Shengwei Mei

Abstract: Traditionally, offline optimization of power systems is acceptable due to the largely predictable loads and reliable generation. The increasing penetration of fluctuating renewable generation and Internet-of-Things devices allowing for fine-grained controllability of loads have led to the diminishing applicability of offline optimization in the power systems domain, and have redirected attention to online optimization methods. However, online optimization is a broad topic that can be applied in and motivated by different settings, operated on different time scales, and built on different theoretical foundations. This paper reviews the various types of online optimization techniques used in the power systems domain and aims to make clear the distinction between the most common techniques used. In particular, we introduce and compare four distinct techniques used covering the breadth of online optimization techniques used in the power systems domain, i.e., optimization-guided dynamic control, feedback optimization for single-period problems, Lyapunov-based optimization, and online convex optimization techniques for multi-period problems. Lastly, we recommend some potential future directions for online optimization in the power systems domain.

Submitted 26 November, 2022; originally announced November 2022.

Journal ref: IEEE/CAA Journal of Automatica Sinica, 2022

arXiv:2211.14314 [pdf, other]

The applicability of transperceptual and deep learning approaches to the study and mimicry of complex cartilaginous tissues

Authors: J. Waghorne, C. Howard, H. Hu, J. Pang, W. J. Peveler, L. Harris, O. Barrera

Abstract: Complex soft tissues, for example the knee meniscus, play a crucial role in mobility and joint health, but when damaged are incredibly difficult to repair and replace. This is due to their highly hierarchical and porous nature which in turn leads to their unique mechanical properties. In order to design tissue substitutes, the internal architecture of the native tissue needs to be understood and replicated. Here we explore a combined audio-visual approach - so-called transperceptual - to generate artificial architectures mimicking the native ones. The proposed method uses both traditional imagery, and sound generated from each image as a method of rapidly comparing and contrasting the porosity and pore size within the samples. We have trained and tested a generative adversarial network (GAN) on the 2D image stacks. The impact of the training set of images on the similarity of the artificial to the original dataset was assessed by analyzing two samples. The first consists of n=478 pairs of audio and image files for which the images were downsampled to 64 $\times$ 64 pixels; the second consists of n=7640 pairs of audio and image files for which the full resolution of 256 $\times$ 256 pixels is retained but each image is divided into 16 squares to maintain the limit of 64 $\times$ 64 pixels required by the GAN. We reconstruct the 2D stacks of artificially generated datasets into 3D objects and run image analysis algorithms to characterize statistically the architectural parameters - pore size, tortuosity and pore connectivity - and compare them with the original dataset. Results show that the artificially generated dataset that undergoes downsampling performs better in terms of parameter matching. Our audiovisual approach has the potential to be extended to larger data sets to explore how both similarities and differences can be audibly recognized across multiple samples.

Submitted 21 November, 2022; originally announced November 2022.
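The tiling step mentioned in the abstract above (a 256 $\times$ 256 image divided into 16 squares of 64 $\times$ 64 pixels) can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_into_tiles`, the row-major tile order, and the synthetic test image are all assumptions made for illustration.

```python
def split_into_tiles(image, tile):
    """Split a square 2D list into non-overlapping tile x tile blocks.

    Hypothetical helper: tiles are returned in row-major order
    (left to right, then top to bottom).
    """
    size = len(image)
    if size % tile != 0 or any(len(row) != size for row in image):
        raise ValueError("image must be square with side divisible by tile")
    tiles = []
    for top in range(0, size, tile):
        for left in range(0, size, tile):
            # Take `tile` rows starting at `top`, and `tile` columns starting at `left`.
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    return tiles


# Synthetic 256 x 256 "image" whose pixel value encodes its own position.
image = [[r * 256 + c for c in range(256)] for r in range(256)]
tiles = split_into_tiles(image, 64)
```

With these sizes the helper yields (256 / 64)^2 = 16 tiles, each 64 $\times$ 64, which matches the count stated in the abstract.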

arXiv:2211.10126 [pdf, other]

Three-particle Lellouch-Lüscher formalism in moving frames

Authors: Fabian Müller, **-Yi Pang, Akaki Rusetsky, Jia-Jun Wu

Abstract: A manifestly relativistic-invariant Lellouch-Lüscher formalism for the three-particle decays is proposed. Similarly to ref.[1], the formalism is based on the use of the non-relativistic effective Lagrangians. Manifest Lorentz invariance is guaranteed, as in ref.[2], by choosing the quantization axis along the total four-momentum of the three-particle system. A systematic inclusion of the higher-order derivative couplings, as well as higher partial waves is addressed.

Submitted 27 February, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: 36 pages, 4 figures

arXiv:2211.09124 [pdf, other]

A Review of Intelligent Music Generation Systems

Authors: Lei Wang, Ziyi Zhao, Hanwei Liu, Junwei Pang, Yi Qin, Qidi Wu

Abstract: With the introduction of ChatGPT, the public's perception of AI-generated content (AIGC) has begun to reshape. Artificial intelligence has significantly reduced the barrier to entry for non-professionals in creative endeavors, enhancing the efficiency of content creation. Recent advancements have seen significant improvements in the quality of symbolic music generation, which is enabled by the use of modern generative algorithms to extract patterns implicit in a piece of music based on rule constraints or a musical corpus. Nevertheless, existing literature reviews tend to present a conventional and conservative perspective on future development trajectories, with a notable absence of thorough benchmarking of generative models. This paper provides a survey and analysis of recent intelligent music generation techniques, outlining their respective characteristics and discussing existing methods for evaluation. Additionally, the paper compares the different characteristics of music generation techniques in the East and West as well as analysing the field's development prospects.

Submitted 17 November, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

MSC Class: 68T01 ACM Class: J.5

arXiv:2211.08910 [pdf, other]

On the Connection of Generative Models and Discriminative Models for Anomaly Detection

Authors: **gxuan Pang, Chunguang Li

Abstract: Anomaly detection (AD) has attracted considerable attention in both academia and industry. Due to the lack of anomalous data in many practical cases, AD is usually solved by first modeling the normal data pattern and then determining if data fit this model. Generative models (GMs) seem a natural tool to achieve this purpose, which learn the normal data distribution and estimate it using a probability density function (PDF). However, some works have observed the ideal performance of such GM-based AD methods. In this paper, we propose a new perspective on the ideal performance of GM-based AD methods. We state that in these methods, the implicit assumption that connects GMs' results to AD's goal is usually implausible due to normal data's multi-peaked distribution characteristic, which is quite common in practical cases. We first qualitatively formulate this perspective, and then focus on the Gaussian mixture model (GMM) to intuitively illustrate the perspective, which is a typical GM and has the natural property to approximate multi-peaked distributions. Based on the proposed perspective, in order to bypass the implicit assumption in the GMM-based AD method, we suggest integrating the Discriminative idea to orient GMM to AD tasks (DiGMM). With DiGMM, we establish a connection of generative and discriminative models, which are two key paradigms for AD and are usually treated separately before. This connection provides a possible direction for future works to jointly consider the two paradigms and incorporate their complementary characteristics for AD.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2210.09165 [pdf, other]

Expected geoneutrino signal at JUNO using local integrated 3-D refined crustal model

Authors: Ran Han, ZhiWei Li, Ruohan Gao, Yao Sun, Ya Xu, Yufei Xi, Guangzheng Jiang, Andong Wang, Ya** Cheng, Yao Sun, Jie Pang, Qi Hua, Liangjian Wen, Liang Zhan, Yu-Feng Li

Abstract: Geoneutrinos serve as a potent tool for comprehending the radiogenic power and composition of Earth. Although geoneutrinos have been observed in prior experiments, the forthcoming generation of experiments, such as JUNO, will be necessary for fully harnessing their potential. Precise prediction of the crustal contribution is vital for interpreting particle physics measurements in the context of geo-scientific inquiries. Nonetheless, existing models such as JULOC and GIGJ have limitations in accurately forecasting the crustal contribution. This paper introduces JULOC-I, the novel 3-D integrated crustal model of JUNO, which employs seismic, gravity, rock sample, and heat flow data to precisely estimate the geoneutrino signal of the lithosphere. The model indicates elevated concentrations of uranium and thorium in southern China, resulting in unexpectedly strong geoneutrino signals. The accuracy of JULOC-I, coupled with a decade of experimental data, affords JUNO the opportunity to test multiple mantle models. Once operational, JUNO can validate the model predictions and enhance the precision of mantle measurements. All in all, the improved accuracy of JULOC-I represents a substantial stride towards comprehending the geochemical distribution of the South China crust, offering a valuable tool for investigating the composition and evolution of the Earth through geoneutrinos.

Submitted 6 March, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: Substantial updates on the model and predictions, submitted version

arXiv:2210.06984 [pdf, other]

QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking

Authors: Tobias Fischer, Thomas E. Huang, Jiangmiao Pang, Linlu Qiu, Haofeng Chen, Trevor Darrell, Fisher Yu

Abstract: Similarity learning has been recognized as a crucial step for object tracking. However, existing multiple object tracking methods only use sparse ground truth matching as the training objective, while ignoring the majority of the informative regions in images. In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning. We combine this similarity learning with multiple existing object detectors to build Quasi-Dense Tracking (QDTrack), which does not require displacement regression or motion priors. We find that the resulting distinctive feature space admits a simple nearest neighbor search at inference time for object association. In addition, we show that our similarity learning scheme is not limited to video data, but can learn effective instance similarity even from static input, enabling a competitive tracking performance without training on videos or using tracking supervision. We conduct extensive experiments on a wide variety of popular MOT benchmarks. We find that, despite its simplicity, QDTrack rivals the performance of state-of-the-art tracking methods on all benchmarks and sets a new state-of-the-art on the large-scale BDD100K MOT benchmark, while introducing negligible computational overhead to the detector.

Submitted 27 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.02323 [pdf, other]

Distributed Online Generalized Nash Equilibrium Tracking for Prosumer Energy Trading Games

Authors: Yongkai Xie, Zhaojian Wang, John Z. F. Pang, Bo Yang, ** Guan

Abstract: With the proliferation of distributed generations, traditional passive consumers in distribution networks are evolving into "prosumers", which can both produce and consume energy. Energy trading with the main grid or between prosumers is inevitable if the energy surplus and shortage exist. To this end, this paper investigates the peer-to-peer (P2P) energy trading market, which is formulated as a generalized Nash game. We first prove the existence and uniqueness of the generalized Nash equilibrium (GNE). Then, a distributed online algorithm is proposed to track the GNE in the time-varying environment. Its regret is proved to be bounded by a sublinear function of learning time, which indicates that the online algorithm has an acceptable accuracy in practice. Finally, numerical results with six microgrids validate the performance of the algorithm.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2209.13161 [pdf, other]

On polynomial-time solvability of combinatorial Markov random fields

Authors: Shaoning Han, Andrés Gómez, Jong-Shi Pang

Abstract: The problem of inferring Markov random fields (MRFs) with a sparsity or robustness prior can be naturally modeled as a mixed-integer program. This motivates us to study a general class of convex submodular optimization problems with indicator variables, which we show to be polynomially solvable in this paper. The key insight is that, possibly after a suitable reformulation, indicator constraints preserve submodularity. Fast computations of the associated Lovász extensions are also discussed under certain smoothness conditions, and can be implemented using only linear-algebraic operations in the case of quadratic objectives.

Submitted 27 September, 2022; originally announced September 2022.

arXiv:2209.09438 [pdf]

Updating velocities in heterogeneous comprehensive learning particle swarm optimization with low-discrepancy sequences

Authors: Yuelin Zhao, Feng Wu, Jianhua Pang, Wanxie Zhong

Abstract: Heterogeneous comprehensive learning particle swarm optimization (HCLPSO) is a type of evolutionary algorithm with enhanced exploration and exploitation capabilities. The low-discrepancy sequence (LDS) is more uniform in covering the search space than random sequences. In this paper, making use of the good uniformity of LDS to improve HCLPSO is researched. Numerical experiments are performed to show that it is impossible to effectively improve the search ability of HCLPSO by only using LDS to generate the initial population. However, if we properly choose some random sequences from the HCLPSO velocities updating formula and replace them with the deterministic LDS, we can obtain a more efficient algorithm. Compared with the original HCLPSO under the same accuracy requirement, the HCLPSO updating the velocities with the deterministic LDS can significantly reduce the iterations required for finding the optimal solution, without decreasing the success rate.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 29 pages, 5 figures
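The notion of a low-discrepancy sequence referred to in the abstract above can be illustrated with a minimal sketch. The abstract does not say which LDS the authors use; the van der Corput and Halton constructions below are a common textbook choice and are assumed here purely for illustration, as are the function names `van_der_corput` and `halton`.

```python
def van_der_corput(index, base=2):
    """Return element `index` (1-based) of the van der Corput sequence.

    The digits of `index` in the given base are mirrored around the
    radix point, which spreads successive points evenly over [0, 1).
    """
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton(index, bases=(2, 3)):
    """Return a multi-dimensional Halton point: one van der Corput
    coordinate per (pairwise coprime) base. Assumed illustration only."""
    return tuple(van_der_corput(index, b) for b in bases)


# Deterministic coefficients in [0, 1) that could stand in for the
# pseudo-random numbers of a generic PSO-style velocity update.
coefficients = [halton(i) for i in range(1, 5)]
```

In a generic particle swarm update, such points would replace the uniform random draws multiplying the attraction terms; which draws the paper replaces in the HCLPSO formula is not stated in the abstract.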

arXiv:2209.04401 [pdf, other]

doi 10.1145/3552457.3555727

GRASP-Net: Geometric Residual Analysis and Synthesis for Point Cloud Compression

Authors: Jiahao Pang, Muhammad Asad Lodhi, Dong Tian

Abstract: Point cloud compression (PCC) is a key enabler for various 3-D applications, owing to the universality of the point cloud format. Ideally, 3D point clouds endeavor to depict object/scene surfaces that are continuous. Practically, as a set of discrete samples, point clouds are locally disconnected and sparsely distributed. This sparse nature is hindering the discovery of local correlation among points for compression. Motivated by an analysis with fractal dimension, we propose a heterogeneous approach with deep learning for lossy point cloud geometry compression. On top of a base layer compressing a coarse representation of the input, an enhancement layer is designed to cope with the challenging geometric residual/details. Specifically, a point-based network is applied to convert the erratic local details to latent features residing on the coarse point cloud. Then a sparse convolutional neural network operating on the coarse point cloud is launched. It utilizes the continuity/smoothness of the coarse geometry to compress the latent features as an enhancement bit-stream that greatly benefits the reconstruction quality. When this bit-stream is unavailable, e.g., due to packet loss, we support a skip mode with the same architecture which generates geometric details from the coarse point cloud directly. Experimentation on both dense and sparse point clouds demonstrates the state-of-the-art compression performance achieved by our proposal. Our code is available at https://github.com/InterDigitalInc/GRASP-Net.

Submitted 9 September, 2022; originally announced September 2022.

Comments: Accepted at ACM MM 2022 Workshop on Advances in Point Cloud Compression, Processing and Analysis

arXiv:2209.00776 [pdf, other]

doi 10.1145/3503161.3547743

WOC: A Handy Webcam-based 3D Online Chatroom

Authors: Chuanhang Yan, Yu Sun, Qian Bao, **hui Pang, Wu Liu, Tao Mei

Abstract: We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real-time. Compared to the existing wearable equipment-based solution, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote the immersive chat experience, WOC provides high-fidelity virtual avatar manipulation, which also supports the user-defined characters. With the distributed data flow service, the system delivers highly synchronized motion and voice for all users. Deployed on the website and no installation required, users can freely experience the virtual online chat at https://yanch.cloud.

Submitted 17 March, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2207.12988 [pdf, other]

Monocular 3D Object Detection with Depth from Motion

Authors: Tai Wang, Jiangmiao Pang, Dahua Lin

Abstract: Perceiving 3D objects from monocular inputs is crucial for robotic systems, given its economy compared to multi-sensor settings. It is notably difficult as a single image cannot provide any clues for predicting absolute depth values. Motivated by binocular methods for 3D object detection, we take advantage of the strong geometry structure provided by camera ego-motion for accurate object depth estimation and detection. We first make a theoretical analysis on this general two-view case and notice two challenges: 1) Cumulative errors from multiple estimations that make the direct prediction intractable; 2) Inherent dilemmas caused by static cameras and matching ambiguity. Accordingly, we establish the stereo correspondence with a geometry-aware cost volume as the alternative for depth estimation and further compensate it with monocular understanding to address the second problem. Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon. We also present a pose-free DfM to make it usable when the camera pose is unavailable. Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark. Detailed quantitative and qualitative analyses also validate our theoretical conclusions. The code will be released at https://github.com/Tai-Wang/Depth-from-Motion.

Submitted 1 March, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: ECCV 2022 Oral

arXiv:2207.02631 [pdf, other]

Context Sensing Attention Network for Video-based Person Re-identification

Authors: Kan Wang, Changxing Ding, Jianxin Pang, Xiangmin Xu

Abstract: Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames. Recent approaches handle this problem using temporal aggregation strategies. In this work, we propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction and temporal aggregation steps. First, we introduce the Context Sensing Channel Attention (CSCA) module, which emphasizes responses from informative channels for each frame. These informative channels are identified with reference not only to each individual frame, but also to the content of the entire sequence. Therefore, CSCA explores both the individuality of each frame and the global context of the sequence. Second, we propose the Contrastive Feature Aggregation (CFA) module, which predicts frame weights for temporal aggregation. Here, the weight for each frame is determined in a contrastive manner: i.e., not only by the quality of each individual frame, but also by the average quality of the other frames in a sequence. Therefore, it effectively promotes the contribution of relatively good frames. Extensive experimental results on four datasets show that CSA-Net consistently achieves state-of-the-art performance.

Submitted 6 July, 2022; originally announced July 2022.

arXiv:2206.14619 [pdf, other]

A Multilingual Dataset of COVID-19 Vaccination Attitudes on Twitter

Authors: Ninghan Chen, Xihui Chen, Jun Pang

Abstract: Vaccine hesitancy is considered as one main cause of the stagnant uptake ratio of COVID-19 vaccines in Europe and the US where vaccines are sufficiently supplied. Fast and accurate grasp of public attitudes toward vaccination is critical to address vaccine hesitancy, and social media platforms have proved to be an effective source of public opinions. In this paper, we describe the collection and release of a dataset of tweets related to COVID-19 vaccines. This dataset consists of the IDs of 2,198,090 tweets collected from Western Europe, 17,934 of which are annotated with the originators' vaccination stances. Our annotation will facilitate using and developing data-driven models to extract vaccination attitudes from social media posts and thus further confirm the power of social media in public health surveillance. To lay the groundwork for future research, we not only perform statistical analysis and visualisation of our dataset, but also evaluate and compare the performance of established text-based benchmarks in vaccination stance extraction. We demonstrate one potential use of our data in practice in tracking the temporal changes of public COVID-19 vaccination attitudes.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.13456 [pdf, other]

"Double vaccinated, 5G boosted!": Learning Attitudes towards COVID-19 Vaccination from Social Media

Authors: Ninghan Chen, Xihui Chen, Zhiqiang Zhong, Jun Pang

Abstract: To address the vaccine hesitancy which impairs the efforts of the COVID-19 vaccination campaign, it is imperative to understand public vaccination attitudes and timely grasp their changes. In spite of reliability and trustworthiness, conventional attitude collection based on surveys is time-consuming and expensive, and cannot follow the fast evolution of vaccination attitudes. We leverage the textual posts on social media to extract and track users' vaccination stances in near real time by proposing a deep learning framework. To address the impact of linguistic features such as sarcasm and irony commonly used in vaccine-related discourses, we integrate into the framework the recent posts of a user's social network neighbours to help detect the user's genuine attitude. Based on our annotated dataset from Twitter, the models instantiated from our framework can increase the performance of attitude extraction by up to 23% compared to state-of-the-art text-only models. Using this framework, we successfully validate the feasibility of using social media to track the evolution of vaccination attitudes in real life. We further show one practical use of our framework by validating the possibility to forecast a user's vaccine hesitancy changes with information perceived from social media.

Submitted 27 June, 2022; originally announced June 2022.

Showing 51–100 of 296 results for author: Pang, J