Search | arXiv e-print repository

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

Authors: Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jian** Wang

Abstract: Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles (AVs). Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities. Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage, a prediction-side attack, efficiently identifies state perturbations for the prediction model that are effective and velocity-insensitive, guided by the distribution of detection results under object-based attacks against perception. In the second stage, location matching, we match feasible object locations to the identified state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV.
The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed.

Submitted 17 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 33rd USENIX Security Symposium 2024

arXiv:2406.01151

A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

Abstract: Edge-AI computing requires high energy efficiency, low power consumption, relatively high flexibility, and a compact area, which challenges AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency, including sparse computing, partial membrane potential updates, and non-uniform weight quantization. Multiple neuromorphic cores and multi-mode routers form a fullerene-like network-on-chip (NoC). The average degree of communication nodes exceeds traditional topologies by 32%, with a minimal degree variance of 0.93, allowing advanced decentralized on-chip communication. Additionally, the NoC can be scaled up through extended off-chip high-level router nodes. A RISC-V CPU and a neuromorphic processor are tightly coupled and fabricated within a 5.42 mm^2 die area under 55 nm CMOS technology. The chip has a low power density of 0.52 mW/mm^2, a 67.5% reduction compared with related works, and achieves a high neuron density of 30.23 K/mm^2. Finally, the chip is demonstrated to be effective on different datasets and achieves 0.96 pJ/SOP energy efficiency.
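The two network metrics quoted above, average node degree and degree variance, can be illustrated with a short sketch. It assumes "degree variance" means the population variance of node degrees and uses a small 2D mesh as a hypothetical stand-in topology, since the abstract does not give the fullerene-like wiring. The helper total_power_mw is also an assumption: it multiplies the reported power density by the reported die area, which only holds if the density is defined over the whole die.

```python
# Hypothetical sketch: degree statistics of an on-chip interconnection graph.
# The 3x3 mesh is an illustrative stand-in, NOT the chip's fullerene-like NoC.


def degree_stats(adjacency):
    """Return (average degree, population variance of degree) for an adjacency dict."""
    degrees = [len(neighbors) for neighbors in adjacency.values()]
    n = len(degrees)
    mean = sum(degrees) / n
    variance = sum((d - mean) ** 2 for d in degrees) / n
    return mean, variance


def mesh_2d(rows, cols):
    """Build an undirected rows x cols mesh as {node: set(neighbors)}."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = set()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.add((rr, cc))
            adj[(r, c)] = nbrs
    return adj


def total_power_mw(power_density_mw_per_mm2, area_mm2):
    """Assumes power density = total power / die area."""
    return power_density_mw_per_mm2 * area_mm2


mean, var = degree_stats(mesh_2d(3, 3))
print(f"mesh 3x3: avg degree {mean:.3f}, variance {var:.3f}")  # avg degree 2.667, variance 0.444
print(f"implied total power: {total_power_mw(0.52, 5.42):.2f} mW")  # 2.82 mW
```

A mesh mixes corner, edge, and interior nodes, so its degrees vary; a topology with more uniform, higher node degrees would score a higher average and a lower variance on the same two metrics.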

Submitted 3 June, 2024; originally announced June 2024.

Comments: 5 pages, 8 figures

arXiv:2403.12042

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

Authors: Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua

Abstract: In this paper, we explore the visual representations produced from a pre-trained text-to-video (T2V) diffusion model for video understanding tasks. We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding. Our hypothesis is validated through the classic referring video object segmentation (R-VOS) task. We introduce a novel framework, termed "VD-IT", with dedicated components built upon a fixed pretrained T2V model. Specifically, VD-IT uses textual information as a conditional input, ensuring semantic consistency across time for precise temporal instance matching. It further incorporates image tokens as supplementary textual inputs, enriching the feature set to generate detailed and nuanced masks. In addition, instead of using the standard Gaussian noise, we propose to predict the video-specific noise with an extra noise prediction module, which can help preserve feature fidelity and elevate segmentation quality. Through extensive experiments, we surprisingly observe that fixed generative T2V diffusion models, unlike commonly used video backbones (e.g., Video Swin Transformer) pretrained with discriminative image/video pre-tasks, exhibit better potential to maintain semantic alignment and temporal consistency. On existing standard benchmarks, our VD-IT achieves highly competitive results, surpassing many existing state-of-the-art methods.
The code will be available at https://github.com/buxiangzhiren/VD-IT.

Submitted 18 March, 2024; originally announced March 2024.

Comments: The code will be available at https://github.com/buxiangzhiren/VD-IT

arXiv:2401.06146

AAMDM: Accelerated Auto-regressive Motion Diffusion Model

Authors: Tianyu Li, Calvin Qiao, Guanqiao Ren, KangKang Yin, Sehoon Ha

Abstract: Interactive motion synthesis is essential in creating immersive experiences in entertainment applications, such as video games and virtual reality. However, generating animations that are both high-quality and contextually responsive remains a challenge. Traditional techniques in the game industry can produce high-fidelity animations but suffer from high computational costs and poor scalability. Trained neural network models alleviate the memory and speed issues, yet fall short on generating diverse motions. Diffusion models offer diverse motion synthesis with low memory usage, but require expensive reverse diffusion processes. This paper introduces the Accelerated Auto-regressive Motion Diffusion Model (AAMDM), a novel motion synthesis framework designed to achieve quality, diversity, and efficiency all together. AAMDM integrates Denoising Diffusion GANs as a fast Generation Module, and an Auto-regressive Diffusion Model as a Polishing Module. Furthermore, AAMDM operates in a lower-dimensional embedded space rather than the full-dimensional pose space, which reduces the training complexity as well as further improves the performance. We show that AAMDM outperforms existing methods in motion quality, diversity, and runtime efficiency, through comprehensive quantitative analyses and visual comparisons. We also demonstrate the effectiveness of each algorithmic component through ablation studies.

Submitted 2 December, 2023; originally announced January 2024.

arXiv:2311.13348

MergeSFL: Split Federated Learning with Feature Merging and Batch Size Regulation

Authors: Yunming Liao, Yang Xu, Hongli Xu, Lun Wang, Zhiwei Yao, Chunming Qiao

Abstract: Recently, federated learning (FL) has emerged as a popular technique for edge AI to mine valuable knowledge in edge computing (EC) systems. To mitigate the computing/communication burden on resource-constrained workers and protect model privacy, split federated learning (SFL) has been proposed, integrating both data and model parallelism. Beyond resource limitations, SFL still faces two other critical challenges in EC, i.e., statistical heterogeneity and system heterogeneity. To address these challenges, we propose a novel SFL framework, termed MergeSFL, by incorporating feature merging and batch size regulation in SFL. Concretely, feature merging aims to merge the features from workers into a mixed feature sequence, which is approximately equivalent to the features derived from IID data and is employed to promote model accuracy. Batch size regulation, in turn, aims to assign diverse and suitable batch sizes to heterogeneous workers to improve training efficiency. Moreover, MergeSFL jointly optimizes these two strategies based on their coupled relationship to further enhance the performance of SFL. Extensive experiments are conducted on a physical platform with 80 NVIDIA Jetson edge devices, and the experimental results show that MergeSFL can improve the final model accuracy by 5.82% to 26.22%, with a speedup of about 1.74x to 4.14x, compared to the baselines.
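The two MergeSFL ingredients can be sketched in simplified form. Everything here is an assumption for illustration: merge_features simply pools and shuffles per-worker features, and regulate_batch_sizes splits a global batch in proportion to hypothetical per-worker throughputs using largest-remainder rounding. The paper's actual merging procedure and its joint optimization of the two strategies are not reproduced.

```python
import random


def merge_features(worker_batches, seed=0):
    """Pool per-worker feature batches into one shuffled sequence.

    Hypothetical simplification: MergeSFL designs the mixed sequence to
    approximate features from IID data; here we only concatenate and shuffle.
    """
    merged = [feat for batch in worker_batches for feat in batch]
    random.Random(seed).shuffle(merged)
    return merged


def regulate_batch_sizes(throughputs, total_batch):
    """Assumed policy: split total_batch in proportion to each worker's
    throughput (samples/second), so faster workers get larger batches and
    all workers finish an iteration in roughly the same time."""
    total = sum(throughputs)
    raw = [total_batch * t / total for t in throughputs]
    sizes = [int(r) for r in raw]  # floor of each proportional share
    # Give the leftover samples to the workers with the largest fractional parts.
    leftover = total_batch - sum(sizes)
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


print(regulate_batch_sizes([100.0, 50.0, 50.0], 64))  # [32, 16, 16]
```

Proportional assignment is only one plausible way to balance heterogeneous workers; it ignores the coupling with feature merging that the abstract says MergeSFL optimizes jointly.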

Submitted 22 November, 2023; originally announced November 2023.

arXiv:2308.00529

Variational Label-Correlation Enhancement for Congestion Prediction

Authors: Biao Liu, Congyu Qiao, Ning Xu, Xin Geng, Ziran Zhu, Jun Yang

Abstract: The physical design process of large-scale designs is a time-consuming task, often requiring hours to days to complete, with routing being the most critical and complex step. As the complexity of Integrated Circuits (ICs) increases, there is an increased demand for accurate routing quality prediction. Accurate congestion prediction aids in identifying design flaws early on, thereby accelerating circuit design and conserving resources. Despite the advancements in current congestion prediction methodologies, an essential aspect that has been largely overlooked is the spatial label-correlation between different grids in congestion prediction. The spatial label-correlation is a fundamental characteristic of circuit design, where the congestion status of a grid is not isolated but inherently influenced by the conditions of its neighboring grids. In order to fully exploit the inherent spatial label-correlation between neighboring grids, we propose a novel approach, VAriational Label-Correlation Enhancement for Congestion Prediction, which considers the local label-correlation in the congestion map, associating the estimated congestion value of each grid with a local label-correlation weight influenced by its surrounding grids. The approach leverages variational inference techniques to estimate this weight, thereby enhancing the regression model's performance by incorporating spatial dependencies.
Experimental results validate the superior effectiveness of the proposed approach on the publicly available ISPD2011 and DAC2012 benchmarks using the superblue circuit line.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2306.15612

Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Authors: Peng Xu, Zhiyu Xiang, Chenyu Qiao, **gyun Fu, Tianyu Pu

Abstract: Despite the great success of deep learning in stereo matching, recovering accurate disparity maps is still challenging. Currently, L1 and cross-entropy are the two most widely used losses for stereo network training. Compared with the former, the latter usually performs better thanks to its probability modeling and direct supervision to the cost volume. However, how to accurately model the stereo ground-truth for cross-entropy loss remains largely under-explored. Existing works simply assume that the ground-truth distributions are uni-modal, which ignores the fact that most of the edge pixels can be multi-modal. In this paper, a novel adaptive multi-modal cross-entropy loss (ADL) is proposed to guide the networks to learn different distribution patterns for each pixel. Moreover, we optimize the disparity estimator to further alleviate the bleeding or misalignment artifacts in inference. Extensive experimental results show that our method is generic and can help classic stereo networks regain state-of-the-art performance. In particular, GANet with our method ranks 1st on both the KITTI 2015 and 2012 benchmarks among the published methods. Meanwhile, excellent synthetic-to-realistic generalization performance can be achieved by simply replacing the traditional loss with ours.
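To make the uni-modal versus multi-modal distinction concrete, here is a hypothetical way to build a multi-modal cross-entropy target for one edge pixel. The gap-based clustering of neighborhood disparities, the Laplacian components, and the parameters b and gap are all assumptions for illustration; the abstract does not specify how ADL models each pixel's distribution.

```python
import math


def laplace_over_disparities(center, num_disp, b=1.0):
    """Discrete Laplacian distribution over disparity candidates 0..num_disp-1."""
    w = [math.exp(-abs(d - center) / b) for d in range(num_disp)]
    s = sum(w)
    return [x / s for x in w]


def cluster_modes(window_disps, gap=2.0):
    """Group sorted neighborhood disparities into modes, splitting at jumps > gap.
    Returns (mean disparity, member count) for each mode."""
    vals = sorted(window_disps)
    clusters = [[vals[0]]]
    for v in vals[1:]:
        if v - clusters[-1][-1] > gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return [(sum(c) / len(c), len(c)) for c in clusters]


def multimodal_target(window_disps, num_disp, b=1.0, gap=2.0):
    """Mixture of Laplacians, one per mode, weighted by the mode's share of pixels."""
    modes = cluster_modes(window_disps, gap)
    n = len(window_disps)
    target = [0.0] * num_disp
    for mean, count in modes:
        comp = laplace_over_disparities(mean, num_disp, b)
        for d in range(num_disp):
            target[d] += (count / n) * comp[d]
    return target


def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


# An edge pixel whose 5-pixel window straddles foreground (~10) and background (~30).
t = multimodal_target([10, 10, 11, 30, 31], num_disp=48)
print(max(range(48), key=t.__getitem__))  # 10
```

A uni-modal target would place all mass around a single disparity; the mixture keeps a second peak near 30, so a network trained against it is not pushed to average the two surfaces.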

Submitted 15 March, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2209.06971

PointACL: Adversarial Contrastive Learning for Robust Point Clouds Representation under Adversarial Attack

Authors: Junxuan Huang, Yatong An, Lu Cheng, Bai Chen, Junsong Yuan, Chunming Qiao

Abstract: Despite the recent success of self-supervised contrastive learning models for 3D point cloud representation, the adversarial robustness of such pre-trained models has raised concerns. Adversarial contrastive learning (ACL) is considered an effective way to improve the robustness of pre-trained models. In contrastive learning, the projector is considered an effective component for removing unnecessary feature information during contrastive pretraining, and most ACL works also use contrastive loss with projected feature representations to generate adversarial examples in pretraining, while "unprojected" feature representations are used in generating adversarial inputs during inference. Because of the distribution gap between projected and "unprojected" features, their models are constrained in obtaining robust feature representations for downstream tasks. We introduce a new method to generate high-quality 3D adversarial examples for adversarial training by utilizing virtual adversarial loss with "unprojected" feature representations in a contrastive learning framework. We present our robustness-aware loss function to train the self-supervised contrastive learning framework adversarially. Furthermore, we find that selecting high-difference points with the Difference of Normals (DoN) operator as additional input for adversarial self-supervised contrastive learning can significantly improve the adversarial robustness of the pre-trained model. We validate our method, PointACL, on downstream tasks, including 3D classification and 3D segmentation with multiple datasets.
It obtains robust accuracy comparable to state-of-the-art contrastive adversarial learning methods.

Submitted 14 September, 2022; originally announced September 2022.

Comments: arXiv admin note: text overlap with arXiv:2109.00179 by other authors

arXiv:2206.00830

Progressive Purification for Instance-Dependent Partial Label Learning

Authors: Ning Xu, Biao Liu, Jiaqi Lv, Congyu Qiao, Xin Geng

Abstract: Partial label learning (PLL) aims to train multiclass classifiers from examples each annotated with a set of candidate labels, where a fixed but unknown candidate label is correct. In the last few years, the instance-independent generation process of candidate labels has been extensively studied, on the basis of which many theoretical advances have been made in PLL. Nevertheless, the candidate labels are always instance-dependent in practice, and there is no theoretical guarantee that the model trained on instance-dependent PLL examples can converge to an ideal one. In this paper, a theoretically grounded and practically effective approach named POP, i.e., PrOgressive Purification for instance-dependent partial label learning, is proposed. Specifically, POP updates the learning model and purifies each candidate label set progressively in every epoch. Theoretically, we prove that POP enlarges the region where the model is reliable at an appropriately fast rate, and eventually approximates the Bayes optimal classifier under mild assumptions. Technically, POP is flexible with arbitrary PLL losses and could improve the performance of previous PLL losses in the instance-dependent case. Experiments on benchmark datasets and real-world datasets validate the effectiveness of the proposed method.
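Below is a minimal sketch of what one purification step on a single candidate set could look like, alongside a commonly used PLL loss. The ratio-based rule in purify is a hypothetical stand-in, not POP's actual criterion, and pll_loss (negative log of the probability mass on the candidate set) is a generic PLL loss rather than one prescribed by the paper.

```python
import math


def pll_loss(probs, candidates):
    """Generic PLL loss: negative log of the total probability the model
    assigns to the candidate set. Not necessarily the loss used with POP."""
    return -math.log(sum(probs[j] for j in candidates))


def purify(candidates, probs, ratio=0.1):
    """Hypothetical purification rule: drop candidates whose predicted
    probability is below `ratio` times that of the most confident candidate.
    The top candidate is always kept, so the set never becomes empty."""
    best = max(candidates, key=lambda j: probs[j])
    return {j for j in candidates if j == best or probs[j] >= ratio * probs[best]}


# One example with candidate set {0, 1, 2}; model probabilities over 4 classes.
probs = [0.60, 0.35, 0.04, 0.01]
S = purify({0, 1, 2}, probs)  # label 2 falls below 0.1 * 0.60 and is removed
print(sorted(S), round(pll_loss(probs, S), 4))  # [0, 1] 0.0513
```

In a training loop, a step like this would run once per epoch after the model update, so candidate sets shrink gradually as the model becomes more reliable.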

Submitted 9 May, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted to International Conference on Machine Learning 2023 (ICML 2023)

arXiv:2206.00517

One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement

Authors: Ning Xu, Congyu Qiao, Jiaqi Lv, Xin Geng, Min-Ling Zhang

Abstract: Multi-label learning (MLL) learns from examples each associated with multiple labels simultaneously, where the high cost of annotating all relevant labels for each training example is challenging for real-world applications. To cope with the challenge, we investigate single-positive multi-label learning (SPMLL), where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem. In this paper, a novel SPMLL method named SMILE, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed. Specifically, an unbiased risk estimator is derived, which could be guaranteed to approximately converge to the optimal risk minimizer of fully supervised learning and shows that one positive label of each instance is sufficient to train the predictive model. Then, the corresponding empirical risk estimator is established via recovering the latent soft label as a label enhancement process, where the posterior density of the latent soft labels is approximated by the variational Beta density parameterized by an inference model. Experiments on benchmark datasets validate the effectiveness of the proposed method.

Submitted 11 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2205.13280

Objects Matter: Learning Object Relation Graph for Robust Camera Relocalization

Authors: Chengyu Qiao, Zhiyu Xiang, Xinglu Wang

Abstract: Visual relocalization aims to estimate the pose of a camera from one or more images. In recent years, deep learning based pose regression methods have attracted much attention. They predict absolute poses without relying on any previously built maps or stored images, making relocalization very efficient. However, robust relocalization in environments with complex appearance changes and real dynamics remains very challenging. In this paper, we propose to enhance the distinctiveness of image features by extracting the deep relationships among objects. In particular, we extract objects in the image and construct a deep object relation graph (ORG) to incorporate the semantic connections and relative spatial clues of the objects. We integrate our ORG module into several popular pose regression models. Extensive experiments on various public indoor and outdoor datasets demonstrate that our method improves performance significantly and outperforms previous approaches.

Submitted 26 May, 2022; originally announced May 2022.

arXiv:2204.03845

Decompositional Generation Process for Instance-Dependent Partial Label Learning

Authors: Congyu Qiao, Ning Xu, Xin Geng

Abstract: Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels and model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected because the generation process of the candidate labels is always instance-dependent. Therefore, it deserves to be modeled in a refined way. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels can be decomposed into two sequential parts, where the correct label emerges first in the mind of the annotator, and then the incorrect labels related to the feature are also selected with the correct label as candidate labels due to uncertainty of labeling. Motivated by this consideration, we propose a novel PLL method that performs Maximum A Posteriori (MAP) estimation based on an explicitly modeled generation process of candidate labels via decomposed probability distribution models. Extensive experiments on manually corrupted benchmark datasets and real-world datasets validate the effectiveness of the proposed method. Source code is available at https://github.com/palm-ml/idgp.
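The two-step generation process described above can be sketched as a small probabilistic model: the correct label is fixed first, then each incorrect label is added independently with an instance-dependent probability. The sigmoid parameterization q_j(x) and the weight vectors are hypothetical; the sketch only shows the generative side and the resulting likelihood P(S | x, y), not the paper's MAP training procedure.

```python
import itertools
import math
import random


def flip_probs(x, weights):
    """Hypothetical instance-dependent inclusion probabilities q_j(x) = sigmoid(w_j . x)."""
    return [1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x)))) for w in weights]


def sample_candidates(x, y, weights, rng):
    """Step 1: the correct label y is chosen. Step 2: every other label j joins
    the candidate set independently with probability q_j(x)."""
    q = flip_probs(x, weights)
    return {y} | {j for j, qj in enumerate(q) if j != y and rng.random() < qj}


def candidate_set_likelihood(S, x, y, weights):
    """P(S | x, y) under the two-step process above (zero if y is not in S)."""
    if y not in S:
        return 0.0
    q = flip_probs(x, weights)
    p = 1.0
    for j, qj in enumerate(q):
        if j != y:
            p *= qj if j in S else 1.0 - qj
    return p


x = [1.0, -0.5]
weights = [[0.0, 0.0], [2.0, 0.0], [-2.0, 1.0]]  # one hypothetical weight vector per label
print(sample_candidates(x, 0, weights, random.Random(0)))

# Sanity check: the likelihoods of all possible candidate sets sum to 1.
labels = range(len(weights))
total = sum(candidate_set_likelihood(set(s), x, 0, weights)
            for r in range(len(weights) + 1) for s in itertools.combinations(labels, r))
print(round(total, 6))  # 1.0
```

Here label 1 (q close to 0.88 for this x) is a likely distractor while label 2 (q close to 0.08) rarely appears, which is the instance-dependence that a uniform random-flip model cannot express.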

Submitted 1 February, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: ICLR 2023 Spotlight

arXiv:2203.05314

SoK: On the Semantic AI Security in Autonomous Driving

Authors: Junjie Shen, Ningfei Wang, Ziwen Wan, Yunpeng Luo, Takami Sato, Zhisheng Hu, Xinyang Zhang, Shengjian Guo, Zhenyu Zhong, Kang Li, Ziming Zhao, Chunming Qiao, Qi Alfred Chen

Abstract: Autonomous Driving (AD) systems rely on AI components to make safe and correct driving decisions. Unfortunately, today's AI algorithms are known to be generally vulnerable to adversarial attacks. However, for such AI component-level vulnerabilities to be semantically impactful at the system level, one needs to address non-trivial semantic gaps both (1) from the system-level attack input spaces to those at the AI component level, and (2) from AI component-level attack impacts to those at the system level. In this paper, we define such a research space as semantic AI security, as opposed to generic AI security. Over the past 5 years, a growing number of research works have tackled such semantic AI security challenges in the AD context, showing an exponential growth trend. In this paper, we perform the first systematization of knowledge of this growing semantic AD AI security research space. In total, we collect and analyze 53 such papers and systematically taxonomize them based on research aspects critical for the security field. We summarize the 6 most substantial scientific gaps observed based on quantitative comparisons both vertically among existing AD AI security works and horizontally with security works from closely related domains. With these, we are able to provide insights and potential future directions not only at the design level, but also at the research goal, methodology, and community levels.
To address the most critical scientific methodology-level gap, we take the initiative to develop an open-source, uniform, and extensible system-driven evaluation platform, named PASS, for the semantic AD AI security research community. We also use our implemented platform prototype to showcase the capabilities and benefits of such a platform using representative semantic AD AI attacks.

Submitted 26 April, 2024; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: Project website: https://sites.google.com/view/cav-sec/pass

arXiv:2201.00693

Multimodal Entity Tagging with Multimodal Knowledge Base

Authors: Hao Peng, Hang Li, Lei Hou, Juanzi Li, Chao Qiao

Abstract: To enhance research on multimodal knowledge base and multimodal information processing, we propose a new task called multimodal entity tagging (MET) with a multimodal knowledge base (MKB). We also develop a dataset for the problem using an existing MKB. In an MKB, there are entities and their associated texts and images. In MET, given a text-image pair, one uses the information in the MKB to automatically identify the related entity in the text-image pair. We solve the task by using the information retrieval paradigm and implement several baselines using state-of-the-art methods in NLP and CV. We conduct extensive experiments and make analyses on the experimental results. The results show that the task is challenging, but current technologies can achieve relatively high performance. We will release the dataset, code, and models for future research.
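Framing MET as retrieval can be sketched as ranking MKB entities by their similarity to a query built from the text-image pair. The averaging fusion, cosine scoring, and toy 3-d embeddings are assumptions for illustration; the paper's baselines use state-of-the-art NLP and CV encoders that are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fuse(text_vec, image_vec):
    """Hypothetical fusion: average the text and image embeddings."""
    return [(t + i) / 2 for t, i in zip(text_vec, image_vec)]


def tag_entity(query_vec, entity_vecs, k=1):
    """Return the k MKB entities most similar to the fused query."""
    ranked = sorted(entity_vecs, key=lambda e: cosine(query_vec, entity_vecs[e]), reverse=True)
    return ranked[:k]


# Toy embeddings standing in for encoder outputs.
mkb = {"Eiffel Tower": [0.9, 0.1, 0.0], "Big Ben": [0.2, 0.9, 0.1], "Louvre": [0.6, 0.2, 0.7]}
query = fuse([1.0, 0.0, 0.1], [0.8, 0.2, 0.0])
print(tag_entity(query, mkb, k=2))  # ['Eiffel Tower', 'Louvre']
```

Returning a ranked top-k list rather than a single label matches the retrieval framing, where results can be scored with standard ranking metrics.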

Submitted 28 July, 2022; v1 submitted 21 December, 2021; originally announced January 2022.

Comments: 11 pages, 4 figures

arXiv:2110.12911

Instance-Dependent Partial Label Learning

Authors: Ning Xu, Congyu Qiao, Xin Geng, Min-Ling Zhang

Abstract: Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels. However, this assumption is not realistic since the candidate labels are always instance-dependent. In this paper, we consider instance-dependent PLL and assume that each example is associated with a latent label distribution constituted by a real number for each label, representing the degree to which each label describes the feature. An incorrect label with a high degree is more likely to be annotated as a candidate label. Therefore, the latent label distribution is the essential labeling information in partially labeled examples and is worth leveraging for predictive model training. Motivated by this consideration, we propose a novel PLL method that recovers the label distribution as a label enhancement (LE) process and trains the predictive model iteratively in every epoch. Specifically, we assume the true posterior density of the latent label distribution takes on the variational approximate Dirichlet density parameterized by an inference model. Then the evidence lower bound is derived for optimizing the inference model, and the label distributions generated from the variational posterior are utilized for training the predictive model. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method.
Source code is available at https://github.com/palm-ml/valen.

Submitted 25 October, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021 Spotlight

arXiv:2106.12204 [pdf, other]

Real-time Instance Segmentation with Discriminative Orientation Maps

Authors: Wentao Du, Zhiyu Xiang, Shuya Chen, Chengyu Qiao, Yiman Chen, Tingming Bai

Abstract: Although instance segmentation has advanced considerably in recent years, it is still a challenge to design high-accuracy algorithms with real-time performance. In this paper, we propose a real-time instance segmentation framework termed OrienMask. On top of the one-stage object detector YOLOv3, a mask head is added to predict discriminative orientation maps, which are explicitly defined as spatial offset vectors for both foreground and background pixels. Thanks to the discriminative ability of orientation maps, masks can be recovered without the need for extra foreground segmentation. All instances that match the same anchor size share a common orientation map. This sharing strategy reduces the amortized memory utilization for mask predictions without loss of mask granularity. Given the surviving box predictions after NMS, instance masks can be constructed concurrently from the corresponding orientation maps with low complexity. Owing to the concise mask representation and its effective integration with the anchor-based object detector, our method meets real-time requirements while maintaining competitive accuracy. Experiments on the COCO benchmark show that OrienMask achieves 34.8 mask AP at 42.7 fps, evaluated on a single RTX 2080 Ti. The code is available at https://github.com/duwt/OrienMask.

Submitted 1 August, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

arXiv:2105.08109 [pdf, ps, other]

Quantum Transport Protocols for Distributed Quantum Computing

Authors: Yangming Zhao, Chunming Qiao

Abstract: Quantum computing holds great promise, and this work proposes to use new quantum data networks (QDNs) to connect multiple small quantum computers to form a cluster. Such a QDN differs from existing QKD networks in that the former must deliver data qubits reliably within itself. Two types of QDNs are studied, one using teleportation and the other using tell-and-go (TAG) to exchange quantum data. Two corresponding quantum transport protocols (QTPs), named Tele-QTP and TAG-QTP, are proposed to address many unique design challenges involved in the reliable delivery of data qubits, as well as constraints imposed by the laws of quantum physics, such as the no-cloning theorem, and by the limited availability of quantum memory. The proposed Tele-QTP and TAG-QTP are the first transport layer protocols for QDNs, complementing other work on the network protocol stack. Tele-QTP and TAG-QTP have novel mechanisms to support congestion-free and reliable delivery of streams of data qubits by managing the limited quantum memory at end hosts as well as at intermediate nodes. Both analysis and extensive simulations show that the proposed QTPs can achieve high throughput and fairness. This study also offers new insights into potential tradeoffs involved in using the two methods, teleportation and TAG, in the two types of QDNs.

Submitted 25 May, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: 16 pages, 27 figures, will be submitted to an ACM conference

arXiv:2102.07373 [pdf, other]

Generation For Adaption: A GAN-Based Approach for 3D Domain Adaption with Point Cloud Data

Authors: Junxuan Huang, Junsong Yuan, Chunming Qiao

Abstract: Recent deep networks have achieved good performance on a variety of 3D point classification tasks. However, these models often face challenges in "wild tasks". There are considerable differences between labeled training/source data collected by one LiDAR and unseen test/target data collected by a different LiDAR. Unsupervised domain adaptation (UDA) seeks to overcome this problem without target domain labels. Instead of aligning features between source data and target data, we propose a method that uses a generative adversarial network (GAN) to generate synthetic data from the source domain so that the output is close to the target domain. Experiments show that our approach performs better than other state-of-the-art UDA methods on three popular 3D object/scene datasets (i.e., ModelNet, ShapeNet, and ScanNet) for cross-domain 3D object classification.

Submitted 10 May, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

arXiv:2011.10947 [pdf, other]

Who is in Control? Practical Physical Layer Attack and Defense for mmWave based Sensing in Autonomous Vehicles

Authors: Zhi Sun, Sarankumar Balakrishnan, Lu Su, Arupjyoti Bhuyan, Pu Wang, Chunming Qiao

Abstract: Owing to the wide bandwidths of the millimeter wave (mmWave) frequency band, which yield unprecedented accuracy, mmWave sensing has become vital for many applications, especially autonomous vehicles (AVs). In addition, mmWave sensing offers superior reliability compared with other sensing modalities such as cameras and LiDAR, which is essential for safety-critical driving. It is therefore critical to understand the security vulnerabilities of mmWave sensing in AVs and to improve its security and reliability. To this end, we perform an end-to-end security analysis of a mmWave-based sensing system in AVs by designing and implementing practical physical layer attack and defense strategies on a state-of-the-art mmWave testbed and an AV testbed in real-world settings. Various strategies are developed to take control of the victim AV by spoofing its mmWave sensing module, including adding fake obstacles at arbitrary locations and faking the locations of existing obstacles. Five real-world attack scenarios are constructed to spoof the victim AV and force it to make dangerous driving decisions leading to a fatal crash. Field experiments using a Lincoln MKZ-based AV testbed study the impact of the various attack scenarios and validate that the attacker can indeed assume control of the victim AV, compromising its security and safety. To defend against these attacks, we design and implement a challenge-response authentication scheme and an RF fingerprinting scheme to reliably detect the aforementioned spoofing attacks.

Submitted 22 November, 2020; originally announced November 2020.

arXiv:2011.04988 [pdf, other]

AIM 2020 Challenge on Rendering Realistic Bokeh

Authors: Andrey Ignatov, Radu Timofte, Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng, Juewen Peng, Xianrui Luo, Ke Xian, Zi** Wu, Zhiguo Cao, Densen Puthussery, Jiji C V, Hrishikesh P S, Melvin Kuriakose, Saikat Dutta, Sourya Dipta Das, Nisarg A. Shah, Kuldeep Purohit, Praveen Kandula, Maitreya Suin, A. N. Rajagopalan , et al. (10 additional authors not shown)

Abstract: This paper reviews the second AIM realistic bokeh effect rendering challenge and describes the proposed solutions and results. The participating teams solved a real-world bokeh simulation problem, where the goal was to learn a realistic shallow-focus technique using the large-scale EBB! bokeh dataset, consisting of 5K shallow / wide depth-of-field image pairs captured with a Canon 7D DSLR camera. Participants had to render the bokeh effect from a single frame, without any additional data from other cameras or sensors. The target metric used in this challenge combined the runtime and the perceptual quality of the solutions, as measured in a user study. To ensure the efficiency of the submitted models, we measured their runtime on standard desktop CPUs and also ran the models on smartphone GPUs. The proposed solutions significantly improved on the baseline results, defining the state of the art for the practical bokeh effect rendering problem.

Submitted 10 November, 2020; originally announced November 2020.

Comments: Published in ECCV 2020 Workshop (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

arXiv:2011.02242 [pdf, other]

doi 10.1007/978-3-030-67070-2_14

BGGAN: Bokeh-Glass Generative Adversarial Network for Rendering Realistic Bokeh

Authors: Ming Qian, Congyu Qiao, Jiamin Lin, Zhenyu Guo, Chenghua Li, Cong Leng, Jian Cheng

Abstract: In a photo captured with a bokeh effect, objects in focus are sharp while out-of-focus areas are blurred. DSLR cameras can easily render this effect naturally. However, due to sensor limitations, smartphones cannot capture images with depth-of-field effects directly. In this paper, we propose a novel generator called Glass-Net, which generates bokeh images without relying on complex hardware. Meanwhile, a GAN-based method and a perceptual loss are combined to render a realistic bokeh effect when fine-tuning the model. Moreover, Instance Normalization (IN) is reimplemented in our network, which ensures that our TFLite model with IN can be accelerated on smartphone GPUs. Experiments show that our method renders a high-quality bokeh effect and processes one $1024 \times 1536$ pixel image in 1.9 seconds on all smartphone chipsets. This approach ranked first in Tracks 1 and 2 of the AIM 2020 Rendering Realistic Bokeh Challenge.

Submitted 4 November, 2020; originally announced November 2020.

Comments: accepted by ECCV workshop 2020

Report number: 2020: 229-244

Journal ref: Proceedings of the European Conference on Computer Vision Workshops. 2020: 229-244

arXiv:2007.00916 [pdf, other]

doi 10.18653/v1/2020.acl-main.17

Fact-based Text Editing

Authors: Hayate Iso, Chao Qiao, Hang Li

Abstract: We propose a novel text editing task, referred to as fact-based text editing, in which the goal is to revise a given document to better describe the facts in a knowledge base (e.g., several triples). The task is important in practice because reflecting the truth is a common requirement in text editing. First, we propose a method for automatically generating a dataset for research on fact-based text editing, where each instance consists of a draft text, a revised text, and several facts represented as triples. We apply the method to two public table-to-text datasets, obtaining two new datasets consisting of 233k and 37k instances, respectively. Next, we propose a new neural network architecture for fact-based text editing, called FactEditor, which edits a draft text by referring to given facts using a buffer, a stream, and a memory. A straightforward approach to the problem would be to employ an encoder-decoder model. Our experimental results on the two datasets show that FactEditor outperforms the encoder-decoder approach in terms of fidelity and fluency. The results also show that FactEditor performs inference faster than the encoder-decoder approach.

Submitted 2 July, 2020; originally announced July 2020.

Comments: ACL 2020

arXiv:1905.02607 [pdf, other]

doi 10.1145/3328912

PocketCare: Tracking the Flu with Mobile Phones using Partial Observations of Proximity and Symptoms

Authors: Wen Dong, Tong Guan, Bruno Lepri, Chunming Qiao

Abstract: Mobile phones provide a powerful sensing platform that researchers may adopt to understand proximity interactions among people and the diffusion, through these interactions, of diseases, behaviors, and opinions. However, it remains a challenge to track the proximity-based interactions of a whole community and then model the social diffusion of diseases and behaviors starting from observations of the small fraction of the population who volunteer. In this paper, we propose a novel approach that connects these sparse observations using a model of how individuals interact with each other and how social interactions unfold as a sequence of proximity interactions. We apply our approach to track the spread of flu in the spatial-proximity network of a 3,000-person university campus by mobilizing 300 volunteers from this population to monitor nearby mobile phones through Bluetooth scanning and to report daily on flu symptoms in themselves and those around them. Our aim is to predict the likelihood of an individual getting flu based on how often her/his daily routine intersects with those of the volunteers. Thus, we use the daily routines of the volunteers to build a model of the volunteers as well as of the non-volunteers. Our results show that we can predict flu infection two weeks ahead of time with an average precision from 0.24 to 0.35, depending on the amount of information. This precision is six to nine times higher than that of a random-guess model.
At the population level, we can predict the infectious population in a two-week window with an r-squared value of 0.95 (a random-guess model obtains an r-squared value of 0.2). These results point to an innovative approach for tracking individuals who have interacted with people showing symptoms, allowing us to warn those in danger of infection and to inform health researchers about the progression of contact-induced diseases.

Submitted 7 May, 2019; originally announced May 2019.

arXiv:1810.10752 [pdf, other]

Word Embedding based Edit Distance

Authors: Yilin Niu, Chao Qiao, Hang Li, Minlie Huang

Abstract: Text similarity calculation is a fundamental problem in natural language processing and related fields. In recent years, deep neural networks have been developed for this task and have achieved high performance. These networks are usually trained with labeled data in supervised learning, and creating labeled data is usually very costly. In this short paper, we address unsupervised learning for text similarity calculation. We propose a new method called Word Embedding based Edit Distance (WED), which incorporates word embedding into edit distance. Experiments on three benchmark datasets show that WED outperforms state-of-the-art unsupervised methods, including edit distance, TF-IDF based cosine, word embedding based cosine, and the Jaccard index.

Submitted 25 October, 2018; originally announced October 2018.
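The WED abstract says only that word embedding is incorporated into edit distance; it does not give the cost function. As a rough illustration of the idea, the Python sketch below assumes one plausible scheme: a word-level Levenshtein recurrence in which substituting word a by word b costs 1 - cos(a, b), clipped to [0, 1], instead of a flat 1, while insertions and deletions cost 1. The function names, the cost scheme, and the toy two-dimensional embeddings are hypothetical, not the paper's formulation.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_edit_distance(
    source: List[str],
    target: List[str],
    embeddings: Dict[str, Sequence[float]],
    indel_cost: float = 1.0,
) -> float:
    """Word-level edit distance whose substitution cost shrinks with embedding similarity.

    Hypothetical cost scheme: substituting a by b costs 1 - cos(a, b), clipped to [0, 1];
    words without an embedding fall back to the plain 0/1 substitution cost.
    """
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * indel_cost  # delete the first i source words
    for j in range(1, n + 1):
        dist[0][j] = j * indel_cost  # insert the first j target words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            if a == b:
                sub_cost = 0.0
            elif a in embeddings and b in embeddings:
                sub_cost = min(1.0, max(0.0, 1.0 - cosine(embeddings[a], embeddings[b])))
            else:
                sub_cost = 1.0
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # deletion
                dist[i][j - 1] + indel_cost,  # insertion
                dist[i - 1][j - 1] + sub_cost,  # match or (soft) substitution
            )
    return dist[m][n]


if __name__ == "__main__":
    toy_embeddings = {"car": [1.0, 0.0], "automobile": [0.8, 0.6], "banana": [0.0, 1.0]}
    print(embedding_edit_distance("the car is red".split(), "the automobile is red".split(), toy_embeddings))
    print(embedding_edit_distance("the car is red".split(), "the banana is red".split(), toy_embeddings))
```

With these toy vectors, replacing "car" by the nearby "automobile" costs about 0.2, whereas replacing it by the unrelated "banana" costs a full 1.0, which is the behavior an embedding-aware edit distance is meant to capture.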

arXiv:1803.11363 [pdf]

Discovering Student Behavior Patterns from Event Logs: Preliminary Results on A Novel Probabilistic Latent Variable Model

Authors: Chen Qiao, Xiao Hu

Abstract: Digital platforms enable the observation of learning behaviors through fine-grained log traces, offering more detailed clues for analysis. Going beyond previous descriptive and predictive log analysis, this study aims to simultaneously model learner activities, event time spans, and interaction levels using the proposed Hidden Behavior Traits Model (HBTM). We evaluated the model's performance, explored its capability to cluster learners on a public dataset, and tried to interpret the machine-recognized latent behavior patterns. Quantitative and qualitative results demonstrated the promise of HBTM. The results of this study can contribute to the literature on online learner modeling and learning service planning.

Submitted 30 March, 2018; originally announced March 2018.

Comments: 5 pages, accepted as a full paper to 18th IEEE International Conference on Advanced Learning Technologies (ICALT 2018)

arXiv:1607.08025 [pdf, ps, other]

Mutual Information Optimally Local Private Discrete Distribution Estimation

Authors: Shaowei Wang, Liusheng Huang, Pengzhan Wang, Yiwen Nie, Hongli Xu, Wei Yang, Xiang-Yang Li, Chunming Qiao

Abstract: We consider statistical learning (e.g., discrete distribution estimation) under local $ε$-differential privacy, which preserves each data provider's privacy locally, and aim to optimize statistical data utility under the privacy constraints. Specifically, we study maximizing the mutual information between a provider's data and its private view, and give the exact mutual information bound along with a mechanism that attains it: the $k$-subset mechanism. The mutual-information-optimal mechanism randomly outputs a size-$k$ subset of the original data domain with a carefully designed probability assignment, where $k$ varies with the privacy level $ε$ and the data domain size $d$. After analysing the limitations of existing local private mechanisms from a mutual information perspective, we propose an efficient implementation of the $k$-subset mechanism for discrete distribution estimation and show its optimality guarantees over existing approaches.

Submitted 27 July, 2016; originally announced July 2016.

Comments: submitted to NIPS2016
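To make the $k$-subset mechanism more concrete, here is a minimal Python sketch under stated assumptions. It assumes the usual form of the mechanism, in which every size-$k$ subset containing the true value $x$ is $e^ε$ times as likely as every subset that does not. Under that assignment the true value is included with probability $p = k e^ε / (k e^ε + d - k)$, and any other fixed value is included with probability $q = (p(k-1) + (1-p)k)/(d-1)$. The debiasing step $(\hat{c}_i - q)/(p - q)$, where $\hat{c}_i$ is the fraction of reports containing value $i$, and all function names are illustrative choices; the sketch takes $k$ as an input rather than reproducing the paper's rule for choosing $k$ from $ε$ and $d$.

```python
import math
import random
from typing import FrozenSet, List, Sequence


def inclusion_probability(d: int, k: int, eps: float) -> float:
    """P(true value is in the reported subset) = k*e^eps / (k*e^eps + d - k)."""
    weight = k * math.exp(eps)
    return weight / (weight + d - k)


def k_subset_privatize(x: int, d: int, k: int, eps: float, rng: random.Random) -> FrozenSet[int]:
    """Report a size-k subset of {0, ..., d-1}.

    Subsets containing x are e^eps times as likely as subsets without x, and the
    normalizer is the same for every x, so the report satisfies eps-local DP.
    """
    if not 1 <= k < d:
        raise ValueError("need 1 <= k < d")
    others = [v for v in range(d) if v != x]
    if rng.random() < inclusion_probability(d, k, eps):
        # Include x, then pick the remaining k-1 members uniformly from the other values.
        return frozenset([x, *rng.sample(others, k - 1)])
    # Exclude x: pick all k members uniformly from the other values.
    return frozenset(rng.sample(others, k))


def estimate_frequencies(reports: Sequence[FrozenSet[int]], d: int, k: int, eps: float) -> List[float]:
    """Unbiased frequency estimates obtained by debiasing per-value inclusion counts."""
    if eps <= 0 or not reports:
        raise ValueError("need eps > 0 and at least one report")
    p = inclusion_probability(d, k, eps)  # P(i in S | x == i)
    q = (p * (k - 1) + (1 - p) * k) / (d - 1)  # P(i in S | x != i)
    counts = [0] * d
    for subset in reports:
        for v in subset:
            counts[v] += 1
    n = len(reports)
    return [(c / n - q) / (p - q) for c in counts]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [0] * 5000 + [1] * 3000 + [2] * 2000  # true frequencies 0.5, 0.3, 0.2, 0.0
    reports = [k_subset_privatize(x, d=4, k=2, eps=1.0, rng=rng) for x in data]
    print([round(f, 3) for f in estimate_frequencies(reports, d=4, k=2, eps=1.0)])
```

The estimates always sum to 1 across the domain, but individual entries can come out slightly negative, so a practical implementation would typically post-process them (for example, by clipping and renormalizing).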

arXiv:1202.5282 [pdf, ps, other]

How to Bypass Verified Boot Security in Chromium OS

Authors: Mohammad Iftekhar Husain, Lokesh Mandvekar, Chunming Qiao, Ramalingam Sridhar

Abstract: Verified boot is an interesting feature of Chromium OS that is supposed to detect any modification of the root file system (rootfs) by a dedicated adversary. However, by exploiting a design flaw in verified boot, we show that an adversary can replace the original rootfs with a malicious rootfs containing exploits such as spyware or a keylogger and still pass the verified boot process. The exploit is based on the fact that a dedicated adversary can replace both the rootfs and the corresponding verification information in the bootloader. We experimentally demonstrate an attack using both the base and developer versions of Chromium OS in which the adversary installs spyware on the target system that sends cached user data, which would otherwise be encrypted and thus inaccessible, to the attacker's machine in plain text. We also demonstrate techniques to mitigate this vulnerability.

Submitted 2 June, 2012; v1 submitted 23 February, 2012; originally announced February 2012.

Comments: Updated information about Chromium OS. Added new and advanced exploits. Added mitigation techniques and evaluation
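The core of the flaw described above is that a verification check is only as trustworthy as the place its reference data is stored. The toy Python model below is hypothetical and deliberately simplified: a SHA-256 digest stands in for Chromium OS's real verification data, and `BootMedia`, `attacker_swap`, and the pinned-digest check are illustrative names. If the reference digest lives on media the adversary can rewrite, swapping the rootfs and regenerating the digest passes verification, whereas a digest held in storage the adversary cannot modify catches the swap. The pinned-digest variant is a generic illustration, not the mitigation the paper proposes.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class BootMedia:
    """Toy boot media: a rootfs image plus the digest the bootloader checks it against."""

    rootfs: bytes
    reference_digest: str  # assumption: writable by the adversary in this toy model


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def naive_verified_boot(media: BootMedia) -> bool:
    """Accept the rootfs if it matches the digest stored on the same (writable) media."""
    return hmac.compare_digest(sha256_hex(media.rootfs), media.reference_digest)


def attacker_swap(malicious_rootfs: bytes) -> BootMedia:
    """The flaw in miniature: replace the rootfs *and* regenerate its verification data."""
    return BootMedia(rootfs=malicious_rootfs, reference_digest=sha256_hex(malicious_rootfs))


def pinned_verified_boot(media: BootMedia, pinned_digest: str) -> bool:
    """Compare against a digest kept in storage the adversary cannot rewrite (assumption)."""
    return hmac.compare_digest(sha256_hex(media.rootfs), pinned_digest)


if __name__ == "__main__":
    genuine = BootMedia(b"genuine rootfs", sha256_hex(b"genuine rootfs"))
    pinned = genuine.reference_digest  # captured before the attack, held read-only
    tampered = attacker_swap(b"rootfs with keylogger")
    print("naive check accepts tampered rootfs:", naive_verified_boot(tampered))
    print("pinned check accepts tampered rootfs:", pinned_verified_boot(tampered, pinned))
```

Running the demo, the naive check accepts the tampered rootfs while the pinned check rejects it, mirroring the abstract's point that replacing the verification information together with the rootfs defeats a check that trusts that information.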

Showing 1–27 of 27 results for author: Qiao, C