Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following

Authors: Qiaomu Miao, Alexandros Graikos, **gwei Zhang, Sounak Mondal, Minh Hoai, Dimitris Samaras

Abstract: Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators, which is a laborious and inherently ambiguous process. We propose the first semi-supervised method for gaze following by introducing two novel priors to the task. We obtain the first prior using a large pretrained Visual Question Answering (VQA) model, where we compute Grad-CAM heatmaps by 'prompting' the VQA model with a gaze following question. These heatmaps can be noisy and not suited for use in training. The need to refine these noisy annotations leads us to incorporate a second prior. We utilize a diffusion model trained on limited human annotations and modify the reverse sampling process to refine the Grad-CAM heatmaps. By tuning the diffusion process we achieve a trade-off between the human annotation prior and the VQA heatmap prior, which retains the useful VQA prior information while exhibiting similar properties to the training data distribution. Our method outperforms simple pseudo-annotation generation baselines on the GazeFollow image dataset. More importantly, our pseudo-annotation strategy, applied to a widely used supervised gaze following model (VAT), reduces the annotation need by 50%. Our method also performs the best on the VideoAttentionTarget dataset.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.19957 [pdf, other]

PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting

Authors: Qiaowei Miao, Yawei Luo, Yi Yang

Abstract: As text-conditioned diffusion models (DMs) achieve breakthroughs in image, video, and 3D generation, the research community's focus has shifted to the more challenging task of text-to-4D synthesis, which introduces a temporal dimension to generate dynamic 3D objects. In this context, we identify Score Distillation Sampling (SDS), a widely used technique for text-to-3D synthesis, as a significant hindrance to text-to-4D performance due to its Janus-faced and texture-unrealistic problems coupled with high computational costs. In this paper, we propose Pixel-Level Alignments for Text-to-4D Gaussian Splatting (PLA4D), a novel method that utilizes text-to-video frames as explicit pixel alignment targets to generate static 3D objects and inject motion into them. Specifically, we introduce Focal Alignment to calibrate camera poses for rendering and GS-Mesh Contrastive Learning to distill geometry priors from rendered image contrasts at the pixel level. Additionally, we develop Motion Alignment using a deformation network to drive changes in Gaussians and implement Reference Refinement for smooth 4D object surfaces. These techniques enable 4D Gaussian Splatting to align geometry, texture, and motion with generated videos at the pixel level. Compared to previous methods, PLA4D produces synthesized outputs with better texture details in less time and effectively mitigates the Janus-faced problem. PLA4D is fully implemented using open-source models, offering an accessible, user-friendly, and promising direction for 4D digital content creation. Our project page: https://miaoqiaowei.github.io/PLA4D/.

Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17929 [pdf, other]

Towards Unified Robustness Against Both Backdoor and Adversarial Attacks

Authors: Zhenxing Niu, Yuyao Sun, Qiguang Miao, Rong **, Gang Hua

Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct robustness problems and solved separately, since they belong to training-time and inference-time attacks respectively. However, this paper reveals an intriguing connection between them: (1) planting a backdoor into a model significantly affects the model's adversarial examples; (2) for an infected model, its adversarial examples have features similar to those of the triggered images. Based on these observations, a novel Progressive Unified Defense (PUD) algorithm is proposed to defend against backdoor and adversarial attacks simultaneously. Specifically, our PUD has a progressive model purification scheme to jointly erase backdoors and enhance the model's adversarial robustness. At the early stage, the adversarial examples of infected models are utilized to erase backdoors. With the backdoor gradually erased, our model purification can naturally turn into a stage to boost the model's robustness against adversarial attacks. Besides, our PUD algorithm can effectively identify poisoned images, which allows the initial extra dataset not to be completely clean. Extensive experimental results show that our discovered connection between backdoor and adversarial attacks is ubiquitous, regardless of the type of backdoor attack. The proposed PUD outperforms state-of-the-art backdoor defenses, including model repairing-based and data filtering-based methods. It is also competitive with the most advanced adversarial defense methods.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.14905 [pdf, other]

Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao

Abstract: The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, Structural Entities extraction and patient indications Incorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces the noise in the following cross-modal alignment module by aligning X-ray images with factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.

Submitted 22 May, 2024; originally announced May 2024.

Comments: The code is available at https://github.com/mk-runner/SEI-Temp or https://github.com/mk-runner/SEI

arXiv:2405.11151 [pdf, other]

Multi-scale Information Sharing and Selection Network with Boundary Attention for Polyp Segmentation

Authors: Xiaolu Kang, Zhuoqi Ma, Kang Liu, Yunan Li, Qiguang Miao

Abstract: Polyp segmentation for colonoscopy images is of vital importance in clinical practice. It can provide valuable information for colorectal cancer diagnosis and surgery. While existing methods have achieved relatively good performance, polyp segmentation still faces the following challenges: (1) varying lighting conditions in colonoscopy and differences in polyp locations, sizes, and morphologies; (2) the indistinct boundary between polyps and surrounding tissue. To address these challenges, we propose a Multi-scale information sharing and selection network (MISNet) for the polyp segmentation task. We design a Selectively Shared Fusion Module (SSFM) to enforce information sharing and active selection between low-level and high-level features, thereby enhancing the model's ability to capture comprehensive information. We then design a Parallel Attention Module (PAM) to enhance the model's attention to boundaries, and a Balancing Weight Module (BWM) to facilitate the continuous refinement of boundary segmentation in the bottom-up process. Experiments on five polyp segmentation datasets demonstrate that MISNet improves the accuracy and clarity of segmentation results, outperforming state-of-the-art methods.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09586 [pdf, other]

Factual Serialization Enhancement: A Key Innovation for Chest X-ray Report Generation

Authors: Kang Liu, Zhuoqi Ma, Mengmeng Liu, Zhicheng Jiao, Xiaolu Kang, Qiguang Miao, Kun Xie

Abstract: The automation of writing imaging reports is a valuable tool for alleviating the workload of radiologists. Crucial steps in this process involve the cross-modal alignment between medical images and reports, as well as the retrieval of similar historical cases. However, the presence of presentation-style vocabulary (e.g., sentence structure and grammar) in reports poses challenges for cross-modal alignment. Additionally, existing methods for retrieving similar historical cases perform suboptimally owing to the modal gap issue. In response, this paper introduces a novel method, named Factual Serialization Enhancement (FSE), for chest X-ray report generation. FSE begins with the structural entities approach to eliminate presentation-style vocabulary in reports, providing specific input for our model. Then, uni-modal features are learned through cross-modal alignment between images and factual serialization in reports. Subsequently, we present a novel approach to retrieve similar historical cases from the training set, leveraging aligned image features. These features implicitly preserve semantic similarity with their corresponding reference reports, enabling us to calculate similarity solely among aligned features. This effectively eliminates the modal gap issue for knowledge retrieval without the requirement for disease labels. Finally, the cross-modal fusion network is employed to query valuable information from these cases, enriching image features and aiding the text decoder in generating high-quality reports. Experiments on MIMIC-CXR and IU X-ray datasets from both specific and general scenarios demonstrate the superiority of FSE over state-of-the-art approaches in both natural language generation and clinical efficacy metrics.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.17510 [pdf, other]

Kerr Nonlinearity Induced Nonreciprocity in dissipatively coupled resonators

Authors: Qingtian Miao, G. S. Agarwal

Abstract: Nonlinearity induced nonreciprocity is studied in a system comprising two resonators coupled to a one-dimensional waveguide when the linear system does not exhibit nonreciprocity. The analysis is based on the Hamiltonian of the coupled system and includes the dissipative coupling between the waveguide and resonators, along with the input-output relations. We consider a large number of scenarios which can lead to nonreciprocity. We pay special attention to the case when the linear system does not exhibit nonreciprocal behavior. In this case, we show how very significant nonreciprocal behavior can result from Kerr nonlinearities. We find that the bistability of the nonlinear system can aid in achieving large nonreciprocity. Additionally, we bring out nonreciprocity in the excitation of each resonator, which can be monitored independently. Our results highlight the profound influence of nonlinearity on nonreciprocal behavior, offering a new avenue for controlling light propagation in integrated photonic circuits. Nonlinearity induced nonreciprocity would lead to significant nonreciprocity in quantum fluctuations when our system is treated quantum mechanically.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.10357 [pdf, other]

Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

Authors: Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, **qiao Wang

Abstract: Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential. However, one key limitation is the lack of diversity in prompt templates, whether they are hand-crafted or learned through additional modules. This limitation restricts the capabilities of pretrained VLMs and can result in incorrect predictions in downstream tasks. To address this challenge, we propose Context Optimization with Multi-Knowledge Representation (CoKnow), a framework that enhances Prompt Learning for VLMs with rich contextual knowledge. To facilitate CoKnow during inference, we trained lightweight semantic knowledge mappers, which are capable of generating Multi-Knowledge Representation for an input image without requiring additional priors. We conducted extensive experiments on 11 publicly available datasets, demonstrating that CoKnow outperforms a series of previous methods. We will make all resources open-source: https://github.com/EMZucas/CoKnow.

Submitted 16 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2403.02624 [pdf, other]

Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects

Authors: Yingrong Wang, Anpeng Wu, Haoxuan Li, Weiming Liu, Qiaowei Miao, Ruoxuan Xiong, Fei Wu, Kun Kuang

Abstract: This paper focuses on developing Pareto-optimal estimation and policy learning to identify the most effective treatment that maximizes the total reward from both short-term and long-term effects, which might conflict with each other. For example, a higher dosage of medication might increase the speed of a patient's recovery (short-term) but could also result in severe long-term side effects. Although recent works have investigated short-term effects, long-term effects, or both, how to trade off between them to achieve optimal treatment remains an open challenge. Moreover, when multiple objectives are directly estimated using conventional causal representation learning, the optimization directions among various tasks can conflict as well. In this paper, we systematically investigate these issues and introduce a Pareto-Efficient algorithm, comprising Pareto-Optimal Estimation (POE) and Pareto-Optimal Policy Learning (POPL), to tackle them. POE incorporates a continuous Pareto module with representation balancing, enhancing estimation efficiency across multiple tasks. As for POPL, it involves deriving short-term and long-term outcomes linked with various treatment levels, facilitating an exploration of the Pareto frontier emanating from these outcomes. Results on both the synthetic and real-world datasets demonstrate the superiority of our method.

Submitted 12 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.17257 [pdf, other]

RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

Authors: Jie Cheng, Gang Xiong, Xingyuan Dai, Qinghai Miao, Yisheng Lv, Fei-Yue Wang

Abstract: Preference-based Reinforcement Learning (PbRL) circumvents the need for reward engineering by harnessing human preferences as the reward signal. However, current PbRL methods excessively depend on high-quality feedback from domain experts, which results in a lack of robustness. In this paper, we present RIME, a robust PbRL algorithm for effective reward learning from noisy preferences. Our method utilizes a sample selection-based discriminator to dynamically filter out noise and ensure robust training. To counteract the cumulative error stemming from incorrect selection, we suggest a warm start for the reward model, which additionally bridges the performance gap during the transition from pre-training to online training in PbRL. Our experiments on robotic manipulation and locomotion tasks demonstrate that RIME significantly enhances the robustness of the state-of-the-art PbRL method. Code is available at https://github.com/CJReinforce/RIME_ICML2024.

Submitted 30 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by ICML2024

arXiv:2402.07883 [pdf, other]

Equivalence of cost concentration and gradient vanishing for quantum circuits: an elementary proof in the Riemannian formulation

Authors: Qiang Miao, Thomas Barthel

Abstract: The optimization of quantum circuits can be hampered by a decay of average gradient amplitudes with the system size. When the decay is exponential, this is called the barren plateau problem. Considering explicit circuit parametrizations (in terms of rotation angles), it has been shown in Arrasmith et al., Quantum Sci. Technol. 7, 045015 (2022) that barren plateaus are equivalent to an exponential decay of the cost-function variance. We show that the issue becomes particularly simple in the (parametrization-free) Riemannian formulation of such optimization problems. An elementary derivation shows that the single-gate variance of the cost function is strictly equal to half the variance of the Riemannian single-gate gradient, where we sample variable gates according to the uniform Haar measure. The total variances of the cost function and its gradient are both bounded from above by the sum of single-gate variances and, conversely, bound single-gate variances from above. So, decays of gradients and cost-function variations go hand in hand, and barren plateau problems cannot be resolved by avoiding gradient-based in favor of gradient-free optimization methods.

Submitted 12 February, 2024; originally announced February 2024.

Comments: 8 pages, 3 figures

arXiv:2402.01422 [pdf, other]

EmoSpeaker: One-shot Fine-grained Emotion-Controlled Talking Face Generation

Authors: Guanwen Feng, Haoran Cheng, Yunan Li, Zhiyuan Ma, Chaoneng Li, Zhihao Qian, Qiguang Miao, Chi-Man Pun

Abstract: Implementing fine-grained emotion control is crucial for emotion generation tasks because it enhances the expressive capability of the generative model, allowing it to accurately and comprehensively capture and express various nuanced emotional states, thereby improving the emotional quality and personalization of generated content. Generating fine-grained facial animations that accurately portray emotional expressions using only a portrait and an audio recording presents a challenge. To address this challenge, we propose a visual attribute-guided audio decoupler. This yields content vectors related solely to the audio content, enhancing the stability of subsequent lip movement coefficient predictions. To achieve more precise emotional expression, we introduce a fine-grained emotion coefficient prediction module. Additionally, we propose an emotion intensity control method using a fine-grained emotion matrix. Together, these achieve effective control over emotional expression in the generated videos and a finer classification of emotion intensity. Subsequently, a series of 3DMM coefficient generation networks are designed to predict 3D coefficients, followed by a rendering network that generates the final video. Our experimental results demonstrate that our proposed method, EmoSpeaker, outperforms existing emotional talking face generation methods in terms of expression variation and lip synchronization. Project page: https://peterfanfan.github.io/EmoSpeaker/

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2312.06117 [pdf, other]

M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking

Authors: Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wen** Ma, Can Qin

Abstract: 3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked objects, adding complexity to the task. In this research, we unveil M3SOT, a novel 3D SOT framework, which synergizes multiple input frames (template sets), multiple receptive fields (continuous contexts), and multiple solution spaces (distinct tasks) in ONE model. Remarkably, M3SOT pioneers in modeling temporality, contexts, and tasks directly from point clouds, revisiting a perspective on the key factors influencing SOT. To this end, we design a transformer-based network centered on point cloud targets in the search area, aggregating diverse contextual representations and propagating target cues by employing historical frames. As M3SOT spans varied processing perspectives, we have streamlined the network, trimming its depth and optimizing its structure, to ensure a lightweight and efficient deployment for SOT applications. We posit that, backed by practical construction, M3SOT sidesteps the need for complex frameworks and auxiliary components to deliver sterling results. Extensive experiments on benchmarks such as KITTI, nuScenes, and Waymo Open Dataset demonstrate that M3SOT achieves state-of-the-art performance at 38 FPS. Our code and models are available at https://github.com/ywu0912/TeamCode.git.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 12 pages, 10 figures, 10 tables, AAAI 2024

Journal ref: AAAI 2024

arXiv:2312.06063 [pdf, other]

PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration

Authors: Yue Wu, Yongzhe Yuan, Xiaolong Fan, Xiaoshui Huang, Maoguo Gong, Qiguang Miao

Abstract: We propose a new framework that formulates point cloud registration as a denoising diffusion process from noisy transformation to object transformation. During the training stage, object transformation diffuses from ground-truth transformation to random distribution, and the model learns to reverse this noising process. In the sampling stage, the model progressively refines a randomly generated transformation into the output result. We derive the variational bound in closed form for training and provide implementations of the model. Our work provides the following crucial findings: (i) In contrast to most existing methods, our framework, Diffusion Probabilistic Models for Point Cloud Registration (PCRDiffusion), does not require repeatedly updating the source point cloud to refine the predicted transformation. (ii) Point cloud registration, one of the representative discriminative tasks, can be solved in a generative way with a unified probabilistic formulation. Finally, we discuss and provide an outlook on the application of diffusion models in different scenarios for point cloud registration. Experimental results demonstrate that our model achieves competitive performance in point cloud registration. In both correspondence-free and correspondence-based scenarios, PCRDiffusion achieves performance improvements exceeding 50%.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2311.04942 [pdf, other]

CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation

Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Xiaoxi Du, Kaifeng Pang, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

Abstract: A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D methods disregard crucial volumetric information. Insufficient work has been done on 2.5D methods, in which 2D convolution is mainly used in concert with volumetric information. These models focus on learning the relationship across slices, but typically have many parameters to train. We offer a Cross-Slice Attention Module (CSAM) with minimal trainable parameters, which captures information across all the slices in the volume by applying semantic, positional, and slice attention on deep feature maps at different scales. Our extensive experiments using different network architectures and tasks demonstrate the usefulness and generalizability of CSAM. Associated code is available at https://github.com/aL3x-O-o-Hung/CSAM.

Submitted 26 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.15533 [pdf, other]

Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning

Authors: Qing Miao, Xiaohe Wu, Chao Xu, Yanli Ji, Wangmeng Zuo, Yiwen Guo, Zhaopeng Meng

Abstract: Learning with noisy labels (LNL) has been extensively studied, with existing approaches typically following a framework that alternates between clean sample selection and semi-supervised learning (SSL). However, this approach has a limitation: the clean set selected by the Deep Neural Network (DNN) classifier, trained through self-training, inevitably contains noisy samples. This mixture of clean and noisy samples leads to misguidance in DNN training during SSL, resulting in impaired generalization performance due to confirmation bias caused by error accumulation in sample selection. To address this issue, we propose a method called Collaborative Sample Selection (CSS), which leverages the large-scale pre-trained model CLIP. CSS aims to remove the mixed noisy samples from the identified clean set. We achieve this by training a 2-Dimensional Gaussian Mixture Model (2D-GMM) that combines the probabilities from CLIP with the predictions from the DNN classifier. To further enhance the adaptation of CLIP to LNL, we introduce a co-training mechanism with a contrastive loss in semi-supervised learning. This allows us to jointly train the prompt of CLIP and the DNN classifier, resulting in improved feature representation, boosted classification performance of DNNs, and reciprocal benefits to our Collaborative Sample Selection. By incorporating auxiliary information from CLIP and utilizing prompt fine-tuning, we effectively eliminate noisy samples from the clean set and mitigate confirmation bias during training. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed method in comparison with the state-of-the-art approaches.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2308.15864 [pdf]

(Mis)align: A Simple Dynamic Framework for Modeling Interpersonal Coordination

Authors: Grace Qiyuan Miao, Rick Dale, Alexia Galati

Abstract: As people coordinate in daily interactions, they engage in different patterns of behavior to achieve successful outcomes. This includes both synchrony - the temporal coordination of the same behaviors at the same time - and complementarity - the coordination of the same or different behaviors that may occur at different relative times. Using computational methods, we develop a simple framework to describe the interpersonal dynamics of behavioral synchrony and complementarity over time, and explore their task dependence. A key feature of this framework is the inclusion of a task context that mediates interactions, and consists of active, inactive, and inhibitory constraints on communication. Initial simulation results show that these task constraints can be a robust predictor of simulated agents' behaviors over time. We also show that the framework can reproduce some general patterns observed in human interaction data. We describe preliminary theoretical implications from these results, and relate them to broader proposals of synergistic self-organization in communication.

Submitted 30 August, 2023; originally announced August 2023.

Comments: Code and data necessary to reproduce findings in this article can be found at the following GitHub repository: https://github.com/miaoqy0729/sim-syn-sims

arXiv:2308.12831 [pdf, other]

EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting

Authors: Zitao Wang, Qiguang Miao, Peipei Zhao, Yue Xi

Abstract: The portrait matting task aims to extract an alpha matte with complete semantics and finely-detailed contours. In comparison to CNN-based approaches, transformers with self-attention module have a better capacity to capture long-range dependencies and low-frequency semantic information of a portrait. However, the recent research shows that self-attention mechanism struggles with modeling high-frequency contour information and capturing fine contour details, which can lead to bias while predicting the portrait's contours. To deal with this issue, we propose EFormer to enhance the model's attention towards both of the low-frequency semantic and high-frequency contour features. For the high-frequency contours, our research demonstrates that cross-attention module between different resolutions can guide our model to allocate attention appropriately to these contour regions. Supported on this, we can successfully extract the high-frequency detail information around the portrait's contours, which are previously ignored by self-attention. Based on cross-attention module, we further build a semantic and contour detector (SCD) to accurately capture both of the low-frequency semantic and high-frequency contour features. And we design contour-edge extraction branch and semantic extraction branch to extract refined high-frequency contour features and complete low-frequency semantic information, respectively. Finally, we fuse the two kinds of features and leverage segmentation head to generate a predicted portrait matte. Experiments on VideoMatte240K (JPEG SD Format) and Adobe Image Matting (AIM) datasets demonstrate that EFormer outperforms previous portrait matte methods.

Submitted 30 November, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 10 pages, 5 figures

arXiv:2308.09609 [pdf, ps, other]

Global well-posedness and refined regularity criterion for the uni-directional Euler-alignment system

Authors: Yatao Li, Qianyun Miao, Changhui Tan, Liutang Xue

Abstract: We investigate global solutions to the Euler-alignment system in $d$ dimensions with unidirectional flows and strongly singular communication protocols $φ(x) = |x|^{-(d+α)}$ for $α\in (0,2)$. Our paper establishes global regularity results in both the subcritical regime $1<α<2$ and the critical regime $α=1$. Notably, when $α=1$, the system exhibits a critical scaling similar to the critical quasi-geostrophic equation. To achieve global well-posedness, we employ a novel method based on propagating the modulus of continuity. Our approach introduces the concept of simultaneously propagating multiple moduli of continuity, which allows us to effectively handle the system of two equations with critical scaling. Additionally, we improve the regularity criteria for solutions to this system in the supercritical regime $0<α<1$.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 31 pages

MSC Class: 35Q35; 76N10; 35B65; 35B40

arXiv:2307.14019 [pdf, other]

One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration

Authors: Yongzhe Yuan, Yue Wu, Maoguo Gong, Qiguang Miao, A. K. Qin

Abstract: The precision of unsupervised point cloud registration methods is typically limited by the lack of reliable inlier estimation and self-supervised signal, especially in partially overlapping scenarios. In this paper, we propose an effective inlier estimation method for unsupervised point cloud registration by capturing geometric structure consistency between the source point cloud and its corresponding reference point cloud copy. Specifically, to obtain a high quality reference point cloud copy, an One-Nearest Neighborhood (1-NN) point cloud is generated by input point cloud. This facilitates matching map construction and allows for integrating dual neighborhood matching scores of 1-NN point cloud and input point cloud to improve matching confidence. Benefiting from the high quality reference copy, we argue that the neighborhood graph formed by inlier and its neighborhood should have consistency between source point cloud and its corresponding reference copy. Based on this observation, we construct transformation-invariant geometric structure representations and capture geometric structure consistency to score the inlier confidence for estimated correspondences between source point cloud and its reference copy. This strategy can simultaneously provide the reliable self-supervised signal for model optimization. Finally, we further calculate transformation estimation by the weighted SVD algorithm with the estimated correspondences and corresponding inlier confidence. We train the proposed model in an unsupervised manner, and extensive experiments on synthetic and real-world datasets illustrate the effectiveness of the proposed method.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.04877 [pdf, ps, other]

doi 10.1103/PhysRevResearch.5.043053

Engineering bound states in continuum via nonlinearity induced extra dimension

Authors: Qingtian Miao, Jayakrishnan M. P. Nair, Girish S. Agarwal

Abstract: Bound states in continuum (BICs) are localized states of a system possessing significantly large life times with applications across various branches of science. In this work, we propose an expedient protocol to engineer BICs which involves the use of Kerr nonlinearities in the system. The generation of BICs is a direct artifact of the nonlinearity and the associated expansion in the dimensionality of the system. In particular, we consider single and two mode anharmonic systems and provide a number of solutions apposite for the creation of BICs. In close vicinity to the BIC, the steady state response of the system is immensely sensitive to perturbations in natural frequencies of the system and we illustrate its propitious sensing potential in the context of experimentally realizable setups for both optical and magnetic nonlinearities.

Submitted 10 July, 2023; originally announced July 2023.

Comments: 7 pages, 4 figures

Journal ref: Physical Review RESEARCH 5, 043053 (2023)

arXiv:2304.14320 [pdf, other]

Isometric tensor network optimization for extensive Hamiltonians is free of barren plateaus

Authors: Qiang Miao, Thomas Barthel

Abstract: We explain why and numerically confirm that there are no barren plateaus in the energy optimization of isometric tensor network states (TNS) for extensive Hamiltonians with finite-range interactions which are, for example, typical in condensed matter physics. Specifically, we consider matrix product states (MPS) with open boundary conditions, tree tensor network states (TTNS), and the multiscale entanglement renormalization ansatz (MERA). MERA are isometric by construction and, for the MPS and TTNS, the tensor network gauge freedom allows us to choose all tensors as partial isometries. The variance of the energy gradient, evaluated by taking the Haar average over the TNS tensors, has a leading system-size independent term and decreases according to a power law in the bond dimension. For a hierarchical TNS (TTNS and MERA) with branching ratio $b$, the variance of the gradient with respect to a tensor in layer $τ$ scales as $(bη)^τ$, where $η$ is the second largest eigenvalue of a Haar-average doubled layer-transition channel and decreases algebraically with increasing bond dimension. The absence of barren plateaus substantiates that isometric TNS are a promising route for an efficient quantum-computation-based investigation of strongly-correlated quantum matter. The observed scaling properties of the gradient amplitudes bear implications for efficient TNS initialization procedures.

Submitted 11 March, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: 7 pages, 5 figures; improved and extended discussion; added analysis showing that power-law decay observed in [Liu et al., arXiv:1902.02663] for MPS is a finite-size effect

arXiv:2304.00680 [pdf, other]

Polaritonic Ultrastrong Coupling: Quantum Entanglement in Ground State

Authors: Qingtian Miao, G. S. Agarwal

Abstract: The ultrastrong coupling between the elementary excitations of matter and microcavity modes is studied in a fully analytical quantum-mechanical theoretical framework. The elementary excitation could be phonons, excitons, plasmons, etc. From the diagonalization of the Hamiltonian, we obtain the ground state of the polariton Hamiltonian. The ground state belongs to the Gaussian class. Using the Gaussian property we calculate the quantum entanglement in the ground state. We use two different measures for quantum entanglement -- entanglement entropy and the logarithmic negativity parameter and obtain rather simple analytical expressions for the entanglement measures. Our findings show that the amount of quantum entanglement in the ground state is quite significant in the ultrastrong coupling regime. It can be obtained from the measurement of the polariton frequencies.

Submitted 2 April, 2023; originally announced April 2023.

arXiv:2304.00161 [pdf, other]

Absence of barren plateaus and scaling of gradients in the energy optimization of isometric tensor network states

Authors: Thomas Barthel, Qiang Miao

Abstract: Vanishing gradients can pose substantial obstacles for high-dimensional optimization problems. Here we consider energy minimization problems for quantum many-body systems with extensive Hamiltonians, which can be studied on classical computers or in the form of variational quantum eigensolvers on quantum computers. Barren plateaus correspond to scenarios where the average amplitude of the energy gradient decreases exponentially with increasing system size. This occurs, for example, for quantum neural networks and for brickwall quantum circuits when the depth increases polynomially in the system size. Here we prove that the variational optimization problems for matrix product states, tree tensor networks, and the multiscale entanglement renormalization ansatz are free of barren plateaus. The derived scaling properties for the gradient variance provide an analytical guarantee for the trainability of randomly initialized tensor network states (TNS) and motivate certain initialization schemes. In a suitable representation, unitary tensors that parametrize the TNS are sampled according to the uniform Haar measure. We employ a Riemannian formulation of the gradient based optimizations which simplifies the analytical evaluation.

Submitted 19 May, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

Comments: 29 pages main text, 11 pages appendix, 11 figures; added 6 figures concerning MERA and TTNS, added analysis for nonary 2D MERA and TTNS, additional references, further minor improvements

arXiv:2303.08910 [pdf, other]

Convergence and Quantum Advantage of Trotterized MERA for Strongly-Correlated Systems

Authors: Qiang Miao, Thomas Barthel

Abstract: Strongly-correlated quantum many-body systems are difficult to study and simulate classically. Our recent work [arXiv:2108.13401] proposed a variational quantum eigensolver (VQE) based on the multiscale entanglement renormalization ansatz (MERA) with tensors constrained to certain Trotter circuits. Here, we extend the theoretical analysis, testing different initialization and convergence schemes, determining the scaling of computation costs for various critical spin models, and establishing a quantum advantage. For the Trotter circuits being composed of single-qubit and two-qubit rotations, it is experimentally advantageous to have small rotation angles. We find that the average angle amplitude can be reduced substantially with negligible effect on the energy accuracy. Benchmark simulations show that choosing TMERA tensors as brick-wall circuits or parallel random-pair circuits yields very similar energy accuracies.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 7 pages, 6 figures

arXiv:2302.12434 [pdf, other]

Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion

Authors: Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu

Abstract: Voice conversion (VC) techniques can be abused by malicious parties to transform their audios to sound like a target speaker, making it hard for a human being or a speaker verification/identification system to trace the source speaker. In this paper, we make the first attempt to restore the source voiceprint from audios synthesized by voice conversion methods with high credit. However, unveiling the features of the source speaker from a converted audio is challenging since the voice conversion operation intends to disentangle the original features and infuse the features of the target speaker. To fulfill our goal, we develop Revelio, a representation learning model, which learns to effectively extract the voiceprint of the source speaker from converted audio samples. We equip Revelio with a carefully-designed differential rectification algorithm to eliminate the influence of the target speaker by removing the representation component that is parallel to the voiceprint of the target speaker. We have conducted extensive experiments to evaluate the capability of Revelio in restoring voiceprint from audios converted by VQVC, VQVC+, AGAIN, and BNE. The experiments verify that Revelio is able to rebuild voiceprints that can be traced to the source speaker by speaker verification and identification systems. Revelio also exhibits robust performance under inter-gender conversion, unseen languages, and telephony networks.

Submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted by USENIX Security Symposium 2023. Please cite this paper as "Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu. Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion. In 32nd USENIX Security Symposium (USENIX Security 23)."

arXiv:2212.05679 [pdf, other]

Evolutionary Multitasking with Solution Space Cutting for Point Cloud Registration

Authors: Wu Yue, Peiran Gong, Maoguo Gong, Hangqi Ding, Zedong Tang, Yibo Liu, Wenping Ma, Qiguang Miao

Abstract: Point cloud registration (PCR) is a popular research topic in computer vision. Recently, the registration method in an evolutionary way has received continuous attention because of its robustness to the initial pose and flexibility in objective function design. However, most evolving registration methods cannot tackle the local optimum well and they have rarely investigated the success ratio, which implies the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm, which can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in cut space assists another task with complex function landscape in escaping from local optima and enhancing successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function robust to various overlap rates as well as a problem-specific metric of computational cost is introduced. Compared with 8 evolving approaches, 4 traditional approaches and 3 deep learning approaches on the object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performances in terms of precision and tackling local optima.

Submitted 14 June, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

arXiv:2211.11062 [pdf, other]

Patch-level Gaze Distribution Prediction for Gaze Following

Authors: Qiaomu Miao, Minh Hoai, Dimitris Samaras

Abstract: Gaze following aims to predict where a person is looking in a scene, by predicting the target location, or indicating that the target is located outside the image. Recent works detect the gaze target by training a heatmap regression task with a pixel-wise mean-square error (MSE) loss, while formulating the in/out prediction task as a binary classification task. This training formulation puts a strict, pixel-level constraint in higher resolution on the single annotation available in training, and does not consider annotation variance and the correlation between the two subtasks. To address these issues, we introduce the patch distribution prediction (PDP) method. We replace the in/out prediction branch in previous models with the PDP branch, by predicting a patch-level gaze distribution that also considers the outside cases. Experiments show that our model regularizes the MSE loss by predicting better heatmap distributions on images with larger annotation variances, meanwhile bridging the gap between the target prediction and in/out prediction subtasks, showing a significant improvement in performance on both subtasks on public gaze following datasets.

Submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted to WACV 2023

arXiv:2211.00277 [pdf]

HFN: Heterogeneous Feature Network for Multivariate Time Series Anomaly Detection

Authors: Jun Zhan, Chengkun Wu, Canqun Yang, Qiucheng Miao, Xiandong Ma

Abstract: Network or physical attacks on industrial equipment or computer systems may cause massive losses. Therefore, a quick and accurate anomaly detection (AD) based on monitoring data, especially the multivariate time-series (MTS) data, is of great significance. As the key step of anomaly detection for MTS data, learning the relations among different variables has been explored by many approaches. However, most of the existing approaches do not consider the heterogeneity between variables, that is, different types of variables (continuous numerical variables, discrete categorical variables or hybrid variables) may have different and distinctive edge distributions. In this paper, we propose a novel semi-supervised anomaly detection framework based on a heterogeneous feature network (HFN) for MTS, learning heterogeneous structure information from a mass of unlabeled time-series data to improve the accuracy of anomaly detection, and using attention coefficient to provide an explanation for the detected anomalies. Specifically, we first combine the embedding similarity subgraph generated by sensor embedding and feature value similarity subgraph generated by sensor values to construct a time-series heterogeneous graph, which fully utilizes the rich heterogeneous mutual information among variables. Then, a prediction model containing nodes and channel attentions is jointly optimized to obtain better time-series representations. This approach fuses the state-of-the-art technologies of heterogeneous graph structure learning (HGSL) and representation learning. The experiments on four sensor datasets from real-world applications demonstrate that our approach detects the anomalies more accurately than those baseline approaches, thus providing a basis for the rapid positioning of anomalies.

Submitted 1 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.02655 [pdf, other]

Domain Generalization via Contrastive Causal Learning

Authors: Qiaowei Miao, Junkun Yuan, Kun Kuang

Abstract: Domain Generalization (DG) aims to learn a model that can generalize well to unseen target domains from a set of source domains. With the idea of invariant causal mechanism, a lot of efforts have been put into learning robust causal effects which are determined by the object yet insensitive to the domain changes. Despite the invariance of causal effects, they are difficult to be quantified and optimized. Inspired by the ability that humans adapt to new environments by prior knowledge, we develop a novel Contrastive Causal Model (CCM) to transfer unseen images to taught knowledge which are the features of seen images, and quantify the causal effects based on taught knowledge. Considering the transfer is affected by domain shifts in DG, we propose a more inclusive causal graph to describe DG task. Based on this causal graph, CCM controls the domain factor to cut off excess causal paths and uses the remaining part to calculate the causal effects of images to labels via the front-door criterion. Specifically, CCM is composed of three components: (i) domain-conditioned supervised learning which teaches CCM the correlation between images and labels, (ii) causal effect learning which helps CCM measure the true causal effects of images to labels, (iii) contrastive similarity learning which clusters the features of images that belong to the same class and provides the quantification of similarity. Finally, we test the performance of CCM on multiple datasets including PACS, OfficeHome, and TerraIncognita. The extensive experiments demonstrate that CCM surpasses the previous DG methods with clear margins.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2208.09240 [pdf, other]

An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection

Authors: Qiucheng Miao, Chuanfu Xu, Jun Zhan, Dong Zhu, Chengkun Wu

Abstract: Anomaly detection of multivariate time series is meaningful for system behavior monitoring. This paper proposes an anomaly detection method based on unsupervised Short- and Long-term Mask Representation learning (SLMR). The main idea is to extract short-term local dependency patterns and long-term global trend patterns of the multivariate time series by using multi-scale residual dilated convolution and Gated Recurrent Unit(GRU) respectively. Furthermore, our approach can comprehend temporal contexts and feature correlations by combining spatial-temporal masked self-supervised representation learning and sequence split. It considers the importance of features is different, and we introduce the attention mechanism to adjust the contribution of each feature. Finally, a forecasting-based model and a reconstruction-based model are integrated to focus on single timestamp prediction and latent representation of time series. Experiments show that the performance of our method outperforms other state-of-the-art models on three real-world datasets. Further analysis shows that our method is good at interpretability.

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.03561 [pdf]

Study of detecting behavioral signatures within DeepFake videos

Authors: Qiaomu Miao, Sinhwa Kang, Stacy Marsella, Steve DiPaola, Chao Wang, Ari Shapiro

Abstract: There is strong interest in the generation of synthetic video imagery of people talking for various purposes, including entertainment, communication, training, and advertisement. With the development of deep fake generation models, synthetic video imagery will soon be visually indistinguishable to the naked eye from a naturally captured video. In addition, many methods are continuing to improve to avoid more careful, forensic visual analysis. Some deep fake videos are produced through the use of facial puppetry, which directly controls the head and face of the synthetic image through the movements of the actor, allowing the actor to 'puppet' the image of another. In this paper, we address the question of whether one person's movements can be distinguished from the original speaker by controlling the visual appearance of the speaker but transferring the behavior signals from another source. We conduct a study by comparing synthetic imagery that: 1) originates from a different person speaking a different utterance, 2) originates from the same person speaking a different utterance, and 3) originates from a different person speaking the same utterance. Our study shows that synthetic videos in all three cases are seen as less real and less engaging than the original source video. Our results indicate that there could be a behavioral signature that is detectable from a person's movements that is separate from their visual appearance, and that this behavioral signature could be used to distinguish a deep fake from a properly captured video.

Submitted 6 August, 2022; originally announced August 2022.

Comments: 9 pages

arXiv:2207.06143 [pdf, ps, other]

A global second order Sobolev regularity for $p$-Laplacian type equations with variable coefficients in bounded domains

Authors: Qianyun Miao, Fa Peng, Yuan Zhou

Abstract: Let $Ω\subset R^n$ be a bounded convex domain with $n\ge2$. Suppose that $A$ is uniformly elliptic and belongs to $W^{1,n}$ when $n\ge 3$ or $W^{1,q}$ for some $q>2$ when $n=2$. For $1<p<\infty$, we build up a global second order regularity estimate $$\|D[|Du|^{p-2} Du]\|_{L^2(Ω)}+\|D[ |\sqrt{A}Du|^{p-2} A Du]\|_{L^2(Ω)} \le C \|f\|_{L^2(Ω)} $$ for the inhomogeneous $p$-Laplace type equation \begin{equation} -\mathrm{div}\big(\langle A Du,Du\rangle ^{\frac{p-2}2} A Du\big)=f \quad\rm{in }\ Ω\mbox{ with Dirichlet/Neumann $0$-boundary.} \end{equation} A similar result is also established for certain bounded Lipschitz domains whose boundary is weakly second order differentiable and satisfies some smallness assumptions.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2207.02429 [pdf, ps, other]

Global well-posedness and asymptotic behavior in critical spaces for the compressible Euler system with velocity alignment

Authors: Xiang Bai, Qianyun Miao, Changhui Tan, Liutang Xue

Abstract: In this paper, we study the Cauchy problem of the compressible Euler system with strongly singular velocity alignment. We prove the existence and uniqueness of global solutions in critical Besov spaces to the considered system with small initial data. The local-in-time solvability is also addressed. Moreover, we show the large-time asymptotic behavior and optimal decay estimates of the solutions as $t\to \infty$.

Submitted 6 July, 2022; originally announced July 2022.

Comments: 39 pages

MSC Class: 35Q31; 35R11; 76N10; 35B40

arXiv:2205.12662 [pdf, other]

DFM: Dialogue Foundation Model for Universal Large-Scale Dialogue-Oriented Task Learning

Authors: Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Xin Dong, Fujiang Ge, Qingliang Miao, Jian-Guang Lou, Kai Yu

Abstract: Building a universal conversational agent has been a long-standing goal of the dialogue research community. Most previous works only focus on a small set of dialogue tasks. In this work, we aim to build a unified dialogue foundation model (DFM) which can be used to solve a massive number of diverse dialogue tasks. To achieve this goal, a large-scale well-annotated dialogue dataset with rich task diversity (DialogZoo) is collected. We introduce a framework to unify all dialogue tasks and propose novel auxiliary self-supervised tasks to achieve stable training of DFM on the highly diverse large-scale DialogZoo corpus. Experiments show that, compared with models of the same size, DFM can achieve state-of-the-art or competitive performance on very rich cross-domain downstream dialogue tasks. This demonstrates that DFM largely extends the ability of unified dialogue pre-trained models.

Submitted 9 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Work in Progress

arXiv:2205.02996 [pdf, other]

Multi-view Point Cloud Registration based on Evolutionary Multitasking with Bi-Channel Knowledge Sharing Mechanism

Authors: Yue Wu, Yibo Liu, Maoguo Gong, Peiran Gong, Hao Li, Zedong Tang, Qiguang Miao, Wenping Ma

Abstract: Multi-view point cloud registration is fundamental in 3D reconstruction. Since there are close connections between point clouds captured from different viewpoints, registration performance can be enhanced if these connections are harnessed properly. Therefore, this paper models the registration problem as multi-task optimization, and proposes a novel bi-channel knowledge sharing mechanism for effective and efficient problem solving. The modeling of multi-view point cloud registration as multi-task optimization is twofold. By simultaneously considering the local accuracy of two point clouds as well as the global consistency posed by all the point clouds involved, a fitness function with an adaptive threshold is derived. Also, a framework of the co-evolutionary search process is defined for the concurrent optimization of multiple fitness functions belonging to related tasks. The proposed bi-channel knowledge sharing mechanism is used to enhance solution quality and convergence speed. The intra-task knowledge sharing introduces aiding tasks that are much simpler to solve, and useful information is shared across aiding tasks and the original tasks, accelerating the search process. The inter-task knowledge sharing explores commonalities buried among the original tasks, aiming to prevent tasks from getting stuck in local optima. Comprehensive experiments conducted on model object as well as scene point clouds show the efficacy of the proposed method.

Submitted 23 August, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

arXiv:2203.15163 [pdf, other]

CAT-Net: A Cross-Slice Attention Transformer Model for Prostate Zonal Segmentation in MRI

Authors: Alex Ling Yu Hung, Haoxin Zheng, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

Abstract: Prostate cancer is the second leading cause of cancer death among men in the United States. The diagnosis of prostate MRI often relies on the accurate prostate zonal segmentation. However, state-of-the-art automatic segmentation methods often fail to produce well-contained volumetric segmentation of the prostate zones since certain slices of prostate MRI, such as base and apex slices, are harder to segment than other slices. This difficulty can be overcome by accounting for the cross-slice relationship of adjacent slices, but current methods do not fully learn and exploit such relationships. In this paper, we propose a novel cross-slice attention mechanism, which we use in a Transformer module to systematically learn the cross-slice relationship at different scales. The module can be utilized in any existing learning-based segmentation framework with skip connections. Experiments show that our cross-slice attention is able to capture the cross-slice information in prostate zonal segmentation and improve the performance of current state-of-the-art methods. Our method improves segmentation accuracy in the peripheral zone, such that the segmentation results are consistent across all the prostate slices (apex, mid-gland, and base).

Submitted 16 June, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2201.00443 [pdf, other]

Scene Graph Generation: A Comprehensive Survey

Authors: Guangming Zhu, Liang Zhang, Youliang Jiang, Yixuan Dang, Haoran Hou, Peiyi Shen, Mingtao Feng, Xia Zhao, Qiguang Miao, Syed Afaq Ali Shah, Mohammed Bennamoun

Abstract: Deep learning techniques have led to remarkable breakthroughs in the field of generic object detection and have spawned a lot of scene-understanding tasks in recent years. Scene graph has been the focus of research because of its powerful semantic representation and applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph, which requires the correct labeling of detected objects and their relationships. Although this is a challenging task, the community has proposed a lot of SGG approaches and achieved good results. In this paper, we provide a comprehensive survey of recent achievements in this field brought about by deep learning techniques. We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG from the perspective of feature extraction and fusion. We attempt to connect and systematize the existing visual relationship detection methods, to summarize, and interpret the mechanisms and the strategies of SGG in a comprehensive way. Finally, we finish this survey with deep discussions about current existing problems and future research directions. This survey will help readers to develop a better understanding of the current research status and ideas.

Submitted 22 June, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

Comments: Submitted to TPAMI

arXiv:2112.04999 [pdf, other]

doi 10.1007/978-3-030-88480-2_40

Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF

Authors: Su Zhu, Lu Chen, Ruisheng Cao, Zhi Chen, Qingliang Miao, Kai Yu

Abstract: The data sparsity problem is a key challenge of Natural Language Understanding (NLU), especially for a new target domain. By training an NLU model in source domains and applying the model to an arbitrary target domain directly (even without fine-tuning), few-shot NLU becomes crucial to mitigate the data scarcity issue. In this paper, we propose to improve prototypical networks with vector projection distance and abstract triangular Conditional Random Field (CRF) for few-shot NLU. The vector projection distance exploits projections of contextual word embeddings on label vectors as word-label similarities, which is equivalent to a normalized linear model. The abstract triangular CRF learns domain-agnostic label transitions for joint intent classification and slot filling tasks. Extensive experiments demonstrate that our proposed methods can significantly surpass strong baselines. Specifically, our approach can achieve a new state-of-the-art on two few-shot NLU benchmarks (Few-Joint and SNIPS) in Chinese and English without fine-tuning on target domains.

Submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted by NLPCC 2021

arXiv:2111.12235 [pdf, ps, other]

Global well-posedness for 2D fractional inhomogeneous Navier-Stokes equations with rough density

Authors: Yatao Li, Qianyun Miao, Liutang Xue

Abstract: This paper is concerned with the global well-posedness of the 2D incompressible inhomogeneous Navier-Stokes (INS) equations with fractional dissipation and rough density. We first establish the $L^q_t(L^p)$-maximal regularity estimate for the generalized Stokes system with fractional dissipation, and then we employ it to obtain the global existence of a solution for the 2D fractional INS equations with large velocity field, provided that the $L^2\cap L^\infty$-norm of the density minus the constant 1 is small enough. Moreover, by additionally assuming that the density minus 1 is sufficiently small in the norm of some multiplier spaces, we prove the uniqueness of the constructed solution by using the Lagrangian coordinates approach. We also consider the density patch problem for the 2D fractional INS equations, and show the global persistence of $C^{1,γ}$-regularity of the density patch boundary when the piecewise jump of density is small enough.

Submitted 23 November, 2021; originally announced November 2021.

Comments: 43 pages

MSC Class: 35Q30; 76D05; 35B40

arXiv:2110.06442 [pdf, ps, other]

Global regularity of non-diffusive temperature fronts for the 2D viscous Boussinesq system

Authors: Dongho Chae, Qianyun Miao, Liutang Xue

Abstract: In this paper we address the temperature patch problem of the 2D viscous Boussinesq system without heat diffusion term. The temperature satisfies the transport equation and the initial data of temperature is given in the form of a non-constant patch, usually called the temperature front initial data. Introducing a good unknown and applying the method of striated estimates, we prove that our partially viscous Boussinesq system admits a unique global regular solution and that the initial $C^{k,γ}$ and $W^{2,\infty}$ regularity of the temperature front boundary with $k\in \mathbb{Z}^+ = \{1,2,\cdots\}$ and $γ\in (0,1)$ is preserved for all time. In particular, this naturally extends the previous work by Danchin $\&$ Zhang (2017) and Gancedo $\&$ García-Juárez (2017). In the proof of the persistence result of higher boundary regularity, we introduce the striated type Besov space $\mathcal{B}^{s,\ell}_{p,r,W}(\mathbb{R}^d)$ and establish a series of refined striated estimates in such a function space, which may be of independent interest.

Submitted 28 October, 2021; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: 50 pages. The striated estimates are simplified

MSC Class: 76D03; 35Q35; 35Q86

arXiv:2108.13401 [pdf, other]

doi 10.1103/PhysRevResearch.5.033141

Quantum-classical eigensolver using multiscale entanglement renormalization

Authors: Qiang Miao, Thomas Barthel

Abstract: We propose a variational quantum eigensolver (VQE) for the simulation of strongly-correlated quantum matter based on a multi-scale entanglement renormalization ansatz (MERA) and gradient-based optimization. This MERA quantum eigensolver can have substantially lower computation costs than corresponding classical algorithms. Due to its narrow causal cone, the algorithm can be implemented on noisy intermediate-scale quantum (NISQ) devices and still describe large systems. It is particularly attractive for ion-trap devices with ion-shuttling capabilities. The number of required qubits is system-size independent, and increases only to a logarithmic scaling when using quantum amplitude estimation to speed up gradient evaluations. Translation invariance can be used to make computation costs square-logarithmic in the system size and describe the thermodynamic limit. We demonstrate the approach numerically for a MERA with Trotterized disentanglers and isometries. With a few Trotter steps, one recovers the accuracy of the full MERA.

Submitted 31 August, 2023; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 14 pages, 9 figures; additional discussions of the computational complexity, layer-transition maps for homogeneous MERA, mid-circuit qubit resets, and data on the quantum advantage; further minor improvements; published version

Journal ref: Phys. Rev. Res. 5, 033141 (2023)

arXiv:2104.09079 [pdf, other]

doi 10.1016/j.ymssp.2021.108616

A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings

Authors: Yifei Ding, Minping Jia, Qiuhua Miao, Yudong Cao

Abstract: The scope of data-driven fault diagnosis models is greatly extended through deep learning (DL). However, the classical convolution and recurrent structures have their defects in computational efficiency and feature representation, while the latest Transformer architecture based on attention mechanism has not yet been applied in this field. To solve these problems, we propose a novel time-frequency Transformer (TFT) model inspired by the massive success of the vanilla Transformer in sequence processing. Specifically, we design a fresh tokenizer and encoder module to extract effective abstractions from the time-frequency representation (TFR) of vibration signals. On this basis, a new end-to-end fault diagnosis framework based on the time-frequency Transformer is presented in this paper. Through case studies on bearing experimental datasets, we construct the optimal Transformer structure and verify its fault diagnosis performance. The superiority of the proposed method is demonstrated in comparison with the benchmark models and other state-of-the-art methods.

Submitted 4 December, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

Journal ref: Mech. Syst. Signal Process., vol. 168, p. 108616, Apr. 2022

arXiv:2010.07265 [pdf, other]

doi 10.22331/q-2022-02-02-642

Eigenstate entanglement scaling for critical interacting spin chains

Authors: Qiang Miao, Thomas Barthel

Abstract: With increasing subsystem size and energy, bipartite entanglement entropies of energy eigenstates cross over from the groundstate scaling to a volume law. In previous work, we pointed out that, when strong or weak eigenstate thermalization (ETH) applies, the entanglement entropies of all or, respectively, almost all eigenstates follow a single crossover function. The crossover functions are determined by the subsystem entropy of thermal states and assume universal scaling forms in quantum-critical regimes. This was demonstrated by field-theoretical arguments and the analysis of large systems of non-interacting fermions and bosons. Here, we substantiate such scaling properties for integrable and non-integrable interacting spin-1/2 chains at criticality using exact diagonalization. In particular, we analyze XXZ and transverse-field Ising models with and without next-nearest-neighbor interactions. Indeed, the crossover of thermal subsystem entropies can be described by a universal scaling function following from conformal field theory. Furthermore, we analyze the validity of ETH for entanglement in these models. Even for the relatively small system sizes that can be simulated, the distributions of eigenstate entanglement entropies are sharply peaked around the subsystem entropies of the corresponding thermal ensembles.

Submitted 1 February, 2022; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: 8 pages, 7 figures. Data for larger systems, added discussion of substructures due to approximately conserved quantities, minor improvements. Published version. This complements arXiv:1905.07760 and arXiv:1912.10045

Journal ref: Quantum 6, 642 (2022)

arXiv:2010.03365 [pdf, other]

"Drunk Man" Saves Our Lives: Route Planning by a Biased Random Walk Mode

Authors: Xinyi Hu, Quchen Miao, Zexuan Zhao

Abstract: Based on the hurricane striking Puerto Rico in 2017, we developed a transportable disaster response system "DroneGo" featuring a drone fleet capable of delivering the medical package and videoing roads. Covering with a genetic algorithm and a biased random walk model mimicking a drunk man to explore feasible routes on a field with altitude and road information. A proposal mechanism guaranteeing stochasticity and an objective function biasing randomness are combined. The results showed high performance though time-consuming.

Submitted 4 October, 2020; originally announced October 2020.

arXiv:2004.03652 [pdf, ps, other]

Global regularity for a 1D Euler-alignment system with misalignment

Authors: Qianyun Miao, Changhui Tan, Liutang Xue

Abstract: We study one-dimensional Eulerian dynamics with nonlocal alignment interactions, featuring strong short-range alignment, and long-range misalignment. Compared with the well-studied Euler-alignment system, the presence of the misalignment brings different behaviors of the solutions, including the possible creation of vacuum at infinite time, which destabilizes the solutions. We show that with a strongly singular short-range alignment interaction, the solution is globally regular, despite the effect of misalignment.

Submitted 7 April, 2020; originally announced April 2020.

Comments: 38 pages, 1 figure

MSC Class: 35Q35; 35R11; 92D25; 76N10

arXiv:2001.03859 [pdf, ps, other]

doi 10.1103/PhysRevC.101.065801

Nucleon effective mass in hot dense matter

Authors: X. L. Shang, A. Li, Z. Q. Miao, G. F. Burgio, H. -J. Schulze

Abstract: Nucleon effective masses are studied in the framework of the Brueckner-Hartree-Fock many-body approach at finite temperature. Self-consistent calculations using the Argonne $V_{18}$ interaction including microscopic three-body forces are reported for varying temperature and proton fraction up to several times the nuclear saturation density. Our calculations are based on the exact treatment of the center-of-mass momentum instead of the average-momentum approximation employed in previous works. We discuss in detail the effects of the temperature together with those of the three-body forces, the density, and the isospin asymmetry. We also provide an analytical fit of the effective mass taking these dependencies into account. The temperature effects on the cooling of neutron stars are briefly discussed based on the results for betastable matter.

Submitted 28 April, 2020; v1 submitted 12 January, 2020; originally announced January 2020.

Comments: version accepted for publication in Physical Review C

Journal ref: Phys. Rev. C 101, 065801 (2020)

arXiv:1912.10045 [pdf, other]

doi 10.1103/PhysRevA.104.022414

Scaling functions for eigenstate entanglement crossovers in harmonic lattices

Authors: Thomas Barthel, Qiang Miao

Abstract: For quantum matter, eigenstate entanglement entropies obey an area law or log-area law at low energies and small subsystem sizes and cross over to volume laws for high energies and large subsystems. This transition is captured by crossover functions, which assume a universal scaling form in quantum critical regimes. We demonstrate this for the harmonic lattice model, which describes quantized lattice vibrations and is a regularization for free scalar field theories, modeling, e.g., spin-0 bosonic particles. In one dimension, the groundstate entanglement obeys a log-area law. For dimensions $d\geq 2$, it displays area laws, even at criticality. The distribution of excited-state entanglement entropies is found to be sharply peaked around subsystem entropies of corresponding thermodynamic ensembles in accordance with the eigenstate thermalization hypothesis. Numerically, we determine crossover scaling functions for the quantum critical regime of the model and do a large-deviation analysis. We show how infrared singularities of the system can be handled and how to access the thermodynamic limit using a perturbative trick for the covariance matrix. Eigenstates for quasi-free bosonic systems are not Gaussian. We resolve this problem by considering appropriate squeezed states instead. For these, entanglement entropies can be evaluated efficiently.

Submitted 7 September, 2021; v1 submitted 20 December, 2019; originally announced December 2019.

Comments: 12 pages, 5 figures. Added a large-deviation analysis, improved text; published version. See also arXiv:1905.07760 [PRL 127, 040603 (2021)], where the concept of scaling functions for eigenstate entanglement crossovers has been introduced and demonstrated for other models

Journal ref: Phys. Rev. A 104, 022414 (2021)

arXiv:1908.06904 [pdf, ps, other]

doi 10.4064/cm140-1-4

The Defocusing Energy-critical Klein-Gordon-Hartree Equation

Authors: Qianyun Miao, Jiqiang Zheng

Abstract: In this paper, we study the scattering theory for the defocusing energy-critical Klein-Gordon equation with a cubic convolution $u_{tt}-Δu+u+(|x|^{-4}\ast|u|^2)u=0$ in the spatial dimension $d \geq 5$. We utilize the strategy in [S. Ibrahim, N. Masmoudi and K. Nakanishi, Scattering threshold for the focusing nonlinear Klein-Gordon equation. Analysis and PDE., 4 (2011), 405-460.] derived from concentration compactness ideas to show that the proof of the global well-posedness and scattering reduces to disproving the existence of a soliton-like solution. Employing the technique from [B. Pausader, Scattering for the Beam Equation in Low Dimensions. Indiana Univ. Math. J., 59 (2010), 791-822.], we consider a virial-type identity in the direction orthogonal to the momentum vector so as to exclude such a solution.

Submitted 16 August, 2019; originally announced August 2019.

Comments: 23 pages. arXiv admin note: substantial text overlap with arXiv:math/0612028

Journal ref: Colloquium Mathematicum, 140(2015),31-58

arXiv:1907.12193 [pdf, other]

doi 10.1109/TCYB.2020.3012092

ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture Recognition

Authors: Jun Wan, Chi Lin, Longyin Wen, Yunan Li, Qiguang Miao, Sergio Escalera, Gholamreza Anbarjafari, Isabelle Guyon, Guodong Guo, Stan Z. Li

Abstract: The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than $200$ teams around the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean Jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of $8.1\%$ (from $0.8917$ to $0.9639$) of CSR.

Submitted 28 July, 2019; originally announced July 2019.

Comments: 14 pages, 8 figures, 6 tables

Journal ref: IEEE Transactions on Cybernetics 2020

Showing 1–50 of 59 results for author: Miao, Q