-
BEVGPT: Generative Pre-trained Large Model for Autonomous Driving Prediction, Decision-Making, and Planning
Authors:
Pengqin Wang,
Meixin Zhu,
Hongliang Lu,
Hui Zhong,
Xianda Chen,
Shaojie Shen,
Xuesong Wang,
Yinhai Wang
Abstract:
Prediction, decision-making, and motion planning are essential for autonomous driving. In most contemporary works, they are considered as individual modules or combined into a multi-task learning paradigm with a shared backbone but separate task heads. However, we argue that they should be integrated into a comprehensive framework. Although several recent approaches follow this scheme, they suffer from complicated input representations and redundant framework designs. More importantly, they cannot make long-term predictions about future driving scenarios. To address these issues, we rethink the necessity of each module in an autonomous driving task and incorporate only the required modules into a minimalist autonomous driving framework. We propose BEVGPT, a generative pre-trained large model that integrates driving scenario prediction, decision-making, and motion planning. The model takes bird's-eye-view (BEV) images as the only input source and makes driving decisions based on surrounding traffic scenarios. To ensure driving trajectory feasibility and smoothness, we develop an optimization-based motion planning method. We instantiate BEVGPT on the Lyft Level 5 Dataset and use Woven Planet L5Kit for realistic driving simulation. The effectiveness and robustness of the proposed framework are verified by the fact that it outperforms previous methods on 100% of decision-making metrics and 66% of motion planning metrics. Furthermore, the ability of our framework to accurately generate BEV images over the long term is demonstrated through the task of driving scenario prediction. To the best of our knowledge, this is the first generative pre-trained large model for autonomous driving prediction, decision-making, and motion planning with only BEV images as input.
Submitted 16 October, 2023;
originally announced October 2023.
-
RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray Projection
Authors:
Miao Zhu,
Qiming Fu,
Bo Liu,
Mengxi Zhang,
Bojian Li,
Xiaoyan Luo,
Fugen Zhou
Abstract:
Radiotherapy is one of the primary treatment methods for tumors, but the organ movement caused by respiration limits its accuracy. Recently, 3D imaging from a single X-ray projection has received extensive attention as a promising approach to address this issue. However, current methods can only reconstruct 3D images without directly locating the tumor and are only validated for fixed-angle imaging, which fails to fully meet the requirements of motion control in radiotherapy. In this study, a novel imaging method, RT-SRTS, is proposed, which integrates 3D imaging and tumor segmentation into one network based on multi-task learning (MTL) and achieves real-time simultaneous 3D reconstruction and tumor segmentation from a single X-ray projection at any angle. Furthermore, the attention enhanced calibrator (AEC) and uncertain-region elaboration (URE) modules are proposed to aid feature extraction and improve segmentation accuracy. The proposed method was evaluated on fifteen patient cases and compared with three state-of-the-art methods. It not only delivers superior 3D reconstruction but also demonstrates commendable tumor segmentation results. Simultaneous reconstruction and segmentation can be completed in approximately 70 ms, significantly faster than the required time threshold for real-time tumor tracking. The efficacy of both AEC and URE has also been validated in ablation studies. The code is available at https://github.com/ZywooSimple/RT-SRTS.
Submitted 28 March, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras
Authors:
Tzu-Yuan Lin,
Minghan Zhu,
Maani Ghaffari
Abstract:
This paper proposes an equivariant neural network that takes data in any semi-simple Lie algebra as input. The corresponding group acts on the Lie algebra as adjoint operations, making our proposed network adjoint-equivariant. Our framework generalizes the Vector Neurons, a simple $\mathrm{SO}(3)$-equivariant network, from 3-D Euclidean space to Lie algebra spaces, building upon the invariance property of the Killing form. Furthermore, we propose novel Lie bracket layers and geometric channel mixing layers that extend the modeling capacity. Experiments are conducted for the $\mathfrak{so}(3)$, $\mathfrak{sl}(3)$, and $\mathfrak{sp}(4)$ Lie algebras on various tasks, including fitting equivariant and invariant functions, learning system dynamics, point cloud registration, and homography-based shape classification. Our proposed equivariant network shows wide applicability and competitive performance in various domains.
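The invariance property of the Killing form mentioned in this abstract can be written out explicitly. The following is a brief sketch using standard Lie-theory definitions, not notation taken from the paper itself; the symbols $B$, $\operatorname{Ad}$, $\operatorname{ad}$, $G$, and the hat map $\hat{\cdot}$ are assumptions introduced here for illustration.

```latex
% Killing form of a Lie algebra g and its invariance under the adjoint action of G
B(X, Y) = \operatorname{tr}\!\left(\operatorname{ad}_X \circ \operatorname{ad}_Y\right),
\qquad
B\!\left(\operatorname{Ad}_g X,\ \operatorname{Ad}_g Y\right) = B(X, Y)
\quad \text{for all } g \in G,\ X, Y \in \mathfrak{g}.

% Special case so(3): with the hat map identifying R^3 with so(3),
% the adjoint action is an ordinary rotation and B is a scaled dot product.
\operatorname{Ad}_R \hat{x} = \widehat{R x},
\qquad
B(\hat{x}, \hat{y}) = -2\, x^{\top} y,
\qquad R \in \mathrm{SO}(3),\ x, y \in \mathbb{R}^3.
```

In the $\mathfrak{so}(3)$ case the invariance reduces to the rotation invariance of the Euclidean inner product, which is consistent with the abstract's description of the framework as a generalization of Vector Neurons.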
Submitted 6 June, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Nearest neighbor synthesis of CNOT circuits on general quantum architectures
Authors:
Xinyu Chen,
Mingqiang Zhu,
Xueyun Cheng,
Pengcheng Zhu,
Zhi** Guan
Abstract:
In recent years, quantum computing has entered the Noisy Intermediate-Scale Quantum (NISQ) era. However, NISQ devices have inherent limitations in terms of connectivity and hardware noise, necessitating the transformation of quantum logic circuits for correct execution on NISQ chips. The synthesis of CNOT circuits considering physical constraints can transform quantum algorithms into low-level quantum circuits, which can be directly executed on physical chips. In the current trend, quantum chip architectures without Hamiltonian paths are gradually replacing architectures with Hamiltonian paths due to their scalability and low-noise characteristics. To this end, this paper addresses the nearest neighbor synthesis of CNOT circuits on architectures with and without Hamiltonian paths, aiming to enhance the fidelity of the circuits after execution. Firstly, a key-qubit priority mapping model for the general architecture with and without Hamiltonian paths is proposed. Secondly, the initial mapping is further improved by using tabu search to reduce the number of CNOT gates after circuit synthesis and enhance its fidelity. Finally, the noise-aware CNOT circuit nearest neighbor synthesis algorithm for the general architecture is proposed based on the key-qubit priority mapping model. Experimental results show that the proposed method can enhance the fidelity of the CNOT circuit by about 64.7% on a real quantum computing device, achieving a significant optimization effect. Furthermore, the method can be extended to other circuits, thereby improving the overall performance of quantum computing on NISQ devices.
Submitted 1 October, 2023;
originally announced October 2023.
-
Temporal credit assignment for one-shot learning utilizing a phase transition material
Authors:
Alessandro R. Galloni,
Yifan Yuan,
Minning Zhu,
Haoming Yu,
Ravindra S. Bisht,
Chung-Tse Michael Wu,
Christine Grienberger,
Shriram Ramanathan,
Aaron D. Milstein
Abstract:
Design of hardware based on biological principles of neuronal computation and plasticity in the brain is a leading approach to realizing energy- and sample-efficient artificial intelligence and learning machines. An important factor in selection of the hardware building blocks is the identification of candidate materials with physical properties suitable to emulate the large dynamic ranges and varied timescales of neuronal signaling. Previous work has shown that the all-or-none spiking behavior of neurons can be mimicked by threshold switches utilizing phase transitions. Here we demonstrate that devices based on a prototypical metal-insulator-transition material, vanadium dioxide (VO2), can be dynamically controlled to access a continuum of intermediate resistance states. Furthermore, the timescale of their intrinsic relaxation can be configured to match a range of biologically-relevant timescales from milliseconds to seconds. We exploit these device properties to emulate three aspects of neuronal analog computation: fast (~1 ms) spiking in a neuronal soma compartment, slow (~100 ms) spiking in a dendritic compartment, and ultraslow (~1 s) biochemical signaling involved in temporal credit assignment for a recently discovered biological mechanism of one-shot learning. Simulations show that an artificial neural network using properties of VO2 devices to control an agent navigating a spatial environment can learn an efficient path to a reward in up to 4 fold fewer trials than standard methods. The phase relaxations described in our study may be engineered in a variety of materials, and can be controlled by thermal, electrical, or optical stimuli, suggesting further opportunities to emulate biological learning.
Submitted 29 September, 2023;
originally announced October 2023.
-
Diff-Privacy: Diffusion-based Face Privacy Protection
Authors:
Xiao He,
Mingrui Zhu,
Dongxin Chen,
Nannan Wang,
Xinbo Gao
Abstract:
Privacy protection has become a top priority as the proliferation of AI techniques has led to widespread collection and misuse of personal data. Anonymization and visual identity information hiding are two important facial privacy protection tasks that aim to remove identification characteristics from facial images at the human perception level. However, they have a significant difference in that the former aims to prevent the machine from recognizing correctly, while the latter needs to ensure the accuracy of machine recognition. Therefore, it is difficult to train a model to complete these two tasks simultaneously. In this paper, we unify the task of anonymization and visual identity information hiding and propose a novel face privacy protection method based on diffusion models, dubbed Diff-Privacy. Specifically, we train our proposed multi-scale image inversion module (MSI) to obtain a set of SDM format conditional embeddings of the original image. Based on the conditional embeddings, we design corresponding embedding scheduling strategies and construct different energy functions during the denoising process to achieve anonymization and visual identity information hiding. Extensive experiments have been conducted to validate the effectiveness of our proposed framework in protecting facial privacy.
Submitted 11 September, 2023;
originally announced September 2023.
-
Enhancing Asynchronous Time Series Forecasting with Contrastive Relational Inference
Authors:
Yan Wang,
Zhixuan Chu,
Tao Zhou,
Caigao Jiang,
Hongyan Hao,
Minjie Zhu,
Xindong Cai,
Qing Cui,
Longfei Li,
James Y Zhang,
Siqiao Xue,
Jun Zhou
Abstract:
Asynchronous time series, also known as temporal event sequences, are the basis of many applications throughout different industries. Temporal point processes (TPPs) are the standard method for modeling such data. Existing TPP models have focused on parameterizing the conditional distribution of future events instead of explicitly modeling event interactions, which poses challenges for event prediction. In this paper, we propose a novel approach that leverages Neural Relational Inference (NRI) to learn a relation graph that infers interactions while simultaneously learning the dynamic patterns from observational data. Our approach, the Contrastive Relational Inference-based Hawkes Process (CRIHP), reasons about event interactions under a variational inference framework. It utilizes intensity-based learning to search for prototype paths to contrast relationship constraints. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model in capturing event interactions for event sequence modeling tasks. Code will be integrated into the EasyTPP framework.
Submitted 6 October, 2023; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Scenario-Aware Hierarchical Dynamic Network for Multi-Scenario Recommendation
Authors:
**gtong Gao,
Bo Chen,
Menghui Zhu,
Xiangyu Zhao,
Xiaopeng Li,
Yuhao Wang,
Yichao Wang,
Huifeng Guo,
Ruiming Tang
Abstract:
Click-Through Rate (CTR) prediction is a fundamental technique in recommendation and advertising systems. Recent studies have shown that implementing multi-scenario recommendations contributes to strengthening information sharing and improving overall performance. However, existing multi-scenario models only consider coarse-grained explicit scenario modeling that depends on pre-defined scenario identification from manual prior rules, which is biased and sub-optimal. To address these limitations, we propose a Scenario-Aware Hierarchical Dynamic Network for Multi-Scenario Recommendations (HierRec), which perceives implicit patterns adaptively and conducts explicit and implicit scenario modeling jointly. In particular, HierRec designs a basic scenario-oriented module based on the dynamic weight to capture scenario-specific information. Then the hierarchical explicit and implicit scenario-aware modules are proposed to model hybrid-grained scenario information. The multi-head implicit modeling design contributes to perceiving distinctive patterns from different perspectives. Our experiments on two public datasets and real-world industrial applications on a mainstream online advertising platform demonstrate that our HierRec outperforms existing models significantly.
Submitted 5 September, 2023;
originally announced September 2023.
-
FAST discovery of a fast neutral hydrogen outflow
Authors:
Renzhi Su,
Minfeng Gu,
S. J. Curran,
Elizabeth K. Mahony,
Ningyu Tang,
James R. Allison,
Di Li,
Ming Zhu,
J. N. H. S. Aditya,
Hyein Yoon,
Zheng Zheng,
Zhongzu Wu
Abstract:
In this letter, we report the discovery of a fast neutral hydrogen outflow in SDSS J145239.38+062738.0, a merging radio galaxy containing an optical type I active galactic nucleus (AGN). This discovery was made through observations conducted by the Five-hundred-meter Aperture Spherical radio Telescope (FAST) using redshifted 21-cm absorption. The outflow exhibits a blueshifted velocity likely up to $\sim-1000\,\rm km\,s^{-1}$ with respect to the systemic velocity of the host galaxy with an absorption strength of $\sim -0.6\,\rm mJy\,beam^{-1}$ corresponding to an optical depth of 0.002 at $v=-500\,\rm km\,s^{-1}$. The mass outflow rate ranges between $2.8\times10^{-2}$ and $3.6\, \rm M_\odot \, yr^{-1}$, implying an energy outflow rate ranging between $4.2\times10^{39}$ and $9.7\times10^{40}\rm\,erg\,s^{-1}$, assuming 100 K $<T_{\rm s}<$ 1000 K. Plausible drivers of the outflow include starbursts, the AGN radiation, and the radio jet, the last of which is considered the most likely driver according to the kinematics. By analysing the properties of the outflow, the AGN, and the jet, we find that if the HI outflow is driven by the AGN radiation, the AGN radiation does not seem powerful enough to provide negative feedback, whereas the radio jet shows the potential to provide negative feedback. Our observations contribute another example of a fast outflow detected in neutral hydrogen and demonstrate the capability of FAST in detecting such outflows.
Submitted 4 September, 2023;
originally announced September 2023.
-
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning
Authors:
Minghao Zhu,
Xiao Lin,
Ronghao Dang,
Chengju Liu,
Qijun Chen
Abstract:
As the most essential property in a video, motion information is critical to a robust and generalized video representation. To inject motion dynamics, recent works have adopted frame difference as the source of motion information in video contrastive learning, considering the trade-off between quality and cost. However, existing works align motion features at the instance level, which suffers from spatial and temporal weak alignment across modalities. In this paper, we present a Fine-grained Motion Alignment (FIMA) framework, capable of introducing well-aligned and significant motion information. Specifically, we first develop a dense contrastive learning framework in the spatiotemporal domain to generate pixel-level motion supervision. Then, we design a motion decoder and a foreground sampling strategy to eliminate the weak alignments in terms of time and space. Moreover, a frame-level motion contrastive loss is presented to improve the temporal diversity of the motion features. Extensive experiments demonstrate that the representations learned by FIMA possess great motion-awareness capabilities and achieve state-of-the-art or competitive results on downstream tasks across UCF101, HMDB51, and Diving48 datasets. Code is available at https://github.com/ZMHH-H/FIMA.
Submitted 1 September, 2023;
originally announced September 2023.
-
A sharp trace Adams' inequality in $\mathbb{R}^{4}$ and Existence of the extremals
Authors:
Lu Chen,
Guozhen Lu,
Maochun Zhu
Abstract:
Let $Ω\subseteq \mathbb{R}^{4}$ be a bounded domain with smooth boundary $\partialΩ$. In this paper, we establish the following sharp form of the trace Adams' inequality in $W^{2,2}(Ω)$ with
zero mean value and zero Neumann boundary condition: \begin{equation*} S(α)=\underset{\int_Ωudx=0,\frac{\partial u}{\partialν}|_{\partialΩ}=0,\VertΔu\Vert_{2}\leq{1}}{\underset {u\in{W^{2,2}(Ω)\setminus\{0\}}}{\sup}}\int_{\partial Ω} e^{αu^{2}}dσ<\infty \end{equation*} holds if and only if $ α\leq12π^2$.
Moreover, we prove a classification theorem for the solutions of a class of nonlinear boundary value problem of bi-harmonic equations on the half space $\mathbb{R}^4_{+}$. With this classification result, we can show that $S({12π^2})$ is attained by using the blow-up analysis and capacitary estimate. As an application, we prove a sharp trace Adams-Onofri type inequality in general four dimensional bounded domains with smooth boundary.
Submitted 30 August, 2023;
originally announced August 2023.
-
SAM-Med2D
Authors:
Junlong Cheng,
** Ye,
Zhongying Deng,
Jianpin Chen,
Tianbin Li,
Haoyu Wang,
Yanzhou Su,
Ziyan Huang,
Jilong Chen,
Lei Jiang,
Hui Sun,
Junjun He,
Shaoting Zhang,
Min Zhu,
Yu Qiao
Abstract:
The Segment Anything Model (SAM) represents a state-of-the-art research advancement in natural image segmentation, achieving impressive results with input prompts such as points and bounding boxes. However, our evaluation and recent research indicate that directly applying the pretrained SAM to medical image segmentation does not yield satisfactory performance. This limitation primarily arises from the significant domain gap between natural images and medical images. To bridge this gap, we introduce SAM-Med2D, the most comprehensive study on applying SAM to medical 2D images. Specifically, we first collect and curate approximately 4.6M images and 19.7M masks from public and private datasets, constructing a large-scale medical image segmentation dataset encompassing various modalities and objects. Then, we comprehensively fine-tune SAM on this dataset and turn it into SAM-Med2D. Unlike previous methods that only adopt bounding box or point prompts as the interactive segmentation approach, we adapt SAM to medical image segmentation through more comprehensive prompts involving bounding boxes, points, and masks. We additionally fine-tune the encoder and decoder of the original SAM to obtain a well-performing SAM-Med2D, leading to the most comprehensive fine-tuning strategies to date. Finally, we conducted a comprehensive evaluation and analysis to investigate the performance of SAM-Med2D in medical image segmentation across various modalities, anatomical structures, and organs. Concurrently, we validated the generalization capability of SAM-Med2D on 9 datasets from the MICCAI 2023 challenge. Overall, our approach demonstrated significantly superior performance and generalization capability compared to SAM.
Submitted 30 August, 2023;
originally announced August 2023.
-
Native approach to controlled-Z gates in inductively coupled fluxonium qubits
Authors:
Xizheng Ma,
Gengyan Zhang,
Feng Wu,
Feng Bao,
Xu Chang,
Jianjun Chen,
Hao Deng,
Ran Gao,
Xun Gao,
Lijuan Hu,
Honghong Ji,
Hsiang-Sheng Ku,
Kannan Lu,
Lu Ma,
Liyong Mao,
Zhijun Song,
Hantao Sun,
Chengchun Tang,
Fei Wang,
Hongcheng Wang,
Tenghui Wang,
Tian Xia,
Make Ying,
Huijuan Zhan,
Tao Zhou
, et al. (5 additional authors not shown)
Abstract:
The fluxonium qubits have emerged as a promising platform for gate-based quantum information processing. However, their extraordinary protection against charge fluctuations comes at a cost: when coupled capacitively, the qubit-qubit interactions are restricted to XX-interactions. Consequently, effective XX- or XZ-interactions are only constructed either by temporarily populating higher-energy states, or by exploiting perturbative effects under microwave driving. Instead, we propose and demonstrate an inductive coupling scheme, which offers a wide selection of native qubit-qubit interactions for fluxonium. In particular, we leverage a built-in, flux-controlled ZZ-interaction to perform qubit entanglement. To combat the increased flux-noise-induced dephasing away from the flux-insensitive position, we use a continuous version of the dynamical decoupling scheme to perform noise filtering. Combining these, we demonstrate a 20 ns controlled-Z (CZ) gate with a mean fidelity of 99.53%. More than confirming the efficacy of our gate scheme, this high-fidelity result also reveals a promising but rarely explored parameter space uniquely suitable for gate operations between fluxonium qubits.
Submitted 30 August, 2023;
originally announced August 2023.
-
EnsembleFollower: A Hybrid Car-Following Framework Based On Reinforcement Learning and Hierarchical Planning
Authors:
Xu Han,
Xianda Chen,
Meixin Zhu,
Pinlong Cai,
Jianshan Zhou,
Xiaowen Chu
Abstract:
Car-following models have made significant contributions to our understanding of longitudinal driving behavior. However, they often exhibit limited accuracy and flexibility, as they cannot fully capture the complexity inherent in car-following processes, or may falter in unseen scenarios due to their reliance on confined driving skills present in training data. It is worth noting that each car-following model possesses its own strengths and weaknesses depending on specific driving scenarios. Therefore, we propose EnsembleFollower, a hierarchical planning framework for achieving advanced human-like car-following. The EnsembleFollower framework involves a high-level Reinforcement Learning-based agent responsible for judiciously managing multiple low-level car-following models according to the current state, either by selecting an appropriate low-level model to perform an action or by allocating different weights across all low-level components. Moreover, we propose a jerk-constrained kinematic model for more convincing car-following simulations. We evaluate the proposed method based on real-world driving data from the HighD dataset. The experimental results illustrate that EnsembleFollower yields improved accuracy of human-like behavior and achieves effectiveness in combining hybrid models, demonstrating that our proposed framework can handle diverse car-following conditions by leveraging the strengths of various low-level models.
Submitted 30 August, 2023;
originally announced August 2023.
-
On k-Mer-Based and Maximum Likelihood Estimation Algorithms for Trace Reconstruction
Authors:
Kuan Cheng,
Elena Grigorescu,
Xin Li,
Madhu Sudan,
Minshen Zhu
Abstract:
The goal of the trace reconstruction problem is to recover a string $x\in\{0,1\}^n$ given many independent traces of $x$, where a trace is a subsequence obtained from deleting bits of $x$ independently with some given probability $p\in [0,1)$. A recent result of Chase (STOC 2021) shows how $x$ can be determined (in exponential time) from $\exp(\widetilde{O}(n^{1/5}))$ traces. This is the state-of-the-art result on the sample complexity of trace reconstruction.
In this paper we consider two kinds of algorithms for the trace reconstruction problem.
Our first, and technically more involved, result shows that any $k$-mer-based algorithm for trace reconstruction must use $\exp(Ω(n^{1/5}))$ traces, under the assumption that the estimator requires $poly(2^k, 1/\varepsilon)$ traces, thus establishing the optimality of this number of traces. The analysis of this result also shows that the analysis technique used by Chase (STOC 2021) is essentially tight, and hence new techniques are needed in order to improve the worst-case upper bound.
Our second, simple, result considers the performance of the Maximum Likelihood Estimator (MLE), which specifically picks the source string that has the maximum likelihood to generate the samples (traces). We show that the MLE algorithm uses a nearly optimal number of traces, i.e., up to a factor of $n$ in the number of samples needed for an optimal algorithm, and show that this factor of $n$ loss may be necessary under general "model estimation" settings.
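The definition of a trace given in this abstract can be illustrated with a short simulation. The following is a minimal sketch of the deletion channel only, not of any reconstruction algorithm from the paper; the function names `sample_trace` and `is_subsequence`, the example string, and the fixed seed are all hypothetical choices made for illustration.

```python
import random


def sample_trace(x: str, p: float, rng: random.Random) -> str:
    """Return one trace of x: each bit is deleted independently with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("deletion probability p must lie in [0, 1)")
    # rng.random() is uniform on [0, 1), so a bit survives with probability 1 - p.
    return "".join(bit for bit in x if rng.random() >= p)


def is_subsequence(t: str, x: str) -> bool:
    """Check that t can be obtained from x by deleting characters."""
    remaining = iter(x)
    # 'ch in remaining' consumes the iterator up to and including the first match.
    return all(ch in remaining for ch in t)


rng = random.Random(0)  # fixed seed so repeated runs give the same traces
x = "1011001110"        # hypothetical source string
traces = [sample_trace(x, 0.3, rng) for _ in range(5)]
```

Every sampled trace is a subsequence of `x`, and with `p = 0` the channel returns `x` unchanged. A reconstruction algorithm would take a list like `traces` as its only input.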
Submitted 26 January, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum
Authors:
Shen Gao,
Zhengliang Shi,
Minghang Zhu,
Bowen Fang,
Xin Xin,
Pengjie Ren,
Zhumin Chen,
Jun Ma,
Zhaochun Ren
Abstract:
Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending the capability of LLMs. Although some works employ open-source LLMs for the tool learning task, most of them are trained in a controlled environment in which LLMs only learn to execute the human-provided tools. However, selecting proper tools from a large toolset is also a crucial ability for the tool learning model to be applied in real-world applications. Existing methods usually directly employ self-instruction methods to train the model, which ignores differences in tool complexity. In this paper, we propose Confucius, a novel tool learning framework to train LLMs to use complicated tools in real-world scenarios, which contains two main phases: (1) we first propose a multi-stage learning method to teach the LLM to use various tools from an easy-to-difficult curriculum; (2) we then propose the Iterative Self-instruct from Introspective Feedback (ISIF) to dynamically construct the dataset to improve the ability to use complicated tools. Extensive experiments conducted in both controlled and real-world settings demonstrate the superiority of our tool learning framework in real-world application scenarios compared to both tuning-free (e.g. ChatGPT, Claude) and tuning-based baselines (e.g. GPT4Tools).
Submitted 21 December, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Rogue peakon, well-posedness, ill-posedness and blow-up phenomenon for an integrable Camassa-Holm type equation
Authors:
Mingxuan Zhu,
Zhenteng Zeng,
Zaihong Jiang,
Baoqiang Xia,
Zhijun Qiao
Abstract:
In this paper, we study an integrable Camassa-Holm (CH) type equation with quadratic nonlinearity. The CH type equation is shown to be integrable through a Lax pair, and in particular the equation is found to possess a new kind of peaked soliton (peakon) solution, called a rogue peakon, that is given in a rational form with some logarithmic function, but is not a regular traveling wave. We also provide multi-rogue peakon solutions. Furthermore, we discuss the local well-posedness of the solution in the Besov space $B_{p,r}^{s}$ with $1\leq p,r\leq\infty$, $s>\max \left\{1+1/p,3/2\right\}$ or $B_{2,1}^{3/2}$, and then prove the ill-posedness of the solution in $B_{2,\infty}^{3/2}$. Moreover, we establish the global existence and blow-up phenomenon of the solution: if $m_0(x)=u_0-u_{0xx}\geq(\not\equiv) 0$, then the corresponding solution exists globally, whereas if $m_0(x)\leq(\not\equiv) 0$, then the corresponding solution blows up in finite time.
Submitted 23 August, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Toward a direct measurement of the cosmic acceleration: The pilot observation of H I 21cm absorption line at FAST
Authors:
Jiangang Kang,
Chang-Zhi Lu,
TongJie Zhang,
Ming Zhu
Abstract:
This study presents results on detecting neutral atomic hydrogen (H I) 21 cm absorption in the spectrum of PKS1413+135 at redshift $z=0.24670041$. The observation was conducted by FAST, with a spectral resolution of 10 Hz, using 10 minutes of observing time. The global spectral profile is examined by modeling the absorption line using a single Gaussian function with a resolution of 10 kHz within a 2 MHz bandwidth. The goal is to determine the rate of the latest cosmic acceleration by directly measuring the redshift evolution of the H I 21 cm absorption line with the Hubble flow toward the same background quasar over a time span of a decade or longer. This will serve as a detectable signal generated by the accelerated expansion of the Universe at redshift $z < 1$, referred to as the redshift drift $\dot{z}$ or the Sandage-Loeb (SL) effect. The measured H I gas column density in this DLA system is approximately equivalent to the initial observation value, considering uncertainties of the spin temperature of a spiral host galaxy. The high signal-to-noise ratio of 57, obtained at a 10 kHz resolution, strongly supports the feasibility of using the H I 21 cm absorption line in DLA systems to accurately measure the redshift drift rate at a precision level of around $10^{-10}$ per decade.
Submitted 7 May, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
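The order of magnitude of the redshift drift mentioned in the abstract above can be checked with a minimal sketch. It uses the standard relation $\dot{z} = (1+z)H_0 - H(z)$ for a flat $\Lambda$CDM universe. The cosmological parameters ($H_0 = 70$ km/s/Mpc, $\Omega_m = 0.3$), the unit-conversion constants, and the function name are illustrative assumptions, not values taken from the paper.

```python
import math

KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_YEAR = 3.156e7    # approximate length of a year in seconds


def redshift_drift_per_year(z, h0_km_s_mpc=70.0, omega_m=0.3):
    """Redshift drift dz/dt = (1 + z) * H0 - H(z), in units of 1/yr.

    Assumes flat LambdaCDM (matter plus cosmological constant only);
    the default parameters are illustrative, not from the paper.
    """
    h0_per_year = h0_km_s_mpc / KM_PER_MPC * SECONDS_PER_YEAR
    # Dimensionless expansion rate E(z) = H(z) / H0.
    e_z = math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))
    return h0_per_year * ((1.0 + z) - e_z)


# Drift accumulated over ten years at the absorber redshift quoted above.
drift_per_decade = 10.0 * redshift_drift_per_year(0.24670041)
print(f"{drift_per_decade:.2e}")
```

Under these assumed parameters the drift per decade is slightly below $10^{-10}$, which is the scale of precision the abstract refers to.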
-
Joint Data Collection and Sensor Positioning in Multi-UAV-Assisted Wireless Sensor Network
Authors:
Mingyue Zhu,
Zhiqing Wei,
Chen Qiu,
Wangjun Jiang,
Huici Wu,
Zhiying Feng
Abstract:
Owing to their high mobility and easy deployment, unmanned aerial vehicles (UAVs) have attracted much attention in the field of wireless communication and positioning. To meet the challenges of lacking infrastructure coverage, uncertain sensor positions, and large volumes of sensing data to collect in wireless sensor networks (WSNs), this paper presents an efficient joint data collection and sensor positioning scheme for WSNs supported by multiple UAVs. Specifically, one UAV is designated as the main UAV to collect data, and the other UAVs are used as auxiliary UAVs for sensor positioning using time difference of arrival (TDoA). A mixed-integer non-convex optimization problem with uncertain sensor positions is established. The goal is to minimize the average positioning error of all sensors by jointly optimizing the UAV trajectories, sensor transmission schedule and positioning observation points (POPs). To solve this optimization model, the original problem is decomposed into two sub-problems based on the path discrete method. First, the block coordinate descent (BCD) and successive convex approximation (SCA) techniques are applied to iteratively optimize the trajectory of the main UAV and the sensor transmission schedule, so as to maximize the minimum amount of data uploaded by the sensors. Then, based on the trajectory of the main UAV, a particle swarm optimization (PSO)-based algorithm is designed to optimize the POPs of the UAVs. Finally, a spline curve is applied to generate the trajectories of the auxiliary UAVs. The simulation results show that the proposed scheme can meet the requirements of data collection and has good positioning performance.
Submitted 13 August, 2023;
originally announced August 2023.
-
EquiDiff: A Conditional Equivariant Diffusion Model For Trajectory Prediction
Authors:
Kehua Chen,
Xianda Chen,
Zihan Yu,
Meixin Zhu,
Hai Yang
Abstract:
Accurate trajectory prediction is crucial for the safe and efficient operation of autonomous vehicles. The growing popularity of deep learning has led to the development of numerous methods for trajectory prediction. While deterministic deep learning models have been widely used, deep generative models have gained popularity as they learn data distributions from training data and account for trajectory uncertainties. In this study, we propose EquiDiff, a deep generative model for predicting future vehicle trajectories. EquiDiff is based on the conditional diffusion model, which generates future trajectories by incorporating historical information and random Gaussian noise. The backbone model of EquiDiff is an SO(2)-equivariant transformer that fully utilizes the geometric properties of location coordinates. In addition, we employ Recurrent Neural Networks and Graph Attention Networks to extract social interactions from historical trajectories. To evaluate the performance of EquiDiff, we conduct extensive experiments on the NGSIM dataset. Our results demonstrate that EquiDiff outperforms other baseline models in short-term prediction, but has slightly higher errors for long-term prediction. Furthermore, we conduct an ablation study to investigate the contribution of each component of EquiDiff to the prediction accuracy. Additionally, we present a visualization of the generation process of our diffusion model, providing insights into the uncertainty of the prediction.
Submitted 29 August, 2023; v1 submitted 12 August, 2023;
originally announced August 2023.
-
SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning
Authors:
Muzhi Zhu,
Hengtao Li,
Hao Chen,
Chengxiang Fan,
Weian Mao,
Chenchen Jing,
Yifan Liu,
Chunhua Shen
Abstract:
Current closed-set instance segmentation models rely on pre-defined class labels for each mask during training and evaluation, largely limiting their ability to detect novel objects. Open-world instance segmentation (OWIS) models address this challenge by detecting unknown objects in a class-agnostic manner. However, previous OWIS approaches completely erase category information during training to keep the model's ability to generalize to unknown objects. In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories. In addition, the previous OWIS training setting exposes the unknown classes to the training set and brings information leakage, which is unreasonable in the real world. Therefore, we provide a new open-world benchmark closer to a real-world scenario by dividing the dataset classes into known-seen-unseen parts. For the first time, we focus on the model's ability to discover objects that never appear in the training set images.
Experiments show that SegPrompt can improve the overall and unseen detection performance by 5.6% and 6.1% in AR on our new benchmark without affecting the inference efficiency. We further demonstrate the effectiveness of our method on existing cross-dataset transfer and strongly supervised settings, leading to 5.5% and 12.3% relative improvement.
Submitted 12 August, 2023;
originally announced August 2023.
-
PyStructureFactor: A Python code for the molecular structure factor in tunneling ionization rates
Authors:
Shanshan Song,
Mingyu Zhu,
Hongcheng Ni,
Jian Wu
Abstract:
Tunneling ionization is at the core of strong-field and attosecond science. In this paper, we present PyStructureFactor - a general Python code for calculating the structure factor in the tunneling ionization rate of common molecules under intense laser fields. The numerical implementation is based on the well-developed weak-field asymptotic theory in the integral representation. The electronic structure information of the molecules is obtained via the PySCF quantum chemistry package. PyStructureFactor is a general computational framework that can be utilized to compute the molecular structure factor of various types of molecules, including polar and non-polar diatomic molecules, degenerate molecules, and open-shell molecules. Examples are given and benchmarked against known results, with good agreement. The present PyStructureFactor is implemented in an efficient manner and is easily applicable to larger molecules.
Submitted 8 August, 2023;
originally announced August 2023.
-
Weakly Semi-Supervised Detection in Lung Ultrasound Videos
Authors:
Jiahong Ouyang,
Li Chen,
Gary Y. Li,
Naveen Balaraju,
Shubham Patil,
Courosh Mehanian,
Sourabh Kulhare,
Rachel Millin,
Kenton W. Gregory,
Cynthia R. Gregory,
Meihua Zhu,
David O. Kessler,
Laurie Malia,
Almaz Dessie,
Joni Rabiner,
Di Coneybeare,
Bo Shopsin,
Andrew Hersh,
Cristian Madar,
Jeffrey Shupp,
Laura S. Johnson,
Jacob Avila,
Kristin Dwyer,
Peter Weimersheimer,
Balasundar Raju
, et al. (2 additional authors not shown)
Abstract:
Frame-by-frame annotation of bounding boxes by clinical experts is often required to train fully supervised object detection models on medical video data. We propose a method for improving object detection in medical videos through weak supervision from video-level labels. More concretely, we aggregate individual detection predictions into video-level predictions and extend a teacher-student training strategy to provide additional supervision via a video-level loss. We also introduce improvements to the underlying teacher-student framework, including methods to improve the quality of pseudo-labels based on weak supervision and adaptive schemes to optimize knowledge transfer between the student and teacher networks. We apply this approach to the clinically important task of detecting lung consolidations (seen in respiratory infections such as COVID-19 pneumonia) in medical ultrasound videos. Experiments reveal that our framework improves detection accuracy and robustness compared to baseline semi-supervised models, and improves efficiency in data and annotation usage.
Submitted 7 August, 2023;
originally announced August 2023.
-
Federated Inference with Reliable Uncertainty Quantification over Wireless Channels via Conformal Prediction
Authors:
Meiyi Zhu,
Matteo Zecchin,
Sangwoo Park,
Caili Guo,
Chunyan Feng,
Osvaldo Simeone
Abstract:
In this paper, we consider a wireless federated inference scenario in which devices and a server share a pre-trained machine learning model. The devices communicate statistical information about their local data to the server over a common wireless channel, aiming to enhance the quality of the inference decision at the server. Recent work has introduced federated conformal prediction (CP), which leverages devices-to-server communication to improve the reliability of the server's decision. With federated CP, devices communicate to the server information about the loss accrued by the shared pre-trained model on the local data, and the server leverages this information to calibrate a decision interval, or set, so that it is guaranteed to contain the correct answer with a pre-defined target reliability level. Previous work assumed noise-free communication, whereby devices can communicate a single real number to the server. In this paper, we study for the first time federated CP in a wireless setting. We introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server. Using numerical results, we demonstrate the significant advantages of WFCP against digital implementations of existing federated CP schemes, especially in regimes with limited communication resources and/or large number of devices.
Submitted 15 December, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
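The calibration step that the abstract above attributes to conformal prediction can be illustrated with a minimal centralized sketch. It shows standard split conformal prediction: a threshold is set to an empirical quantile of calibration losses, and the prediction set keeps every candidate whose loss does not exceed it. This is not the WFCP protocol (there is no wireless channel, TBMA, or quantile correction here), and the function names are hypothetical.

```python
import math

import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    cal_scores: losses of the pre-trained model on n calibration points.
    alpha: target miscoverage level, so the target reliability is 1 - alpha.
    Returns infinity when n is too small to support the requested level.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return float(scores[rank - 1])


def prediction_set(label_scores, threshold):
    """Keep every candidate label whose score is at most the threshold."""
    return [label for label, s in label_scores.items() if s <= threshold]


# Nine calibration losses, 50% target coverage: rank = ceil(10 * 0.5) = 5.
threshold = conformal_threshold(range(1, 10), alpha=0.5)
chosen = prediction_set({"cat": 1.0, "dog": 7.0}, threshold)
```

In the federated setting described in the abstract, the calibration losses are held by the devices, so the server has to estimate this quantile from what the devices transmit rather than compute it directly.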
-
Anatomy of spin Hall effect in ferromagnetic metals
Authors:
Fanxing Zheng,
Jianting Dong,
Xinlu Li,
Meng Zhu,
Ye Zhou,
Jia Zhang
Abstract:
The spin Hall effect in nonmagnetic materials has been intensively studied and has become one of the most crucial spin-charge conversion mechanisms in spintronics. However, the spin Hall effect in ferromagnetic metals has been less investigated and remains unclear. In this work, we investigate the spin Hall effect in representative ferromagnetic alloys by using first-principles calculations. We first classify the spin Hall effect into three different types, namely the conventional (CSHE), spin anomalous (SAHE) and magnetic spin Hall effect (MSHE), and then calculate the corresponding spin Hall conductivity and spin Hall angle for (Fe, Co, Ni)Pt, NiFe and CoFe alloys. We find that the above three spin Hall mechanisms do coexist in ferromagnetic metals. In particular, for Pt-based ferromagnetic alloys, sizable conventional and magnetic spin Hall angles comparable to that of Pt are predicted. The remarkable unconventional spin Hall effect in ferromagnetic metals may enrich the spin-charge conversion phenomena. For instance, the spin current generated by the remarkable MSHE with out-of-plane spin polarization should be helpful for field-free switching of perpendicular magnetization through the spin-orbit torque effect. This work may stimulate future studies on the spin Hall effect in ferromagnetic metals and pave the way for their promising applications in spin-charge conversion devices in spintronics.
Submitted 6 August, 2023;
originally announced August 2023.
-
One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer
Authors:
Hang Guo,
Tao Dai,
Mingyan Zhu,
Guanghao Meng,
Bin Chen,
Zhi Wang,
Shu-Tao Xia
Abstract:
Recognizing characters from low-resolution (LR) text images poses a significant challenge due to the information deficiency as well as the noise and blur in low-quality images. Current solutions for low-resolution text recognition (LTR) typically rely on a two-stage pipeline that involves super-resolution as the first stage followed by the second-stage recognition. Although this pipeline is straightforward and intuitive, it has to use an additional super-resolution network, which causes inefficiencies during training and testing. Moreover, the recognition accuracy of the second stage heavily depends on the reconstruction quality of the first stage, causing ineffectiveness. In this work, we attempt to address these challenges from a novel perspective: adapting the recognizer to low-resolution inputs by transferring the knowledge from the high-resolution. Guided by this idea, we propose an efficient and effective knowledge distillation framework to achieve multi-level knowledge transfer. Specifically, the visual focus loss is proposed to extract the character position knowledge with resolution gap reduction and character region focus, the semantic contrastive loss is employed to exploit the contextual semantic knowledge with contrastive learning, and the soft logits loss facilitates both local word-level and global sequence-level learning from the soft teacher label. Extensive experiments show that the proposed one-stage pipeline significantly outperforms super-resolution based two-stage frameworks in terms of effectiveness and efficiency, accompanied by favorable robustness. Code is available at https://github.com/csguoh/KD-LTR.
Submitted 4 August, 2023;
originally announced August 2023.
-
VideoPro: A Visual Analytics Approach for Interactive Video Programming
Authors:
Jianben He,
Xingbo Wang,
Kam Kwai Wong,
Xijie Huang,
Changjian Chen,
Zixin Chen,
Fengjie Wang,
Min Zhu,
Huamin Qu
Abstract:
Constructing supervised machine learning models for real-world video analysis requires substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high-dimensional and complex temporal information in videos poses additional challenges for effectively composing and evaluating labeling functions. In this paper, we propose VideoPro, a visual analytics approach to support flexible and scalable video data programming for model steering with reduced human effort. We first extract human-understandable events from videos using computer vision techniques and treat them as atomic components of labeling functions. We further propose a two-stage template mining algorithm that characterizes the sequential patterns of these events to serve as labeling function templates for efficient data labeling. The visual interface of VideoPro facilitates multifaceted exploration, examination, and application of the labeling templates, allowing for effective programming of video data at scale. Moreover, users can monitor the impact of programming on model performance and make informed adjustments during the iterative programming process. We demonstrate the efficiency and effectiveness of our approach with two case studies and expert interviews.
Submitted 1 August, 2023;
originally announced August 2023.
-
Pulsar timing array observations as possible hints for nonsingular cosmology
Authors:
Mian Zhu,
Gen Ye,
Yong Cai
Abstract:
Recent pulsar timing array (PTA) experiments have reported strong evidence of the stochastic gravitational wave background (SGWB). If interpreted as primordial gravitational waves (GWs), the signal favors a strongly blue-tilted spectrum. Consequently, the nonsingular cosmology, which is able to predict a strongly blue-tilted GW spectrum with $n_T \simeq 2$ on certain scales, offers a potential explanation for the observed SGWB signal. In this paper, we present a Genesis-inflation model capable of explaining the SGWB signal observed by the PTA collaborations while also overcoming the initial singularity problem associated with the inflationary cosmology. Furthermore, our model predicts distinctive features in the SGWB spectrum, which might be examined by forthcoming space-based gravitational wave experiments.
Submitted 13 September, 2023; v1 submitted 30 July, 2023;
originally announced July 2023.
-
ClickSeg: 3D Instance Segmentation with Click-Level Weak Annotations
Authors:
Leyao Liu,
Tao Kong,
Minzhao Zhu,
Jiashuo Fan,
Lu Fang
Abstract:
3D instance segmentation methods often require fully-annotated dense labels for training, which are costly to obtain. In this paper, we present ClickSeg, a novel click-level weakly supervised 3D instance segmentation method that requires merely one annotated point per instance. This problem is very challenging due to the extremely limited labels and has rarely been addressed before. We first develop a baseline weakly-supervised training method, in which the model itself generates pseudo labels for unlabeled data. To exploit the properties of the click-level annotation setting, we further propose a new training framework. Instead of generating the pseudo labels with the clustering used at model inference, i.e., mean-shift clustering, we propose to use k-means with fixed initial seeds: the annotated points. New similarity metrics are further designed for clustering. Experiments on ScanNetV2 and S3DIS datasets show that the proposed ClickSeg surpasses the previous best weakly supervised instance segmentation result by a large margin (e.g., +9.4% mAP on ScanNetV2). Using merely 0.02% of the supervision signals, ClickSeg achieves $\sim$90% of the accuracy of the fully-supervised counterpart. Meanwhile, it also achieves state-of-the-art semantic segmentation results among weakly supervised methods that use the same annotation settings.
Submitted 18 July, 2023;
originally announced July 2023.
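The idea of running k-means from fixed initial seeds, as described in the ClickSeg abstract above, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses plain squared Euclidean distance rather than the similarity metrics designed in the paper, and pinning each annotated point to its own cluster is an assumption made here for clarity. The function name and toy data are hypothetical.

```python
import numpy as np


def seeded_kmeans(features, seed_indices, num_iters=10):
    """K-means whose initial centroids are the annotated (clicked) points.

    features: (N, D) array of per-point features.
    seed_indices: indices of the annotated points, one per instance.
    Returns an (N,) array of cluster ids usable as pseudo instance labels.
    """
    features = np.asarray(features, dtype=float)
    seed_indices = np.asarray(seed_indices)
    num_clusters = len(seed_indices)
    centroids = features[seed_indices].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from every point to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        # Assumption: each annotated point always stays in its own cluster,
        # which also guarantees that no cluster becomes empty.
        labels[seed_indices] = np.arange(num_clusters)
        for k in range(num_clusters):
            centroids[k] = features[labels == k].mean(axis=0)
    return labels


# Toy example: two well-separated groups, one click in each.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
          [5.0, 5.0], [5.1, 5.0], [4.9, 5.2]]
pseudo_labels = seeded_kmeans(points, seed_indices=[0, 3])
```

Because the number of clusters and their starting positions come directly from the clicks, each resulting cluster corresponds to exactly one annotated instance.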
-
Modeling Orders of User Behaviors via Differentiable Sorting: A Multi-task Framework to Predicting User Post-click Conversion
Authors:
Menghan Wang,
Jinming Yang,
Yuchen Guo,
Yuming Shen,
Mengying Zhu,
Yanlin Wang
Abstract:
User post-click conversion prediction is of high interest to researchers and developers. Recent studies employ multi-task learning to tackle the selection bias and data sparsity problem, two severe challenges in post-click behavior prediction, by incorporating click data. However, prior works mainly focused on pointwise learning and the orders of labels (i.e., click and post-click) are not well explored, which naturally poses a listwise learning problem. Inspired by recent advances on differentiable sorting, in this paper, we propose a novel multi-task framework that leverages orders of user behaviors to predict user post-click conversion in an end-to-end approach. Specifically, we define an aggregation operator to combine predicted outputs of different tasks to a unified score, then we use the computed scores to model the label relations via differentiable sorting. Extensive experiments on public and industrial datasets show the superiority of our proposed model against competitive baselines.
Submitted 18 July, 2023;
originally announced July 2023.
-
PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation
Authors:
Yingchaojie Feng,
Xingbo Wang,
Kam Kwai Wong,
Sijia Wang,
Yuhong Lu,
Minfeng Zhu,
Baicheng Wang,
Wei Chen
Abstract:
Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
Submitted 15 August, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Quantivine: A Visualization Approach for Large-scale Quantum Circuit Representation and Analysis
Authors:
Zhen Wen,
Yihan Liu,
Siwei Tan,
Jieyi Chen,
Minfeng Zhu,
Dongming Han,
Jianwei Yin,
Mingliang Xu,
Wei Chen
Abstract:
Quantum computing is a rapidly evolving field that enables exponential speed-up over classical algorithms. At the heart of this revolutionary technology are quantum circuits, which serve as vital tools for implementing, analyzing, and optimizing quantum algorithms. Recent advancements in quantum computing and the increasing capability of quantum devices have led to the development of more complex quantum circuits. However, traditional quantum circuit diagrams suffer from scalability and readability issues, which limit the efficiency of analysis and optimization processes. In this research, we propose a novel visualization approach for large-scale quantum circuits by adopting semantic analysis to facilitate the comprehension of quantum circuits. We first exploit meta-data and semantic information extracted from the underlying code of quantum circuits to create component segmentations and pattern abstractions, allowing for easier wrangling of massive circuit diagrams. We then develop Quantivine, an interactive system for exploring and understanding quantum circuits. A series of novel circuit visualizations are designed to uncover contextual details such as qubit provenance, parallelism, and entanglement. The effectiveness of Quantivine is demonstrated through two usage scenarios of quantum circuits with up to 100 qubits and a formal user evaluation with quantum experts. A free copy of this paper and all supplemental materials are available at https://osf.io/2m9yh/?view_only=0aa1618c97244f5093cd7ce15f1431f9.
Submitted 18 July, 2023;
originally announced July 2023.
-
Quantum Circuit AutoEncoder
Authors:
Jun Wu,
Hao Fu,
Mingzheng Zhu,
Haiyue Zhang,
Wei Xie,
Xiang-Yang Li
Abstract:
The quantum autoencoder is a quantum neural network model for compressing information stored in quantum states. However, one needs to process information stored in quantum circuits for many tasks in the emerging quantum information technology. In this work, generalizing the ideas of classical and quantum autoencoders, we introduce the model of Quantum Circuit AutoEncoder (QCAE) to compress and encode information within quantum circuits. We provide a comprehensive protocol for QCAE and design a variational quantum algorithm, varQCAE, for its implementation. We theoretically analyze this model by deriving conditions for lossless compression and establishing both upper and lower bounds on its recovery fidelity. Finally, we apply varQCAE to three practical tasks, and numerical results show that it can effectively (1) compress the information within quantum circuits, (2) detect anomalies in quantum circuits, and (3) mitigate the depolarizing noise in quantum devices. This suggests that our algorithm is potentially applicable to other information processing tasks for quantum circuits.
Submitted 30 October, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Generalizable and explainable prediction of potential miRNA-disease associations based on heterogeneous graph learning
Authors:
Yi Zhou,
Meixuan Wu,
Chengzhou Ouyang,
Min Zhu
Abstract:
Biomedical research has revealed the crucial role of miRNAs in the progression of many diseases, and computational prediction methods are increasingly proposed for assisting biological experiments to verify miRNA-disease associations (MDAs). However, the generalizability and explainability are currently underemphasized. It's significant to generalize effective predictions to entities with fewer or no existing MDAs and reveal how the prediction scores are derived. In this study, our work contributes to data, model, and result analysis. First, for better formulation of the MDA issue, we integrate multi-source data into a heterogeneous graph with a broader learning and prediction scope, and we split massive verified MDAs into independent training, validation, and test sets as a benchmark. Second, we construct an end-to-end data-driven model that performs node feature encoding, graph structure learning, and binary prediction sequentially, with a heterogeneous graph transformer as the central module. Finally, computational experiments illustrate that our method outperforms existing state-of-the-art methods, achieving better evaluation metrics and alleviating the neglect of unknown miRNAs and diseases effectively. Case studies further demonstrate that we can make reliable MDA detections on diseases without MDA records, and the predictions can be explained in general and case by case.
Submitted 27 August, 2023; v1 submitted 16 July, 2023;
originally announced July 2023.
-
Accurate 3D Prediction of Missing Teeth in Diverse Patterns for Precise Dental Implant Planning
Authors:
Lei Ma,
Peng Xue,
Yuning Gu,
Yue Zhao,
Min Zhu,
Zhongxiang Ding,
Dinggang Shen
Abstract:
In recent years, the demand for dental implants has surged, driven by their high success rates and esthetic advantages. However, accurate prediction of missing teeth for precise digital implant planning remains a challenge due to the intricate nature of dental structures and the variability in tooth loss patterns. This study presents a novel framework for accurate prediction of missing teeth in different patterns, facilitating digital implant planning. The proposed framework begins by estimating point-to-point correspondence among a dataset of dental mesh models reconstructed from CBCT images of healthy subjects. Subsequently, tooth dictionaries are constructed for each tooth type, encoding their position and shape information based on the established point-to-point correspondence. To predict missing teeth in a given dental mesh model, sparse coefficients are learned by sparsely representing adjacent teeth of the missing teeth using the corresponding tooth dictionaries. These coefficients are then applied to the dictionaries of the missing teeth to generate accurate predictions of their positions and shapes. The evaluation results on real subjects show that our proposed framework achieves an average prediction error of 1.04 mm for the prediction of a single missing tooth and an average prediction error of 1.33 mm for the prediction of 14 missing teeth, which demonstrates its capability of accurately predicting missing teeth in various patterns. By accurately predicting missing teeth, dental professionals can improve the planning and placement of dental implants, leading to better esthetic and functional outcomes for patients undergoing dental implant procedures.
Submitted 16 July, 2023;
originally announced July 2023.
-
Secrets of RLHF in Large Language Models Part I: PPO
Authors:
Rui Zheng,
Shihan Dou,
Songyang Gao,
Yuan Hua,
Wei Shen,
Binghai Wang,
Yan Liu,
Senjie Jin,
Qin Liu,
Yuhao Zhou,
Limao Xiong,
Lu Chen,
Zhiheng Xi,
Nuo Xu,
Wenbin Lai,
Minghao Zhu,
Cheng Chang,
Zhangyue Yin,
Rongxiang Weng,
Wensen Cheng,
Haoran Huang,
Tianxiang Sun,
Hang Yan,
Tao Gui,
Qi Zhang
, et al. (2 additional authors not shown)
Abstract:
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Their primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with the huge trial-and-error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF is still a puzzle. In this first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints as the key factor for the effective implementation of the PPO algorithm. Therefore, we explore PPO-max, an advanced version of the PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLM alignment. Therefore, we are eager to release technical reports, reward models, and PPO code, aiming to make modest contributions to the advancement of LLMs.
Submitted 18 July, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
Authors:
Yulong Chen,
Huajian Zhang,
Yijie Zhou,
Xuefeng Bai,
Yueguan Wang,
Ming Zhong,
Jianhao Yan,
Yafu Li,
Judy Li,
Michael Zhu,
Yue Zhang
Abstract:
Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can contain errors from both summarization and translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions. We conduct thorough analysis on ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful towards input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both conversation and summary as input to simulate human annotation process. Experimental results show that 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both source input text and summary are crucial for modeling cross-lingual summaries.
Submitted 8 July, 2023;
originally announced July 2023.
-
A direct approach to sharp Li-Yau Estimates on closed manifolds with negative Ricci lower bound
Authors:
Xingyu Song,
Ling Wu,
Meng Zhu
Abstract:
Recently, Qi S. Zhang [26] has derived a sharp Li-Yau estimate for positive solutions of the heat equation on closed Riemannian manifolds with the Ricci curvature bounded below by a negative constant. The proof is based on an integral iteration argument which utilizes Hamilton's gradient estimate, heat kernel Gaussian bounds and parabolic Harnack inequality.
In this paper, we show that the sharp Li-Yau estimate can actually be obtained directly following the classical maximum principle argument, which simplifies the proof in [26]. In addition, we apply the same idea to the heat and conjugate heat equations under the Ricci flow and prove some Li-Yau type estimates with optimal coefficients.
Submitted 24 August, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks
Authors:
Junlong Cheng,
Chengrui Gao,
Fengjie Wang,
Min Zhu
Abstract:
Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problem and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.
Submitted 21 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Learning Evacuee Models from Robot-Guided Emergency Evacuation Experiments
Authors:
Mollik Nayyar,
Ghanghoon Paik,
Zhenyuan Yuan,
Tongjia Zheng,
Minghui Zhu,
Hai Lin,
Alan R. Wagner
Abstract:
Recent research has examined the possibility of using robots to guide evacuees to safe exits during emergencies. Yet, there are many factors that can impact a person's decision to follow a robot. Being able to model how an evacuee follows an emergency robot guide could be crucial for designing robots that effectively guide evacuees during an emergency. This paper presents a method for developing realistic and predictive human evacuee models from physical human evacuation experiments. The paper analyzes the behavior of 14 human subjects during physical robot-guided evacuation. We then use the video data to create evacuee motion models that predict the person's future positions during the emergency. Finally, we validate the resulting models by running a k-fold cross-validation on the data collected during physical human subject experiments. We also present performance results of the model using data from a similar simulated emergency evacuation experiment demonstrating that these models can serve as a tool to predict evacuee behavior in novel evacuation simulations.
Submitted 30 June, 2023;
originally announced June 2023.
-
Microelectronic Morphogenesis: Progress towards Artificial Organisms
Authors:
John S. McCaskill,
Daniil Karnaushenko,
Minshen Zhu,
Oliver G. Schmidt
Abstract:
Microelectronic morphogenesis is the creation and maintenance of complex functional structures by microelectronic information within shape-changing materials. Only recently has in-built information technology begun to be used to reshape materials and their functions in three dimensions to form smart microdevices and microrobots. Electronic information that controls morphology is inheritable like its biological counterpart, genetic information, and is set to open new vistas of technology leading to artificial organisms when coupled with modular design and self-assembly that can make reversible microscopic electrical connections. Three core capabilities of cells in organisms, self-maintenance (homeostatic metabolism utilizing free energy), self-containment (distinguishing self from non-self), and self-reproduction (cell division with inherited properties), once well out of reach for technology, are now within the grasp of information-directed materials. Construction-aware electronics can be used to proof-read and initiate game-changing error correction in microelectronic self-assembly. Furthermore, non-contact communication and electronically supported learning enable one to implement guided self-assembly and enhance functionality. This article reviews the fundamental breakthroughs that have opened the pathway to this prospective path, analyzes the extent and way in which the core properties of life can be addressed and discusses the potential and indeed necessity of such technology for sustainable high technology in society.
Submitted 3 July, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features
Authors:
Mingli Zhu,
Shaokui Wei,
Hongyuan Zha,
Baoyuan Wu
Abstract:
Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks. Given a backdoored model, its prediction of a poisoned sample with trigger will be dominated by the trigger information, though trigger information and benign information coexist. Inspired by the mechanism of the optical polarizer that a polarizer could pass light waves with particular polarizations while filtering light waves with other polarizations, we propose a novel backdoor defense method by inserting a learnable neural polarizer into the backdoored model as an intermediate layer, in order to purify the poisoned sample via filtering trigger information while maintaining benign information. The neural polarizer is instantiated as one lightweight linear transformation layer, which is learned through solving a well designed bi-level optimization problem, based on a limited clean dataset. Compared to other fine-tuning-based defense methods which often adjust all parameters of the backdoored model, the proposed method only needs to learn one additional layer, such that it is more efficient and requires less clean data. Extensive experiments demonstrate the effectiveness and efficiency of our method in removing backdoors across various neural network architectures and datasets, especially in the case of very limited clean data.
Submitted 29 June, 2023;
originally announced June 2023.
-
Computational Study of Rarefied Gas Flow and Heat Transfer in Lid-driven Cylindrical Cavities
Authors:
Mengbo Zhu,
Ehsan Roohi,
Amin Ebrahimi
Abstract:
The gas flow characteristics in lid-driven cavities are influenced by several factors, such as cavity geometry, gas properties, and boundary conditions. In this study, the physics of heat and gas flow in cylindrical lid-driven cavities with various cross-sections, including fully or partially rounded edges, is investigated through numerical simulations using the direct simulation Monte Carlo (DSMC) and the discrete unified gas kinetic scheme (DUGKS) methods. The thermal and fluid flow fields are systematically studied for both constant and oscillatory lid velocities, for various degrees of gas rarefaction ranging from the slip to the free-molecular regimes. The impact of expansion cooling and viscous dissipation on the thermal and flow fields, as well as the occurrence of counter-gradient heat transfer (also known as anti-Fourier heat transfer) under non-equilibrium conditions, are explained based on the results obtained from numerical simulations. Furthermore, the influence of the incomplete tangential accommodation coefficient on the thermal and fluid flow fields is discussed. A comparison is made between the thermal and fluid flow fields predicted in cylindrical cavities and those in square-shaped cavities. The present work contributes to the advancement of micro/nano-electromechanical systems (MEMS/NEMS) by providing valuable insights into rarefied gas flow and heat transfer in lid-driven cavities.
Submitted 16 May, 2023;
originally announced June 2023.
-
Heat kernel estimate for the Laplace-Beltrami operator under Bakry-Émery Ricci curvature condition and applications
Authors:
Xingyu Song,
Ling Wu,
Meng Zhu
Abstract:
We establish a Gaussian upper bound of the heat kernel for the Laplace-Beltrami operator on complete Riemannian manifolds with Bakry-Émery Ricci curvature bounded below. As applications, we first prove an L^1-Liouville property for non-negative subharmonic functions when the potential function of the Bakry-Émery Ricci curvature tensor is of at most quadratic growth. Then we derive lower bounds of the eigenvalues of the Laplace-Beltrami operator on closed manifolds. An upper bound of the bottom spectrum is also obtained.
Submitted 26 June, 2023; v1 submitted 22 June, 2023;
originally announced June 2023.
-
$\mathbf{\mathbb{E}^{FWI}}$: Multi-parameter Benchmark Datasets for Elastic Full Waveform Inversion of Geophysical Properties
Authors:
Shihang Feng,
Hanchen Wang,
Chengyuan Deng,
Yinan Feng,
Yanhua Liu,
Min Zhu,
Peng Jin,
Yinpeng Chen,
Youzuo Lin
Abstract:
Elastic geophysical properties (such as P- and S-wave velocities) are of great importance to various subsurface applications like CO$_2$ sequestration and energy exploration (e.g., hydrogen and geothermal). Elastic full waveform inversion (FWI) is widely applied for characterizing reservoir properties. In this paper, we introduce $\mathbf{\mathbb{E}^{FWI}}$, a comprehensive benchmark dataset that is specifically designed for elastic FWI. $\mathbf{\mathbb{E}^{FWI}}$ encompasses 8 distinct datasets that cover diverse subsurface geologic structures (flat, curve, faults, etc). The benchmark results produced by three different deep learning methods are provided. In contrast to our previously presented dataset (pressure recordings) for acoustic FWI (referred to as OpenFWI), the seismic dataset in $\mathbf{\mathbb{E}^{FWI}}$ has both vertical and horizontal components. Moreover, the velocity maps in $\mathbf{\mathbb{E}^{FWI}}$ incorporate both P- and S-wave velocities. While the multicomponent data and the added S-wave velocity make the data more realistic, more challenges are introduced regarding the convergence and computational cost of the inversion. We conduct comprehensive numerical experiments to explore the relationship between P-wave and S-wave velocities in seismic data. The relation between P- and S-wave velocities provides crucial insights into the subsurface properties such as lithology, porosity, fluid content, etc. We anticipate that $\mathbf{\mathbb{E}^{FWI}}$ will facilitate future research on multiparameter inversions and stimulate endeavors in several critical research topics of carbon-zero and new energy exploration. All datasets, codes and relevant information can be accessed through our website at https://efwi-lanl.github.io/
Submitted 7 September, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image
Authors:
Mingjian Zhu,
Hanting Chen,
Qiangyu Yan,
Xudong Huang,
Guanyu Lin,
Wei Li,
Zhijun Tu,
Hailin Hu,
Jie Hu,
Yunhe Wang
Abstract:
The extraordinary ability of generative models to generate photographic images has intensified concerns about the spread of disinformation, thereby leading to the demand for detectors capable of distinguishing between AI-generated fake images and real images. However, the lack of large datasets containing images from the most advanced image generators poses an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images. 2) Rich Image Content, encompassing a broad range of image classes. 3) State-of-the-art Generators, synthesizing images with advanced diffusion models and GANs. The aforementioned advantages allow the detectors trained on GenImage to undergo a thorough evaluation and demonstrate strong applicability to diverse images. We conduct a comprehensive analysis of the dataset and propose two tasks for evaluating the detection method in resembling real-world scenarios. The cross-generator image classification task measures the performance of a detector trained on one generator when tested on the others. The degraded image classification task assesses the capability of the detectors in handling degraded images such as low-resolution, blurred, and compressed images. With the GenImage dataset, researchers can effectively expedite the development and evaluation of superior AI-generated image detectors in comparison to prevailing methodologies.
Submitted 24 June, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
SAGE-NDVI: A Stereotype-Breaking Evaluation Metric for Remote Sensing Image Dehazing Using Satellite-to-Ground NDVI Knowledge
Authors:
Zepeng Liu,
Zhicheng Yang,
Mingye Zhu,
Andy Wong,
Yibing Wei,
Mei Han,
Jun Yu,
Jui-Hsin Lai
Abstract:
Image dehazing is a meaningful low-level computer vision task and can be applied to a variety of contexts. In our industrial deployment scenario based on remote sensing (RS) images, the quality of image dehazing directly affects the grade of our crop identification and growth monitoring products. However, the widely used peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) provide ambiguous visual interpretation. In this paper, we design a new objective metric for RS image dehazing evaluation. Our proposed metric leverages a ground-based phenology observation resource to calculate the vegetation index error between RS and ground images at a hazy date. Extensive experiments validate that our metric appropriately evaluates different dehazing models and is in line with human visual perception.
Submitted 9 June, 2023;
originally announced June 2023.
-
FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling
Authors:
Xianda Chen,
Meixin Zhu,
Kehua Chen,
Pengqin Wang,
Hongliang Lu,
Hui Zhong,
Xu Han,
Yinhai Wang
Abstract:
Car-following is a control process in which a following vehicle (FV) adjusts its acceleration to keep a safe distance from the lead vehicle (LV). Recently, there has been a boom in data-driven models that enable more accurate modeling of car-following through real-world driving datasets. Although there are several public datasets available, their formats are not always consistent, making it challenging to determine the state-of-the-art models and how well a new model performs compared to existing ones. In contrast, research fields such as image recognition and object detection have benchmark datasets like ImageNet, Microsoft COCO, and KITTI. To address this gap and promote the development of microscopic traffic flow modeling, we establish a public benchmark dataset for car-following behavior modeling. The benchmark consists of more than 80K car-following events extracted from five public driving datasets using the same criteria. These events cover diverse situations including different road types, various weather conditions, and mixed traffic flows with autonomous vehicles. Moreover, to give an overview of current progress in car-following modeling, we implemented and tested representative baseline models with the benchmark. Results show that the deep deterministic policy gradient (DDPG) based model performs competitively, with a lower MSE for spacing compared to the traditional intelligent driver model (IDM) and Gazis-Herman-Rothery (GHR) models, and a smaller collision rate compared to fully connected neural network (NN) and long short-term memory (LSTM) models in most datasets. The established benchmark will provide researchers with consistent data formats and metrics for cross-comparing different car-following models, promoting the development of more accurate models. We open-source our dataset and implementation code at https://github.com/HKUST-DRIVE-AI-LAB/FollowNet.
Submitted 25 May, 2023;
originally announced June 2023.
-
FAST reveals new evidence for M94 as a merger
Authors:
Ruilei Zhou,
Ming Zhu,
Yanbin Yang,
Haiyang Yu,
Lixia Yuan,
Peng Jiang,
Wenzhe Xi
Abstract:
We report the first high-sensitivity HI observation toward the spiral galaxy M94 with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). From these observations, we discovered that M94 has a very extended HI disk, twice as large as that observed by THINGS, which is accompanied by an HI filament and seven HVCs (high velocity clouds) at different distances. The projected distances of these clouds and the filament are less than 50 kpc from the galactic center. We measured a total integrated flux (including all clouds and the filament) of 127.3 ($\pm$1) Jy km s$^{-1}$, corresponding to an HI mass of (6.51$\pm$0.06)$\times$10$^{8}$M$_{\odot}$, which is 63.0% more than that observed by THINGS. By comparing numerical simulations with the HI maps and the optical morphology of M94, we suggest that M94 is likely a remnant of a major merger of two galaxies, and that the HVCs and HI filament could be tidal features originating from the first collision of the merger, which happened about 5 Gyr ago. Furthermore, we found a seemingly isolated HI cloud at a projected distance of 109 kpc without any detected optical counterpart. We discuss possible origins of this cloud, such as a dark dwarf galaxy or a RELHIC (REionization-Limited HI Cloud). Our results demonstrate that high-sensitivity, wide-field HI imaging is important in revealing the diffuse cold gas structures and tidal debris that are crucial to understanding the dynamical evolution of galaxies.
Submitted 8 June, 2023;
originally announced June 2023.
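The step from integrated flux to HI mass quoted in the M94 abstract can be checked with the standard optically thin relation M_HI = 2.356e5 * D^2 * S, with D in Mpc and S in Jy km/s. The sketch below is only a consistency check: the distance of 4.66 Mpc is an assumption, since the abstract does not state the adopted distance, and the function name is hypothetical.

```python
def hi_mass_msun(flux_jy_kms, distance_mpc):
    """HI mass in solar masses from an integrated 21 cm flux.

    Uses the standard optically thin relation
    M_HI = 2.356e5 * D^2 * S, with D in Mpc and S in Jy km/s.
    """
    return 2.356e5 * distance_mpc ** 2 * flux_jy_kms


# Flux from the abstract; the distance is an ASSUMED value, not from the source.
ASSUMED_DISTANCE_MPC = 4.66
mass = hi_mass_msun(127.3, ASSUMED_DISTANCE_MPC)
print(f"HI mass ~ {mass:.3e} Msun")
```

Under this assumed distance the result agrees with the quoted (6.51 ± 0.06) × 10^8 solar masses to within the stated uncertainty. Because the mass scales with the square of the distance, a different adopted distance would shift it accordingly.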
-
Set-to-Sequence Ranking-based Concept-aware Learning Path Recommendation
Authors:
Xianyu Chen,
Jian Shen,
Wei Xia,
Jiarui Jin,
Yakun Song,
Weinan Zhang,
Weiwen Liu,
Menghui Zhu,
Ruiming Tang,
Kai Dong,
Dingyin Xia,
Yong Yu
Abstract:
With the development of the online education system, personalized education recommendation has played an essential role. In this paper, we focus on developing path recommendation systems that aim to generate and recommend an entire learning path to the given user in each session. Noticing that existing approaches fail to consider the correlations of concepts in the path, we propose a novel framework named Set-to-Sequence Ranking-based Concept-aware Learning Path Recommendation (SRC), which formulates the recommendation task under a set-to-sequence paradigm. Specifically, we first design a concept-aware encoder module that captures the correlations among the input learning concepts. The outputs are then fed into a decoder module that sequentially generates a path through an attention mechanism that handles correlations between the learning and target concepts. Our recommendation policy is optimized by policy gradient. In addition, we introduce an auxiliary module based on knowledge tracing to enhance the model's stability by evaluating students' learning effects on learning concepts. We conduct extensive experiments on two real-world public datasets and one industrial dataset, and the experimental results demonstrate the superiority and effectiveness of SRC. Code will be available at https://gitee.com/mindspore/models/tree/master/research/recommend/SRC.
Submitted 7 June, 2023;
originally announced June 2023.