Search | arXiv e-print repository

PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration

Authors: Yue Wu, Yongzhe Yuan, Xiaolong Fan, Xiaoshui Huang, Maoguo Gong, Qiguang Miao

Abstract: We propose a new framework that formulates point cloud registration as a denoising diffusion process from noisy transformation to object transformation. During the training stage, the object transformation diffuses from the ground-truth transformation to a random distribution, and the model learns to reverse this noising process. In the sampling stage, the model progressively refines a randomly generated transformation into the output result. We derive the variational bound in closed form for training and provide implementations of the model. Our work provides the following crucial findings: (i) In contrast to most existing methods, our framework, Diffusion Probabilistic Models for Point Cloud Registration (PCRDiffusion), does not require repeatedly updating the source point cloud to refine the predicted transformation. (ii) Point cloud registration, one of the representative discriminative tasks, can be solved in a generative way with a unified probabilistic formulation. Finally, we discuss and provide an outlook on the application of diffusion models in different scenarios for point cloud registration. Experimental results demonstrate that our model achieves competitive performance in point cloud registration. In both correspondence-free and correspondence-based scenarios, PCRDiffusion achieves performance improvements exceeding 50%.

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.04333 [pdf, other]

Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

Authors: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li

Abstract: This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing different sizes, and vertically, assessing different layers. We unveil several key and uncommon findings based on the designed probing tasks: (1) Horizontally, enlarging model sizes almost could not automatically impart additional knowledge or computational prowess. Instead, it can enhance reasoning abilities, especially in math problem solving, and helps reduce hallucinations, but only beyond certain size thresholds; (2) In vertical analysis, the lower layers of LLaMA lack substantial arithmetic and factual knowledge, showcasing logical thinking, multilingual and recognitive abilities, with top layers housing most computational power and real-world knowledge.

Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: 15 pages

arXiv:2311.09233 [pdf, other]

Neural Packing: from Visual Sensing to Reinforcement Learning

Authors: Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, Ruizhen Hu

Abstract: We present a novel learning framework to solve the transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline from partial observations of input objects via RGBD sensing and recognition to final box placement, via robotic motion planning, to arrive at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforcement learning (RL), to solve the NP-hard combinatorial optimization problem. Our network simultaneously selects an object to pack and determines the final packing location, based on a judicious encoding of the continuously evolving states of partially observed source objects and available spaces in the target container, using separate encoders both enabled with attention mechanisms. The encoded feature vectors are employed to compute the matching scores and feasibility masks of different pairings of box selection and available space configuration for packing strategy optimization. Extensive experiments, including ablation studies and physical packing execution by a real robot (Universal Robot UR5e), are conducted to evaluate our method in terms of its design choices, scalability, generalizability, and comparisons to baselines, including the most recent RL-based TAP solution. We also contribute the first benchmark for TAP which covers a variety of input settings and difficulty levels.

Submitted 16 October, 2023; originally announced November 2023.

arXiv:2311.08643 [pdf, other]

Theory of mobility edge and non-ergodic extended phase in coupled random matrices

Authors: Xiaoshui Lin, Guang-Can Guo, Ming Gong

Abstract: The mobility edge, as a central concept in disordered models for localization-delocalization transitions, has rarely been discussed in the context of random matrix theory (RMT). Here we report a new class of random matrix model by direct coupling between two random matrices, showing that their overlapped spectra and un-overlapped spectra exhibit totally different scaling behaviors, which can be used to construct tunable mobility edges. This model is a direct generalization of the Rosenzweig-Porter model, which hosts ergodic, localized, and non-ergodic extended (NEE) phases. A generic theory for these phase transitions is presented, which applies equally well to dense, sparse, and even corrected random matrices in different ensembles. We show that the phase diagram is fully characterized by two scaling exponents, and they are mapped out in various conditions. Our model provides a general framework to realize the mobility edges and non-ergodic phases in a controllable way in RMT, which paves the way for many intriguing applications, both in the pure mathematics of RMT and in possible implementations of the mobility edge (ME) in many-body models, chiral symmetry breaking in QCD, and the stability of large ecosystems.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 7+10 pages, 5+7 figures

arXiv:2311.06920 [pdf, other]

Quantumness and quantum to classical transition in the generalized Rabi model

Authors: Wei-Feng Zhuang, Yun-Tong Yang, Hong-Gang Luo, Ming Gong, Guang-Can Guo

Abstract: The quantum to classical transition (QCT) is one of the central mysteries in quantum physics. This process is generally interpreted as state collapse from measurement or decoherence from interacting with the environment. Here we define the quantumness of a Hamiltonian by the free energy difference between its quantum and classical descriptions, which vanishes during QCT. We apply this criterion to the many-body Rabi model and study its scaling law across the phase transition, finding that not only the temperature and Planck constant, but also all the model parameters are important for this transition. We show that the Jaynes-Cummings and anti Jaynes-Cummings models exhibit greater quantumness than the Rabi model. Moreover, we show that the rotating wave and anti-rotating wave terms in this model have opposite quantumness in QCT. We demonstrate that the quantumness may be enhanced or suppressed at the critical point. Finally, we estimate the quantumness of the Rabi model in current trapped ion experiments. The quantumness provides an important tool to characterize the QCT in a vast number of many-body models.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 6 pages, 5 figures

arXiv:2311.03253 [pdf, other]

Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency

Authors: Zilin Xiao, Linjun Shou, Xingyao Zhang, Jie Wu, Ming Gong, Jian Pei, Daxin Jiang

Abstract: Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often struggle to capture explicit discourse-level dependencies, resulting in incoherent predictions at the abstract level (e.g. topic or category). We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions. Our method first introduces an unsupervised variational autoencoder (VAE) to extract latent topic vectors of context sentences. This approach not only allows the encoder to handle longer documents more effectively and conserves valuable input space, but also maintains topic-level coherence. Additionally, we incorporate an external category memory, enabling the system to retrieve relevant categories for undecided mentions. By employing step-by-step entity decisions, this design facilitates the modeling of entity-entity interactions, thereby maintaining maximum coherence at the category level. We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points. Our model demonstrates particularly outstanding performance on challenging long-text scenarios.

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023 Findings

arXiv:2311.03250 [pdf, other]

Instructed Language Models with Retrievers Are Powerful Entity Linkers

Authors: Zilin Xiao, Ming Gong, Jie Wu, Xingyao Zhang, Linjun Shou, Jian Pei, Daxin Jiang

Abstract: Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning abilities. Yet the generative nature still makes the generated content suffer from hallucinations, thus unsuitable for entity-centric tasks like entity linking (EL) requiring precise entity predictions over a large knowledge base. We present Instructed Generative Entity Linker (INSGENEL), the first approach that enables causal language models to perform entity linking over knowledge bases. Several methods to equip language models with EL capability were proposed in this work, including (i) a sequence-to-sequence training EL objective with instruction-tuning, (ii) a novel generative EL framework based on a light-weight potential mention retriever that frees the model from heavy and non-parallelizable decoding, achieving 4$\times$ speedup without compromise on linking metrics. INSGENEL outperforms previous generative alternatives with +6.8 F1 points gain on average, also with a huge advantage in training data efficiency and training compute consumption. In addition, our skillfully engineered in-context learning (ICL) framework for EL still lags behind INSGENEL significantly, reaffirming that the EL task remains a persistent hurdle for general LLMs.

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023 Main

arXiv:2310.20246 [pdf, other]

Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

Authors: Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Yangqiu Song, Dongmei Zhang, Jia Li

Abstract: Existing research predominantly focuses on developing powerful language learning models (LLMs) for mathematical reasoning within monolingual languages, with few explorations in preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages, thus addressing the issue of training data scarcity in xMR tasks. Based on the collected dataset, we propose different training strategies to build powerful xMR LLMs, named MathOctopus, which notably outperform conventional open-source LLMs and exhibit superiority over ChatGPT in few-shot scenarios. Notably, MathOctopus-13B reaches 47.6% accuracy, which exceeds ChatGPT's 46.3% on the MGSM test set. Beyond remarkable results, we unearth several pivotal observations and insights from extensive experiments: (1) When extending the rejection sampling strategy to the multilingual context, it proves effective for model performances, albeit limited. (2) Employing parallel corpora for math Supervised Fine-Tuning (SFT) across multiple languages not only significantly enhances model performance multilingually but also elevates their monolingual performance. This indicates that crafting multilingual corpora can be regarded as a vital strategy for enhancing model performance in a specific language, especially in mathematical reasoning tasks. For instance, MathOctopus-7B improves on its counterpart trained on English from 42.2% to 50.8% on the GSM8K test set.

Submitted 28 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: Work in Progress

arXiv:2310.19491 [pdf, ps, other]

Generator Identification for Linear SDEs with Additive and Multiplicative Noise

Authors: Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong

Abstract: In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.

Submitted 21 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.17546 [pdf, other]

A changepoint approach to modelling non-stationary soil moisture dynamics

Authors: Mengyi Gong, Rebecca Killick, Christopher Nemeth, John Quinton

Abstract: Soil moisture dynamics provide an indicator of soil health that scientists model via soil drydown curves. The typical modeling process requires the soil moisture time series to be manually separated into drydown segments and then exponential decay models are fitted to them independently. Sensor development over recent years means that experiments that were previously conducted over a few field campaigns can now be scaled to months or even years, often at a higher sampling rate. Manual identification of drydown segments is no longer practical. To better meet the challenge of increasing data size, this paper proposes a novel changepoint-based approach to automatically identify structural changes in the soil drying process, and estimate the parameters characterizing the drying processes simultaneously. A simulation study is carried out to assess the performance of the method. The results demonstrate its ability to identify structural changes and retrieve key parameters of interest to soil scientists. The method is applied to hourly soil moisture time series from the NEON data portal to investigate the temporal dynamics of soil moisture drydown. We recover known relationships previously identified manually, alongside delivering new insights into the temporal variability across soil types and locations.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 19 pages for the main manuscript, 6 pages for the supplemental document

MSC Class: 62M10; 62P12

arXiv:2310.15580 [pdf, other]

Identifiable Latent Polynomial Causal Models Through the Lens of Change

Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

Abstract: Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments \citep{liu2022identifying}. However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.11239 [pdf, other]

LiDAR-based 4D Occupancy Completion and Forecasting

Authors: Xinhao Liu, Moonjun Gong, Qi Fang, Haoyu Xie, Yiming Li, Hang Zhao, Chen Feng

Abstract: Scene completion and forecasting are two popular perception problems in research for mobile agents like autonomous vehicles. Existing approaches treat the two problems in isolation, resulting in a separate perception of the two aspects. In this paper, we introduce a novel LiDAR perception task of Occupancy Completion and Forecasting (OCF) in the context of autonomous driving to unify these aspects into a cohesive framework. This task requires new algorithms to address three challenges altogether: (1) sparse-to-dense reconstruction, (2) partial-to-complete hallucination, and (3) 3D-to-4D prediction. To enable supervision and evaluation, we curate a large-scale dataset termed OCFBench from public autonomous driving datasets. We analyze the performance of closely related existing baseline models and our own ones on our dataset. We envision that this research will inspire and call for further investigation in this evolving and crucial area of 4D perception. Our code for data curation and baseline implementation is available at https://github.com/ai4ce/Occ4cast.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.00814 [pdf, other]

Quark masses and low energy constants in the continuum from the tadpole improved clover ensembles

Authors: Zhi-Cheng Hu, Bo-Lun Hu, Ji-Hao Wang, Ming Gong, Liuming Liu, Peng Sun, Wei Sun, Wei Wang, Yi-Bo Yang, Dian-Jun Zhao

Abstract: We present the light-flavor quark masses and low energy constants using the 2+1 flavor full-QCD ensembles with stout smeared clover fermion action and Symanzik gauge actions. Both the fermion and gauge actions are tadpole improved self-consistently. The simulations are performed on 11 ensembles at 3 lattice spacings $a\in[0.05,0.11]$ fm, 4 spatial sizes $L\in[2.5, 5.1]$ fm, 7 pion masses $m_\pi\in[135,350]$ MeV, and several values of the strange quark mass. The quark mass is defined through the partially conserved axial current (PCAC) relation and renormalized to $\overline{\mathrm{MS}}$ 2 GeV through the intermediate regularization independent momentum subtraction (RI/MOM) scheme. The systematic uncertainty of using the symmetric momentum subtraction (SMOM) scheme is also included. Eventually, we predict $m_u=2.45(22)(20)$ MeV, $m_d=4.74(11)(09)$ MeV, and $m_s=98.8(2.9)(4.7)$ MeV with the systematic uncertainties from lattice spacing determination, continuum extrapolation and renormalization constant included. We also obtain the chiral condensate $\Sigma^{1/3}=268.6(3.6)(0.7)$ MeV and the pion decay constant $F=86.6(7)(1.4)$ MeV in the $N_f=2$ chiral limit, and the next-to-leading order low energy constants $\ell_3=2.43(54)(05)$ and $\ell_4=4.322(75)(96)$.

Submitted 7 January, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Version accepted by PRD. 7 pages, 4 figures, with more details in the appendix

arXiv:2309.10469 [pdf, other]

doi 10.1145/3583780.3615498

RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation

Authors: Ning Wu, Ming Gong, Linjun Shou, Jian Pei, Daxin Jiang

Abstract: Online recommender systems (RS) aim to match user needs with the vast amount of resources available on various platforms. A key challenge is to model user preferences accurately under the condition of data sparsity. To address this challenge, some methods have leveraged external user behavior data from multiple platforms to enrich user representation. However, all of these methods require a consistent user ID across platforms and ignore the information from similar users. In this study, we propose RUEL, a novel retrieval-based sequential recommender that can effectively incorporate external anonymous user behavior data from Edge browser logs to enhance recommendation. We first collect and preprocess a large volume of Edge browser logs over a one-year period and link them to target entities that correspond to candidate items in recommendation datasets. We then design a contrastive learning framework with a momentum encoder and a memory bank to retrieve the most relevant and diverse browsing sequences from the full browsing log based on the semantic similarity between user representations. After retrieval, we apply an item-level attentive selector to filter out noisy items and generate refined sequence embeddings for the final predictor. RUEL is the first method that connects user browsing data with typical recommendation datasets and can be generalized to various recommendation scenarios and datasets. We conduct extensive experiments on four real datasets for sequential recommendation tasks and demonstrate that RUEL significantly outperforms state-of-the-art baselines. We also conduct ablation studies and qualitative analysis to validate the effectiveness of each component of RUEL and provide additional insights into our method.

Submitted 19 September, 2023; originally announced September 2023.

Comments: CIKM 2023 ADS

arXiv:2309.10279 [pdf, other]

360$^\circ$ Reconstruction From a Single Image Using Space Carved Outpainting

Authors: Nuri Ryu, Minsu Gong, Geonung Kim, Joo-Haeng Lee, Sunghyun Cho

Abstract: We introduce POP3D, a novel framework that creates a full $360^\circ$-view 3D model from a single image. POP3D resolves two prominent issues that limit the single-view reconstruction. Firstly, POP3D offers substantial generalizability to arbitrary categories, a trait that previous methods struggle to achieve. Secondly, POP3D further improves reconstruction fidelity and naturalness, a crucial aspect that concurrent works fall short of. Our approach marries the strengths of four primary components: (1) a monocular depth and normal predictor that serves to predict crucial geometric cues, (2) a space carving method capable of demarcating the potentially unseen portions of the target object, (3) a generative model pre-trained on a large-scale image dataset that can complete unseen regions of the target, and (4) a neural implicit surface reconstruction method tailored in reconstructing objects using RGB images along with monocular geometric cues. The combination of these components enables POP3D to readily generalize across various in-the-wild images and generate state-of-the-art reconstructions, outperforming similar works by a significant margin. Project page: http://cg.postech.ac.kr/research/POP3D

Submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted to SIGGRAPH Asia 2023 (Conference Track). For the project page, see http://cg.postech.ac.kr/research/POP3D For the supplementary document, see http://cg.postech.ac.kr/papers/2023_SIGAsia_Ryu_Supp.pdf

arXiv:2309.07407 [pdf, other]

Deep Reinforcement Learning-based Scheduling for Optimizing System Load and Response Time in Edge and Fog Computing Environments

Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya

Abstract: Edge/fog computing, as a distributed computing paradigm, satisfies the low-latency requirements of an ever-increasing number of IoT applications and has become the mainstream computing paradigm behind IoT applications. However, because a large number of IoT applications require execution on the edge/fog resources, the servers may be overloaded. Hence, it may disrupt the edge/fog servers and also negatively affect IoT applications' response time. Moreover, many IoT applications are composed of dependent components incurring extra constraints for their execution. Besides, edge/fog computing environments and IoT applications are inherently dynamic and stochastic. Thus, efficient and adaptive scheduling of IoT applications in heterogeneous edge/fog computing environments is of paramount importance. However, limited computational resources on edge/fog servers impose an extra burden for applying optimal but computationally demanding techniques. To overcome these challenges, we propose a Deep Reinforcement Learning-based IoT application Scheduling algorithm, called DRLIS, to adaptively and efficiently optimize the response time of heterogeneous IoT applications and balance the load of the edge/fog servers. We implemented DRLIS as a practical scheduler in the FogBus2 function-as-a-service framework for creating an edge-fog-cloud integrated serverless computing environment. Results obtained from extensive experiments show that DRLIS significantly reduces the execution cost of IoT applications by up to 55%, 37%, and 50% in terms of load balancing, response time, and weighted cost, respectively, compared with metaheuristic algorithms and other reinforcement learning techniques.

Submitted 22 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.04696 [pdf, other]

Explainable AI in Orthopedics: Challenges, Opportunities, and Prospects

Authors: Soheyla Amirian, Luke A. Carlson, Matthew F. Gong, Ines Lohse, Kurt R. Weiss, Johannes F. Plate, Ahmad P. Tafti

Abstract: While artificial intelligence (AI) has made many successful applications in various domains, its adoption in healthcare lags a little bit behind other high-stakes settings. Several factors contribute to this slower uptake, including regulatory frameworks, patient privacy concerns, and data heterogeneity. However, one significant challenge that impedes the implementation of AI in healthcare, particularly in orthopedics, is the lack of explainability and interpretability around AI models. Addressing the challenge of explainable AI (XAI) in orthopedics requires developing AI models and algorithms that prioritize transparency and interpretability, allowing clinicians, surgeons, and patients to understand the contributing factors behind any AI-powered predictive or descriptive models. The current contribution outlines several key challenges and opportunities that manifest in XAI in orthopedic practice. This work emphasizes the need for interdisciplinary collaborations between AI practitioners, orthopedic specialists, and regulatory entities to establish standards and guidelines for the adoption of XAI in orthopedics.

Submitted 9 August, 2023; originally announced August 2023.

Comments: This paper was accepted at The 2023 World Congress in Computer Science, Computer Engineering, and Applied Computing (CSCE'23)

arXiv:2308.04356 [pdf, other]

Learning Unbiased Image Segmentation: A Case Study with Plain Knee Radiographs

Authors: Nickolas Littlefield, Johannes F. Plate, Kurt R. Weiss, Ines Lohse, Avani Chhabra, Ismaeel A. Siddiqui, Zoe Menezes, George Mastorakos, Sakshi Mehul Thakar, Mehrnaz Abedian, Matthew F. Gong, Luke A. Carlson, Hamidreza Moradi, Soheyla Amirian, Ahmad P. Tafti

Abstract: Automatic segmentation of knee bony anatomy is essential in orthopedics, and it has been around for several years in both pre-operative and post-operative settings. While deep learning algorithms have demonstrated exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study aims to revisit deep learning-powered knee-bony anatomy segmentation using plain radiographs to uncover visible gender and racial biases. The current contribution offers the potential to advance our understanding of biases, and it provides practical insights for researchers and practitioners in medical imaging. The proposed mitigation strategies mitigate gender and racial biases, ensuring fair and unbiased segmentation results. Furthermore, this work promotes equal access to accurate diagnoses and treatment outcomes for diverse patient populations, fostering equitable and inclusive healthcare provision.

Submitted 8 August, 2023; originally announced August 2023.

Comments: This paper has been accepted by IEEE BHI 2023

arXiv:2308.00410 [pdf, other]

doi 10.1016/j.eng.2021.10.022

A Cyber-Physical Routing Protocol Exploiting Trajectory Dynamics for Mission-Oriented Flying Ad Hoc Networks

Authors: Die Hu, Shaoshi Yang, Min Gong, Zhiyong Feng, Xuejun Zhu

Abstract: As a special type of mobile ad hoc network (MANET), the flying ad hoc network (FANET) has the potential to enable a variety of emerging applications in both civilian wireless communications (e.g., 5G and 6G) and the defense industry. The routing protocol plays a pivotal role in FANET. However, when designing the routing protocol for FANET, it is conventionally assumed that the aerial nodes move randomly. This is clearly inappropriate for a mission-oriented FANET (MO-FANET), in which the aerial nodes typically move toward a given destination from given departure point(s), possibly along a roughly deterministic flight path while maintaining a well-established formation, in order to carry out certain missions. In this paper, a novel cyber-physical routing protocol exploiting the particular mobility pattern of an MO-FANET is proposed based on cross-disciplinary integration, which makes full use of the mission-determined trajectory dynamics to construct the time sequence of rejoining and separating, as well as the adjacency matrix for each node, as prior information. Compared with the existing representative routing protocols used in FANETs, our protocol achieves a higher packet-delivery ratio (PDR) at the cost of even lower overhead and lower average end-to-end latency, while maintaining a reasonably moderate and stable network jitter, as demonstrated by extensive ns-3-based simulations assuming realistic configurations in an MO-FANET.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 12 pages, 24 figures, accepted to appear on Engineering in Dec. 2022 (ISSN 2095-8099)

Journal ref: Engineering, Volume 19, Pages 217-227, Dec. 2022

arXiv:2307.16405 [pdf, other]

Causal-learn: Causal Discovery in Python

Authors: Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang

Abstract: Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe $\textit{causal-learn}$, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, $\textit{causal-learn}$ is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.

Submitted 31 July, 2023; originally announced July 2023.

Journal ref: Journal of Machine Learning Research 25 (2024)

arXiv:2307.14019 [pdf, other]

One-Nearest Neighborhood Guides Inlier Estimation for Unsupervised Point Cloud Registration

Authors: Yongzhe Yuan, Yue Wu, Maoguo Gong, Qiguang Miao, A. K. Qin

Abstract: The precision of unsupervised point cloud registration methods is typically limited by the lack of reliable inlier estimation and self-supervised signal, especially in partially overlapping scenarios. In this paper, we propose an effective inlier estimation method for unsupervised point cloud registration by capturing geometric structure consistency between the source point cloud and its corresponding reference point cloud copy. Specifically, to obtain a high-quality reference point cloud copy, a One-Nearest Neighborhood (1-NN) point cloud is generated from the input point cloud. This facilitates matching map construction and allows for integrating dual neighborhood matching scores of the 1-NN point cloud and the input point cloud to improve matching confidence. Benefiting from the high-quality reference copy, we argue that the neighborhood graph formed by an inlier and its neighborhood should be consistent between the source point cloud and its corresponding reference copy. Based on this observation, we construct transformation-invariant geometric structure representations and capture geometric structure consistency to score the inlier confidence for estimated correspondences between the source point cloud and its reference copy. This strategy can simultaneously provide a reliable self-supervised signal for model optimization. Finally, we estimate the transformation with the weighted SVD algorithm, using the estimated correspondences and the corresponding inlier confidence. We train the proposed model in an unsupervised manner, and extensive experiments on synthetic and real-world datasets illustrate the effectiveness of the proposed method.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.12631 [pdf, other]

doi 10.1103/PhysRevA.109.033310

Fate of localization in coupled free chain and disordered chain

Authors: Xiaoshui Lin, Ming Gong

Abstract: It has been widely believed that almost all states in one-dimensional (1d) disordered systems with short-range hopping and uncorrelated random potential are localized. Here, we consider the fate of these localized states by coupling between a disordered chain (with localized states) and a free chain (with extended states), showing that states in the overlapped and un-overlapped regimes exhibit totally different localization behaviors, which is not a phase transition process. In particular, while states in the overlapped regime are localized by resonant coupling, in the un-overlapped regime of the free chain, significant suppression of the localization with a prefactor of $ξ^{-1} \propto t_v^4/Δ^4$ appeared, where $t_v$ is the inter-chain coupling strength and $Δ$ is the energy shift between them. This system may exhibit localization lengths that are comparable with the system size even when the potential in the disordered chain is strong. We confirm these results using the transfer matrix method and sparse matrix method for systems $L \sim 10^6 - 10^9$. These findings extend our understanding of localization in low-dimensional disordered systems and provide a concrete example, which may call for much more advanced numerical methods in high-dimensional models.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 7 pages, 6 figures

Journal ref: Phys. Rev. A 109, 033310 (2024)

arXiv:2307.05948 [pdf, other]

Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation

Authors: Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han

Abstract: Generating unlabeled data has been recently shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the generated data of the existing methods are extremely similar or even the same. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net will generate data via minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. By DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that the DEG-Net outperforms existing FHA baselines and further verifies that generating diverse data plays a vital role in addressing the FHA problem.

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2307.01675 [pdf, other]

doi 10.1103/PhysRevA.109.032626

Two-photon-transition superadiabatic passage in an nitrogen-vacancy center in diamond

Authors: Musang Gong, Min Yu, Yaoming Chu, Wei Chen, Qingyun Cao, Ning Wang, Jianming Cai, Ralf Betzholz, Luigi Giannelli

Abstract: Reaching a given target quantum state with high fidelity and fast operation speed close to the quantum limit represents an important goal in quantum information science. Here, we experimentally demonstrate superadiabatic quantum driving to achieve population transfer in a three-level solid-state spin system. Starting from traditional stimulated Raman adiabatic passage (STIRAP), our approach implements superadiabatic corrections to the STIRAP Hamiltonians with several paradigmatic pulse shapes. It requires no need of intense microwave pulses or long transfer times and shows enhanced robustness over pulse imperfections. These results might provide a useful tool for quantum information processing and coherent manipulations of quantum systems.

Submitted 4 July, 2023; originally announced July 2023.

Comments: 8 pages, 7 figures

Journal ref: Physical Review A 109, 032626 (2024)

arXiv:2307.01638 [pdf, other]

From single-particle to many-body mobility edges and the fate of overlapped spectra in coupled disorder models

Authors: Xiaoshui Lin, Ming Gong, Guang-Can Guo

Abstract: Mobility edge (ME) has played an essential role in disordered models. However, while this concept has been well established in disordered single-particle models, its existence in disordered many-body models is still under controversy. Here, a general approach based on coupling between extended and localized states in their overlapped spectra for ME is presented. We show that in the one-dimensional (1d) disordered single-particle models, all states are localized by direct coupling between them. However, in $d \ge 2$ disordered single-particle and 1d disordered many-body models, the resonant hybridization between these states in their overlapped spectra makes all states be extended, while these in the un-overlapped spectra are unchanged, leading to tunable MEs. We propose several models, including two disordered many-body spin models, to verify this mechanism. Our results establish a unified mechanism for MEs and demonstrate its universality in single-particle and many-body models, which opens an intriguing avenue for the realization and verification of MEs in many-body localization.

Submitted 4 July, 2023; originally announced July 2023.

Comments: 7+11 pages

arXiv:2306.16408 [pdf, other]

doi 10.3847/1538-4357/ace2c5

Coagulation-Fragmentation Equilibrium for Charged Dust: Abundance of Submicron Grains Increases Dramatically in Protoplanetary Disks

Authors: Vitaly Akimkin, Alexei V. Ivlev, Paola Caselli, Munan Gong, Kedron Silsbee

Abstract: Dust coagulation in protoplanetary disks is not straightforward and is subject to several slow-down mechanisms, such as bouncing, fragmentation and radial drift to the star. Furthermore, dust grains in UV-shielded disk regions are negatively charged due to collisions with the surrounding electrons and ions, which leads to their electrostatic repulsion. For typical disk conditions, the relative velocities between micron-size grains are small and their collisions are strongly affected by the repulsion. On the other hand, collisions between pebble-size grains can be too energetic, leading to grain fragmentation. The aim of the present paper is to study a combined effect of the electrostatic and fragmentation barriers on dust evolution. We numerically solve the Smoluchowski coagulation-fragmentation equation for grains whose charging occurs under conditions typical for the inner disk regions, where thermal ionization operates. We find that dust fragmentation efficiently resupplies the population of small grains under the electrostatic barrier. As a result, the equilibrium abundance of sub-micron grains is enhanced by several orders of magnitude compared to the case of neutral dust. For some conditions with fragmentation velocities $\sim 1$ m s$^{-1}$, macroscopic grains are completely destroyed.

Submitted 28 June, 2023; originally announced June 2023.

Comments: accepted for publication in ApJ

arXiv:2306.12884 [pdf, other]

Decays of $1^{-+}$ Charmoniumlike Hybrid

Authors: Chunjiang Shi, Ying Chen, Ming Gong, Xiangyu Jiang, Zhaofeng Liu, Wei Sun

Abstract: By extracting the transition amplitudes, we give the first lattice QCD prediction of the two-body decay partial widths of the $1^{-+}$ charmoniumlike hybrid $η_{c1}$. Given the calculated mass value $m_{η_{c1}}=4.329(36)$ GeV, the $η_{c1}$ decay is dominated by the open charm modes $D_1\bar{D}$, $D^*\bar{D}$ and $D^*\bar{D}^*$ with partial widths of $258(133)$ MeV, $88(18)$ MeV and $150(118)$ MeV, respectively. The coupling of $η_{c1}$ to $χ_{c1}$ plus a flavor singlet pseudoscalar is not small, but $χ_{c1}η$ decay is suppressed by the small $η-η'$ mixing angle. The partial width of $η_{c1}\to η_cη'$ is estimated to be around 1 MeV. We suggest experiments to search for $η_{c1}$ in the $P$-wave $D^*\bar{D}$ and $D^*\bar{D}^*$ systems. Especially, the polarization of $D^*\bar{D}^*$ can be used to distinguish the $1^{-+}$ product (total spin $S=1$) from $1^{--}$ products ($S=0$).

Submitted 19 March, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 17 pages, 7 figures. Revised to article format

arXiv:2306.12511 [pdf, other]

Semi-Implicit Denoising Diffusion Models (SIDDMs)

Authors: Yanwu Xu, Mingming Gong, Shaoan Xie, Wei Wei, Matthias Grundmann, Kayhan Batmanghelich, Tingbo Hou

Abstract: Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) attempted to circumvent this limitation by integrating a GAN model for larger jumps in the diffusion process. However, DDGAN encountered scalability limitations when applied to large datasets. To address these limitations, we introduce a novel approach that tackles the problem by matching implicit and explicit factors. More specifically, our approach involves utilizing an implicit model to match the marginal distributions of noisy data and the explicit conditional distribution of the forward diffusion. This combination allows us to effectively match the joint denoising distributions. Unlike DDPM but similar to DDGAN, we do not enforce a parametric distribution for the reverse step, enabling us to take large steps during inference. Similar to the DDPM but unlike DDGAN, we take advantage of the exact form of the diffusion process. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.

Submitted 10 October, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.09001 [pdf, other]

SSCBench: Monocular 3D Semantic Scene Completion Benchmark in Street Views

Authors: Yiming Li, Sihang Li, Xinhao Liu, Moonjun Gong, Kenan Li, Nuo Chen, Zijun Wang, Zhiheng Li, Tao Jiang, Fisher Yu, Yue Wang, Hang Zhao, Zhiding Yu, Chen Feng

Abstract: Monocular scene understanding is a foundational component of autonomous systems. Within the spectrum of monocular perception topics, one crucial and useful task for holistic 3D scene understanding is semantic scene completion (SSC), which jointly completes semantic information and geometric details from RGB input. However, progress in SSC, particularly in large-scale street views, is hindered by the scarcity of high-quality datasets. To address this issue, we introduce SSCBench, a comprehensive benchmark that integrates scenes from widely used automotive datasets (e.g., KITTI-360, nuScenes, and Waymo). SSCBench follows an established setup and format in the community, facilitating the easy exploration of SSC methods in various street views. We benchmark models using monocular, trinocular, and point cloud input to assess the performance gap resulting from sensor coverage and modality. Moreover, we have unified semantic labels across diverse datasets to simplify cross-domain generalization testing. We commit to including more datasets and SSC models to drive further advancements in this field.

Submitted 29 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.06561 [pdf, other]

Learning World Models with Identifiable Factorization

Authors: Yu-Ren Liu, Biwei Huang, Zhengmao Zhu, Honglong Tian, Mingming Gong, Yang Yu, Kun Zhang

Abstract: Extracting a stable and compact representation of the environment is crucial for efficient reinforcement learning in high-dimensional, noisy, and non-stationary environments. Different categories of information coexist in such environments -- how to effectively extract and disentangle this information remains a challenging problem. In this paper, we propose IFactor, a general framework to model four distinct categories of latent state variables that capture various aspects of information within the RL system, based on their interactions with actions and rewards. Our analysis establishes block-wise identifiability of these latent variables, which not only provides a stable and compact representation but also discloses that all reward-relevant factors are significant for policy learning. We further present a practical approach to learning the world model with identifiable blocks, ensuring the removal of redundancies while retaining minimal and sufficient information for policy optimization. Experiments in synthetic worlds demonstrate that our method accurately identifies the ground-truth latent variables, substantiating our theoretical findings. Moreover, experiments in variants of the DeepMind Control Suite and RoboDesk showcase the superior performance of our approach over baselines.

Submitted 27 June, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

arXiv:2306.04927 [pdf, other]

An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection

Authors: Ziye Chen, Kate Smith-Miles, Bo Du, Guoqi Qian, Mingming Gong

Abstract: Accurately detecting lane lines in 3D space is crucial for autonomous driving. Existing methods usually first transform image-view features into bird-eye-view (BEV) by aid of inverse perspective mapping (IPM), and then detect lane lines based on the BEV features. However, IPM ignores the changes in road height, leading to inaccurate view transformations. Additionally, the two separate stages of the process can cause cumulative errors and increased complexity. To address these limitations, we propose an efficient transformer for 3D lane detection. Different from the vanilla transformer, our model contains a decomposed cross-attention mechanism to simultaneously learn lane and BEV representations. The mechanism decomposes the cross-attention between image-view and BEV features into the one between image-view and lane features, and the one between lane and BEV features, both of which are supervised with ground-truth lane lines. Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively. This allows for a more accurate view transformation than IPM-based methods, as the view transformation is learned from data with a supervised cross-attention. Additionally, the cross-attention between lane and BEV features enables them to adjust to each other, resulting in more accurate lane detection than the two separate stages. Finally, the decomposed cross-attention is more efficient than the original one. Experimental results on OpenLane and ONCE-3DLanes demonstrate the state-of-the-art performance of our method.

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2305.15972 [pdf, other]

Logical Magic State Preparation with Fidelity Beyond the Distillation Threshold on a Superconducting Quantum Processor

Authors: Yangsen Ye, Tan He, He-Liang Huang, Zuolin Wei, Yiming Zhang, Youwei Zhao, Dachao Wu, Qingling Zhu, Huijie Guan, Sirui Cao, Fusheng Chen, Tung-Hsun Chung, Hui Deng, Dao** Fan, Ming Gong, Cheng Guo, Shaojun Guo, Lianchen Han, Na Li, Shaowei Li, Yuan Li, Futian Liang, ** Lin, Haoran Qian, Hao Rong , et al. (13 additional authors not shown)

Abstract: Fault-tolerant quantum computing based on surface code has emerged as an attractive candidate for practical large-scale quantum computers to achieve robust noise resistance. To achieve universality, magic state preparation is a common approach for introducing non-Clifford gates. Here, we present a hardware-efficient and scalable protocol for arbitrary logical state preparation for the rotated surface code, and further experimentally implement it on the \textit{Zuchongzhi} 2.1 superconducting quantum processor. An average of $0.8983 \pm 0.0002$ logical fidelity at different logical states with distance-three is achieved, taking into account both state preparation and measurement errors. In particular, the magic states $|A^{π/4}\rangle_L$, $|H\rangle_L$, and $|T\rangle_L$ are prepared non-destructively with logical fidelities of $0.8771 \pm 0.0009$, $0.9090 \pm 0.0009$, and $0.8890 \pm 0.0010$, respectively, which are higher than the state distillation protocol threshold, 0.859 (for H-type magic state) and 0.827 (for T-type magic state). Our work provides a viable and efficient avenue for generating high-fidelity raw logical magic states, which is essential for realizing non-Clifford logical gates in the surface code.

Submitted 30 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: In this version, we do not employ readout error mitigation to remove measurement errors (the previous version used a readout transition matrix for this), because we believe this gives a more predictive assessment of the actual fidelity when generating and consuming magic states for a non-Clifford gate, since consuming the state involves measurement

arXiv:2305.15732 [pdf, other]

CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

Authors: Ming Gao, YanWu Xu, Yang Zhao, Tingbo Hou, Chenkai Zhao, Mingming Gong

Abstract: In this paper, we propose a novel language-guided 3D arbitrary neural style transfer method (CLIP3Dstyler). We aim at stylizing any 3D scene with an arbitrary style from a text description, and synthesizing the novel stylized view, which is more flexible than the image-conditioned style transfer. Compared with the previous 2D method CLIPStyler, we are able to stylize a 3D scene and generalize to novel scenes without retraining our model. A straightforward solution is to combine previous image-conditioned 3D style transfer and text-conditioned 2D style transfer methods. However, such a solution cannot achieve our goal due to two main challenges. First, there is no multi-modal model matching point clouds and language at different feature scales (low-level, high-level). Second, we observe a style mixing issue when we stylize the content with different style conditions from text prompts. To address the first issue, we propose a 3D stylization framework to match the point cloud features with text features in local and global views. For the second issue, we propose an improved directional divergence loss to make arbitrary text styles more distinguishable as a complement to our framework. We conduct extensive experiments to show the effectiveness of our model on text-guided 3D scene style transfer.

Submitted 25 May, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 17 pages, 14 figures

arXiv:2305.08870 [pdf]

Allying nanophotonic structures with two-dimensional van der Waals materials

Authors: Yuan Meng, Hongkun Zhong, Zhihao Xu, Tiantian He, Justin S. Kim, Sangmoon Han, Yijie Shen, Mali Gong, Sang-Hoon Bae, Qirong Xiao

Abstract: The integration of two-dimensional (2D) materials with photonic structures has catalyzed a wide spectrum of optical and optoelectronic applications. Conventional nanophotonic structures generally lack efficient reconfigurability and multifunctionality. The atomically thin 2D van der Waals materials can thus infuse new functionality and reconfigurability into the well-established library of photonic structures such as integrated waveguides, optical fibers, photonic crystals, micro-cavities, and metasurfaces, to name a few. Thanks to the handiness of van der Waals interfaces, the 2D materials can be easily transferred and mixed with other prefabricated photonic templates with high degrees of freedom, and can act as the optical gain, modulation, sensing, or plasmonic media for diverse applications. Here we review recent advances in combining 2D materials with nanophotonic structures for new functionality development or performance enhancements. Challenges and emerging opportunities in integrating van der Waals building blocks beyond 2D materials are also discussed.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 23 pages

arXiv:2305.06154 [pdf, other]

Alleviating Over-smoothing for Unsupervised Sentence Representation

Authors: Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Bowen Cao, Jianhui Chang, Daxin Jiang, Jia Li

Abstract: Currently, learning better unsupervised sentence representations is the pursuit of many natural language processing communities. Many approaches based on pre-trained language models (PLMs) and contrastive learning have achieved promising results on this task. Experimentally, we observe that the over-smoothing problem reduces the capacity of these powerful PLMs, leading to sub-optimal sentence representations. In this paper, we present a Simple method named Self-Contrastive Learning (SSCL) to alleviate this issue, which samples negatives from the PLMs' intermediate layers, improving the quality of the sentence representation. Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting, which can be seen as a plug-and-play contrastive framework for learning unsupervised sentence representation. Extensive results show that SSCL brings superior performance improvements over different strong baselines (e.g., BERT and SimCSE) on Semantic Textual Similarity and Transfer datasets. Our code is available at https://github.com/nuochenpku/SSCL.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 13 pages

Journal ref: ACL 2023

arXiv:2305.04965 [pdf, other]

Implementation of chemistry in the Athena++ code

Authors: Munan Gong, Ka-Wai Ho, James M. Stone, Eve C. Ostriker, Paola Caselli, Tommaso Grassi, Chang-Goo Kim, Jeong-Gyu Kim, Goni Halevi

Abstract: Chemistry plays a key role in many aspects of astrophysical fluids. Atoms and molecules are agents for heating and cooling, determine the ionization fraction, serve as observational tracers, and build the molecular foundation of life. We present the implementation of a chemistry module in the publicly available magneto-hydrodynamic code Athena++. We implement several chemical networks and heating and cooling processes suitable for simulating the interstellar medium (ISM). A general chemical network framework in the KIDA format is also included, allowing the user to easily implement their own chemistry. Radiation transfer and cosmic-ray ionization are coupled with chemistry and solved with the simple six-ray approximation. The chemical and thermal processes are evolved as a system of coupled ODEs with an implicit solver from the CVODE library. We perform and present a series of tests to ensure the numerical accuracy and convergence of the code. Many tests combine chemistry with gas dynamics, including comparisons with analytic solutions, 1D problems of the photo-dissociation regions and shocks, and realistic 3D simulations of the turbulent ISM. We release the code with the new public version of Athena++, aiming to provide a robust and flexible code for the astrochemical simulation community.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.01605 [pdf, ps, other]

Randomized algorithms for fully online multiprocessor scheduling with testing

Authors: Mingyang Gong, Zhi-Zhong Chen, Guohui Lin, Lusheng Wang

Abstract: We contribute the first randomized algorithm that is an integration of arbitrarily many deterministic algorithms for the fully online multiprocessor scheduling with testing problem. When there are two machines, we show that with two component algorithms its expected competitive ratio is already strictly smaller than the best proven deterministic competitive ratio lower bound. Such algorithmic results are rarely seen in the literature. Multiprocessor scheduling is one of the first combinatorial optimization problems that have received numerous studies. Recently, several research groups examined its testing variant, in which each job $J_j$ arrives with an upper bound $u_j$ on the processing time and a testing operation of length $t_j$; one can choose to execute $J_j$ for $u_j$ time, or to test $J_j$ for $t_j$ time to obtain the exact processing time $p_j$ followed by immediately executing the job for $p_j$ time. Our target problem is the fully online version, in which the jobs arrive in sequence so that the testing decision needs to be made at the job arrival as well as the designated machine. We propose an expected $(\sqrt{\varphi + 3} + 1) (\approx 3.1490)$-competitive randomized algorithm as a non-uniform probability distribution over arbitrarily many deterministic algorithms, where $\varphi = \frac {\sqrt{5} + 1}2$ is the golden ratio. When there are two machines, we show that our randomized algorithm based on two deterministic algorithms is already expected $\frac {3 \varphi + 3 \sqrt{13 - 7\varphi}}4 (\approx 2.1839)$-competitive. Besides, we use Yao's principle to prove lower bounds of $1.6682$ and $1.6522$ on the expected competitive ratio for any randomized algorithm in the presence of at least three machines and only two machines, respectively, and prove a lower bound of $2.2117$ on the competitive ratio for any deterministic algorithm when there are only two machines.

Submitted 27 June, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

Comments: 21 pages with 1 plot; an extended abstract to be submitted
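The closed-form ratios quoted in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `general_ratio`, `two_machine_ratio`, and `job_time` are hypothetical and chosen here for illustration. It evaluates the two expressions in $\varphi$ and models the per-job cost of the testing variant (run untested for $u_j$, or test for $t_j$ and then run for $p_j$). The computed values agree with the quoted approximations to within $10^{-4}$.

```python
import math

# Golden ratio, as defined in the abstract: phi = (sqrt(5) + 1) / 2
PHI = (math.sqrt(5) + 1) / 2


def general_ratio() -> float:
    """Expected competitive ratio sqrt(phi + 3) + 1 quoted for any number of machines."""
    return math.sqrt(PHI + 3) + 1


def two_machine_ratio() -> float:
    """Expected competitive ratio (3*phi + 3*sqrt(13 - 7*phi)) / 4 quoted for two machines."""
    return (3 * PHI + 3 * math.sqrt(13 - 7 * PHI)) / 4


def job_time(u: float, t: float, p: float, test: bool) -> float:
    """Machine time spent on one job in the testing variant (illustrative helper).

    Untested: the job runs for its upper bound u.
    Tested:   the test takes t, then the job runs for its exact time p.
    """
    return t + p if test else u


if __name__ == "__main__":
    print(f"general ratio:     {general_ratio():.6f}")
    print(f"two-machine ratio: {two_machine_ratio():.6f}")
    # Testing pays off for this job because t + p < u.
    print(job_time(u=10.0, t=2.0, p=3.0, test=True))
```

The two-machine value also sits below the deterministic lower bound of 2.2117 stated in the abstract, which is the comparison the authors highlight.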

arXiv:2304.12779 [pdf, ps, other]

An Approximation Algorithm for Covering Vertices by 4^+-Paths

Authors: Mingyang Gong, Zhi-Zhong Chen, Guohui Lin, Zhaohui Zhan

Abstract: This paper deals with the problem of finding a collection of vertex-disjoint paths in a given graph G=(V,E) such that each path has at least four vertices and the total number of vertices in these paths is maximized. The problem is NP-hard and admits an approximation algorithm which achieves a ratio of 2 and runs in O(|V|^8) time. The known algorithm is based on time-consuming local search, and its authors ask whether one can design a better approximation algorithm by a completely different approach. In this paper, we answer their question in the affirmative by presenting a new approximation algorithm for the problem. Our algorithm achieves a ratio of 1.874 and runs in O(min{|E|^2|V|^2, |V|^5}) time. Unlike the previously best algorithm, ours starts with a maximum matching M of G and then tries to transform M into a solution by utilizing a maximum-weight path-cycle cover in a suitably constructed graph.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.12035 [pdf, other]

GRIG: Few-Shot Generative Residual Image Inpainting

Authors: Wanglong Lu, Xianta Jiang, Xiaogang Jin, Yong-Liang Yang, Minglun Gong, Tao Wang, Kaijie Shi, Hanli Zhao

Abstract: Image inpainting is the task of filling in missing or masked regions of an image with semantically meaningful contents. Recent methods have shown significant improvement in dealing with large-scale missing regions. However, these methods usually require large training datasets to achieve satisfactory results and there has been limited research into training these models on a small number of samples. To address this, we present a novel few-shot generative residual image inpainting method that produces high-quality inpainting results. The core idea is to propose an iterative residual reasoning method that incorporates Convolutional Neural Networks (CNNs) for feature extraction and Transformers for global reasoning within generative adversarial networks, along with image-level and patch-level discriminators. We also propose a novel forgery-patch adversarial training strategy to create faithful textures and detailed appearances. Extensive evaluations show that our method outperforms previous methods on the few-shot image inpainting task, both quantitatively and qualitatively.

Submitted 24 April, 2023; originally announced April 2023.

Comments: 12 pages, 10 figures

ACM Class: I.4.4; I.4.5; I.4.9

arXiv:2304.10672 [pdf, other]

doi 10.1103/PhysRevA.107.L040602

Accelerated quantum control in a three-level system by jumping along the geodesics

Authors: Musang Gong, Min Yu, Ralf Betzholz, Yaoming Chu, Pengcheng Yang, Zhenyu Wang, Jianming Cai

Abstract: In a solid-state spin system, we experimentally demonstrate a protocol for quantum-state population transfer with an improved efficiency compared to traditional stimulated Raman adiabatic passage (STIRAP). Using the ground-state triplet of the nitrogen-vacancy center in diamond, we show that the required evolution time for high-fidelity state transfer can be reduced by almost one order of magnitude. Furthermore, we establish an improved robustness against frequency detuning caused by magnetic noise as compared to STIRAP. These results provide a powerful tool for coherent spin manipulation in the context of quantum sensing and quantum computation.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 8 pages, 6 figures

arXiv:2304.10075 [pdf, other]

Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering

Authors: Dongting Hu, Zhenkai Zhang, Tingbo Hou, Tongliang Liu, Huan Fu, Mingming Gong

Abstract: The rendering scheme in neural radiance field (NeRF) is effective in rendering a pixel by casting a ray into the scene. However, NeRF yields blurred rendering results when the training images are captured at non-uniform scales, and produces aliasing artifacts if the test images are taken in distant views. To address this issue, Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information. Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. Our approach includes a density Mip-VoG for scene geometry and a feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG encodes scene scale using the level of detail (LOD) derived from ray differentials and uses quadrilinear interpolation to map a queried 3D location to its features and density from two neighboring downsampled voxel grids. To our knowledge, our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously. We conducted experiments on multiscale datasets, and the results show that our approach outperforms state-of-the-art real-time rendering baselines.

Submitted 18 September, 2023; v1 submitted 20 April, 2023; originally announced April 2023.
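The quadrilinear lookup mentioned in the abstract above (trilinear interpolation inside each of two neighboring mip levels, followed by a linear blend along the LOD axis) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Mip-VoG implementation: it uses scalar cubic grids stored as nested lists, vertex-aligned normalized coordinates in [0, 1], and takes the continuous LOD as a given input rather than deriving it from ray differentials. The names `trilinear` and `quadrilinear` are hypothetical.

```python
def _cell(coord, size):
    """Clamp a continuous index and return (lower index, upper index, fraction)."""
    coord = min(max(coord, 0.0), size - 1.0)
    i0 = min(int(coord), max(size - 2, 0))
    i1 = min(i0 + 1, size - 1)
    return i0, i1, coord - i0


def _lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def trilinear(grid, x, y, z):
    """Trilinearly interpolate a cubic grid grid[i][j][k] at continuous indices."""
    n = len(grid)
    x0, x1, fx = _cell(x, n)
    y0, y1, fy = _cell(y, n)
    z0, z1, fz = _cell(z, n)
    # Interpolate along x on the four edges of the cell, then along y, then z.
    c00 = _lerp(grid[x0][y0][z0], grid[x1][y0][z0], fx)
    c10 = _lerp(grid[x0][y1][z0], grid[x1][y1][z0], fx)
    c01 = _lerp(grid[x0][y0][z1], grid[x1][y0][z1], fx)
    c11 = _lerp(grid[x0][y1][z1], grid[x1][y1][z1], fx)
    return _lerp(_lerp(c00, c10, fy), _lerp(c01, c11, fy), fz)


def quadrilinear(pyramid, x, y, z, lod):
    """Blend trilinear samples from the two mip levels that bracket `lod`.

    pyramid: list of cubic grids, level 0 finest (assumed layout).
    x, y, z: normalized coordinates in [0, 1].
    lod:     continuous level of detail, clamped to the pyramid range.
    """
    lod = min(max(lod, 0.0), len(pyramid) - 1.0)
    lo = min(int(lod), max(len(pyramid) - 2, 0))
    hi = min(lo + 1, len(pyramid) - 1)

    def sample(level):
        grid = pyramid[level]
        scale = len(grid) - 1  # map [0, 1] onto the grid's vertex indices
        return trilinear(grid, x * scale, y * scale, z * scale)

    return _lerp(sample(lo), sample(hi), lod - lo)


if __name__ == "__main__":
    fine = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2x2x2, all zeros
    coarse = [[[1.0]]]                                           # 1x1x1, all ones
    print(quadrilinear([fine, coarse], 0.5, 0.5, 0.5, lod=0.25))
```

In a real renderer each voxel would hold a feature vector or density rather than a scalar, but the interpolation weights would be computed the same way.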

arXiv:2304.08138 [pdf, other]

Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval

Authors: Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon, Daxin Jiang

Abstract: Current dense retrievers (DRs) are limited in their ability to effectively process misspelled queries, which constitute a significant portion of query traffic in commercial search engines. The main issue is that the pre-trained language model-based encoders used by DRs are typically trained and fine-tuned using clean, well-curated text data. Misspelled queries are typically not found in the data used for training these models, and thus misspelled queries observed at inference time are out-of-distribution compared to the data used for training and fine-tuning. Previous efforts to address this issue have focused on fine-tuning strategies, but their effectiveness on misspelled queries remains lower than that of pipelines that employ separate state-of-the-art spell-checking components. To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel re-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks. ToRoDer utilizes an encoder-decoder architecture where the encoder takes misspelled text with masked tokens as input and outputs bottlenecked information to the decoder. The decoder then takes as input the bottlenecked embeddings, along with token embeddings of the original text with the misspelled tokens masked out. The pre-training task is to recover the masked tokens for both the encoder and decoder. Our extensive experimental results and detailed ablation studies show that DRs pre-trained with ToRoDer exhibit significantly higher effectiveness on misspelled queries, sensibly closing the gap with pipelines that use a separate, complex spell-checker component, while retaining their effectiveness on correctly spelled queries.

Submitted 26 November, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: 10 pages, accepted at SIGIR-AP

arXiv:2304.04732 [pdf, other]

doi 10.3847/1538-4357/accb86

The Physical Drivers and Observational Tracers of CO-to-H2 Conversion Factor Variations in Nearby Barred Galaxy Centers

Authors: Yu-Hsuan Teng, Karin M. Sandstrom, Jiayi Sun, Munan Gong, Alberto D. Bolatto, I-Da Chiang, Adam K. Leroy, Antonio Usero, Simon C. O. Glover, Ralf S. Klessen, Daizhong Liu, Miguel Querejeta, Eva Schinnerer, Frank Bigiel, Yixian Cao, Melanie Chevance, Cosima Eibensteiner, Kathryn Grasha, Frank P. Israel, Eric J. Murphy, Lukas Neumann, Hsi-An Pan, Francesca Pinna, Mattia C. Sormani, J. D. T. Smith , et al. (2 additional authors not shown)

Abstract: The CO-to-H$_2$ conversion factor ($α_\rm{CO}$) is central to measuring the amount and properties of molecular gas. It is known to vary with environmental conditions, and previous studies have revealed lower $α_\rm{CO}$ in the centers of some barred galaxies on kpc scales. To unveil the physical drivers of such variations, we obtained ALMA Band 3, 6, and 7 observations toward the inner 2 kpc of NGC 3627 and NGC 4321 tracing $^{12}$CO, $^{13}$CO, and C$^{18}$O lines on 100 pc scales. Our multi-line modeling and Bayesian likelihood analysis of these datasets reveal variations of molecular gas density, temperature, optical depth, and velocity dispersion, which are among the key drivers of $α_\rm{CO}$. The central 300 pc nuclei in both galaxies show strong enhancement of temperature $T_\rm{k}>100$ K and density $n_\rm{H_2}>10^3$ cm$^{-3}$. Assuming a CO-to-H$_2$ abundance of $3\times10^{-4}$, we derive 4-15 times lower $α_\rm{CO}$ than the Galactic value across our maps, which agrees well with previous kpc-scale measurements. Combining the results with our previous work on NGC 3351, we find a strong correlation of $α_\rm{CO}$ with low-J $^{12}$CO optical depths ($τ_\rm{CO}$), as well as an anti-correlation with $T_\rm{k}$. The $τ_\rm{CO}$ correlation explains most of the $α_\rm{CO}$ variation in the three galaxy centers, whereas changes in $T_\rm{k}$ influence $α_\rm{CO}$ to second order. Overall, the observed line width and $^{12}$CO/$^{13}$CO 2-1 line ratio correlate with $τ_\rm{CO}$ variation in these centers, and thus they are useful observational indicators for $α_\rm{CO}$ variation. We also test current simulation-based $α_\rm{CO}$ prescriptions and find a systematic overprediction, which likely originates from the mismatch of gas conditions between our data and the simulations.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted for publication in ApJ; 30 pages of main text + 3 appendices

arXiv:2303.17981 [pdf, other]

doi 10.1109/ICRA48891.2023.10161047

Knowledge Distillation for Feature Extraction in Underwater VSLAM

Authors: Jinghe Yang, Mingming Gong, Girish Nair, Jung Hoon Lee, Jason Monty, Ye Pu

Abstract: In recent years, learning-based feature detection and matching have outperformed manually-designed methods in in-air cases. However, it is challenging to learn the features in the underwater scenario due to the absence of annotated underwater datasets. This paper proposes a cross-modal knowledge distillation framework for training an underwater feature detection and matching network (UFEN). In particular, we use in-air RGBD data to generate synthetic underwater images based on a physical underwater imaging formation model and employ these as the medium to distil knowledge from a teacher model SuperPoint pretrained on in-air images. We embed UFEN into the ORB-SLAM3 framework to replace the ORB feature by introducing an additional binarization layer. To test the effectiveness of our method, we built a new underwater dataset with groundtruth measurements named EASI (https://github.com/**ghe-mel/UFEN-SLAM), recorded in an indoor water tank for different turbidity levels. The experimental results on the existing dataset and our new dataset demonstrate the effectiveness of our method.

Submitted 31 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA 2023), 6 pages

arXiv:2303.16434 [pdf, other]

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

Authors: Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

Abstract: Artificial Intelligence (AI) has made incredible progress recently. On the one hand, advanced foundation models like ChatGPT can offer powerful conversation, in-context learning and code generation abilities on a broad range of open-domain tasks. They can also generate high-level solution outlines for domain-specific tasks based on the common sense knowledge they have acquired. However, they still face difficulties with some specialized tasks because they lack enough domain-specific data during pre-training or they often have errors in their neural network computations on those tasks that need accurate executions. On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well. However, due to the different implementation or working mechanisms, they are not easily accessible or compatible with foundation models. Therefore, there is a clear and pressing need for a mechanism that can leverage foundation models to propose task solution outlines and then automatically match some of the sub-tasks in the outlines to the off-the-shelf models and systems with special functionalities to complete them. Inspired by this, we introduce TaskMatrix.AI as a new AI ecosystem that connects foundation models with millions of APIs for task completion. Unlike most previous work that aimed to improve a single AI model, TaskMatrix.AI focuses more on using existing foundation models (as a brain-like central system) and APIs of other AI models and systems (as sub-task solvers) to achieve diversified tasks in both digital and physical domains. As a position paper, we will present our vision of how to build such an ecosystem, explain each key component, and use case studies to illustrate both the feasibility of this vision and the main challenges we need to address next.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.15078 [pdf, other]

Large Language Models are Diverse Role-Players for Summarization Evaluation

Authors: Ning Wu, Ming Gong, Linjun Shou, Shining Liang, Daxin Jiang

Abstract: Text summarization has a wide range of applications in many scenarios. The evaluation of the quality of the generated text is a complex problem. A big challenge to language evaluation is that there is a clear divergence between existing metrics and human evaluation. A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal. Most of the automatic evaluation methods like BLEU/ROUGE may not be able to adequately capture the above dimensions. In this paper, we propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects. First, we propose to model objective and subjective dimensions of generated text based on a roleplayer prompting mechanism. Furthermore, we introduce a context-based prompting mechanism that is able to generate dynamic roleplayer profiles based on input context. Finally, we design a multi-roleplayer prompting technology based on batch prompting and integrate multiple outputs into the final evaluation results. Experimental results on three real datasets for summarization show that our model is highly competitive and has a very high consistency with human annotators.

Submitted 19 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: NLPCC 2023

arXiv:2303.14991 [pdf, other]

Empowering Dual-Encoder with Query Generator for Cross-Lingual Dense Retrieval

Authors: Houxing Ren, Linjun Shou, Ning Wu, Ming Gong, Daxin Jiang

Abstract: In monolingual dense retrieval, many works focus on how to distill knowledge from a cross-encoder re-ranker to a dual-encoder retriever, and these methods achieve better performance due to the effectiveness of the cross-encoder re-ranker. However, we find that the performance of the cross-encoder re-ranker is heavily influenced by the number of training samples and the quality of negative samples, which are hard to obtain in the cross-lingual setting. In this paper, we propose to use a query generator as the teacher in the cross-lingual setting, which is less dependent on enough training samples and high-quality negative samples. In addition to traditional knowledge distillation, we further propose a novel enhancement method, which uses the query generator to help the dual-encoder align queries from different languages, but does not need any additional parallel sentences. The experimental results show that our method outperforms the state-of-the-art methods on two benchmark datasets.

Submitted 27 March, 2023; originally announced March 2023.

Comments: EMNLP 2022 main conference

arXiv:2303.14979 [pdf, other]

Lexicon-Enhanced Self-Supervised Training for Multilingual Dense Retrieval

Authors: Houxing Ren, Linjun Shou, Jian Pei, Ning Wu, Ming Gong, Daxin Jiang

Abstract: Recent multilingual pre-trained models have shown better performance in various multilingual tasks. However, these models perform poorly on multilingual retrieval tasks due to a lack of multilingual training data. In this paper, we propose to mine and generate self-supervised training data based on a large-scale unlabeled corpus. We carefully design a mining method which combines the sparse and dense models to mine the relevance of unlabeled queries and passages. We also introduce a query generator to generate more queries in target languages for unlabeled passages. Through extensive experiments on the Mr. TYDI dataset and an industrial dataset from a commercial search engine, we demonstrate that our method performs better than baselines based on various pre-trained multilingual models. Our method even achieves on-par performance with the supervised method on the latter dataset.

Submitted 27 March, 2023; originally announced March 2023.

Comments: EMNLP 2022 Findings

arXiv:2303.12997

FER-former: Multi-modal Transformer for Facial Expression Recognition

Authors: Yande Li, Mingjie Wang, Minglun Gong, Yonggang Lu, Li Liu

Abstract: The ever-increasing demands for intuitive interactions in Virtual Reality has triggered a boom in the realm of Facial Expression Recognition (FER). To address the limitations in existing approaches (e.g., narrow receptive fields and homogenous supervisory signals) and further cement the capacity of FER tools, a novel multifarious supervision-steering Transformer for FER in the wild is proposed in this paper. Referred as FER-former, our approach features multi-granularity embedding integration, hybrid self-attention scheme, and heterogeneous domain-steering supervision. In specific, to dig deep into the merits of the combination of features provided by prevailing CNNs and Transformers, a hybrid stem is designed to cascade two types of learning paradigms simultaneously. Wherein, a FER-specific transformer mechanism is devised to characterize conventional hard one-hot label-focusing and CLIP-based text-oriented tokens in parallel for final classification. To ease the issue of annotation ambiguity, a heterogeneous domains-steering supervision module is proposed to make image features also have text-space semantic correlations by supervising the similarity between image features and text features. On top of the collaboration of multifarious token heads, diverse global receptive fields with multi-modal semantic cues are captured, thereby delivering superb learning capability. Extensive experiments on popular benchmarks demonstrate the superiority of the proposed FER-former over the existing state-of-the-arts.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.09174

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

Authors: Hao Liu, Xin Li, Mingming Gong, Bing Liu, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Xing Sun

Abstract: Recently, Table Structure Recognition (TSR) task, aiming at identifying table structure into machine readable formats, has received increasing interest in the community. While impressive success, most single table component-based methods can not perform well on unregularized table cases distracted by not only complicated inner structure but also exterior capture distortion. In this paper, we raise it as Complex TSR problem, where the performance degeneration of existing methods is attributable to their inefficient component usage and redundant post-processing. To mitigate it, we shift our perspective from table component extraction towards the efficient multiple components leverage, which awaits further exploration in the field. Specifically, we propose a seminal method, termed GrabTab, equipped with newly proposed Component Deliberator. Thanks to its progressive deliberation mechanism, our GrabTab can flexibly accommodate to most complex tables with reasonable components selected but without complicated post-processing involved. Quantitative experimental results on public benchmarks demonstrate that our method significantly outperforms the state-of-the-arts, especially under more challenging scenes.

Submitted 16 March, 2023; originally announced March 2023.

Comments: Tech report
