-
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Authors:
Yifan Bai,
Zeyang Zhao,
Yihong Gong,
Xing Wei
Abstract:
We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where to look (localization) and how to describe (appearance analysis) the target object across video frames. Building on the foundation of its predecessor, ARTrackV2 extends the concept by introducing a unified generative framework to "read out" the object's trajectory and "retell" its appearance in an autoregressive manner. This approach fosters a time-continuous methodology that models the joint evolution of motion and visual features, guided by previous estimates. Furthermore, ARTrackV2 stands out for its efficiency and simplicity, obviating the less efficient intra-frame autoregression and hand-tuned parameters for appearance updates. Despite its simplicity, ARTrackV2 achieves state-of-the-art performance on prevailing benchmark datasets while demonstrating remarkable efficiency improvement. In particular, ARTrackV2 achieves an AO score of 79.5\% on GOT-10k and an AUC of 86.1\% on TrackingNet while being $3.6 \times$ faster than ARTrack. The code will be released.
Submitted 13 February, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis
Authors:
Jinwen He,
Yujia Gong,
Kai Chen,
Zijin Lin,
Chengan Wei,
Yue Zhao
Abstract:
Large Language Models (LLMs) have revolutionized various domains with extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This phenomenon is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when generating factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for utilizing LLMs' inner states for factual detection and encourages further exploration into LLMs' inner workings for enhanced reliability and transparency.
Submitted 29 December, 2023; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Quantum-Assisted Online Task Offloading and Resource Allocation in MEC-Enabled Satellite-Aerial-Terrestrial Integrated Networks
Authors:
Yu Zhang,
Yanmin Gong,
Lei Fan,
Yu Wang,
Zhu Han,
Yuanxiong Guo
Abstract:
In the era of Internet of Things (IoT), multi-access edge computing (MEC)-enabled satellite-aerial-terrestrial integrated network (SATIN) has emerged as a promising technology to provide massive IoT devices with seamless and reliable communication and computation services. This paper investigates the cooperation of low Earth orbit (LEO) satellites, high altitude platforms (HAPs), and terrestrial base stations (BSs) to provide relaying and computation services for vastly distributed IoT devices. Considering the uncertainty in dynamic SATIN systems, we formulate a stochastic optimization problem to minimize the time-average expected service delay by jointly optimizing resource allocation and task offloading while satisfying the energy constraints. To solve the formulated problem, we first develop a Lyapunov-based online control algorithm to decompose it into multiple one-slot problems. Since each one-slot problem is a large-scale mixed-integer nonlinear program (MINLP) that is intractable for classical computers, we further propose novel hybrid quantum-classical generalized Benders' decomposition (HQCGBD) algorithms to solve the problem efficiently by leveraging quantum advantages in parallel computing. Numerical results validate the effectiveness of the proposed MEC-enabled SATIN schemes.
Submitted 25 December, 2023;
originally announced December 2023.
-
Quantum-Assisted Joint Caching and Power Allocation for Integrated Satellite-Terrestrial Networks
Authors:
Yu Zhang,
Yanmin Gong,
Lei Fan,
Yu Wang,
Zhu Han,
Yuanxiong Guo
Abstract:
Low Earth orbit (LEO) satellite networks can complement terrestrial networks for achieving global wireless coverage and improving delay-sensitive Internet services. This paper proposes an integrated satellite-terrestrial network (ISTN) architecture to provide ground users with seamless and reliable content delivery services. For optimal service provisioning in this architecture, we formulate an optimization model to maximize the network throughput by jointly optimizing content delivery policy, cache placement, and transmission power allocation. The resulting optimization model is a large-scale mixed-integer nonlinear program (MINLP) that is intractable for classical computer solvers. Inspired by quantum computing techniques, we propose a hybrid quantum-classical generalized Benders' decomposition (HQCGBD) algorithm to address this challenge. Specifically, we first exploit the generalized Benders' decomposition (GBD) to decompose the problem into a master problem and a subproblem and then leverage a state-of-the-art quantum annealer to solve the challenging master problem.
Submitted 22 December, 2023;
originally announced December 2023.
-
Joint Channel Estimation and Cooperative Localization for Near-Field Ultra-Massive MIMO
Authors:
Ruoxiao Cao,
Hengtao He,
Xianghao Yu,
Shenghui Song,
Kaibin Huang,
Jun Zhang,
Yi Gong,
Khaled B. Letaief
Abstract:
The next-generation (6G) wireless networks are expected to provide not only seamless and high data-rate communications, but also ubiquitous sensing services. By providing vast spatial degrees of freedom (DoFs), ultra-massive multiple-input multiple-output (UM-MIMO) technology is a key enabler for both sensing and communications in 6G. However, the adoption of UM-MIMO leads to a shift from the far field to the near field in terms of the electromagnetic propagation, which poses novel challenges in system design. Specifically, near-field effects introduce highly non-linear spherical wave models that render existing designs based on plane wave assumptions ineffective. In this paper, we focus on two crucial tasks in sensing and communications, respectively, i.e., localization and channel estimation, and investigate their joint design by exploring the near-field propagation characteristics, achieving mutual benefits between two tasks. In addition, multiple base stations (BSs) are leveraged to collaboratively facilitate a cooperative localization framework. To address the joint channel estimation and cooperative localization problem for near-field UM-MIMO systems, we propose a variational Newtonized near-field channel estimation (VNNCE) algorithm and a Gaussian fusion cooperative localization (GFCL) algorithm. The VNNCE algorithm exploits the spatial DoFs provided by the near-field channel to obtain position-related soft information, while the GFCL algorithm fuses this soft information to achieve more accurate localization. Additionally, we introduce a joint architecture that seamlessly integrates channel estimation and cooperative localization.
Submitted 21 December, 2023;
originally announced December 2023.
-
Estimating Photometric Redshift from Mock Flux for CSST Survey by using Weighted Random Forest
Authors:
Junhao Lu,
Zhijian Luo,
Zhu Chen,
Liping Fu,
Wei Du,
Yan Gong,
Yicheng Li,
Xian-Min Meng,
Zhirui Tang,
Shaohua Zhang,
Chenggang Shu,
Xingchen Zhou,
Zuhui Fan
Abstract:
Accurate estimation of photometric redshifts (photo-$z$) is crucial in studies of both galaxy evolution and cosmology using current and future large sky surveys. In this study, we employ Random Forest (RF), a machine learning algorithm, to estimate photo-$z$ and investigate the systematic uncertainties affecting the results. Using galaxy flux and color as input features, we construct a mapping between input features and redshift by using a training set of simulated data, generated from the Hubble Space Telescope Advanced Camera for Surveys (HST-ACS) and COSMOS catalogue, with the expected instrumental effects of the planned China Space Station Telescope (CSST). To improve the accuracy and confidence of predictions, we incorporate inverse variance weighting and perturb the catalog using input feature errors. Our results show that weighted RF can achieve a photo-$z$ accuracy of $\rm σ_{NMAD}=0.025$ and an outlier fraction of $\rm η=2.045\%$, significantly better than the values of $\rm σ_{NMAD}=0.043$ and $\rm η=6.45\%$ obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software, which uses a template-fitting method. Furthermore, we have calculated the importance of each input feature for different redshift ranges and found that the most important input features reflect the approximate position of the break features in galaxy spectra, demonstrating the algorithm's ability to extract physical information from data. Additionally, we have established confidence indices and error bars for each prediction value based on the shape of the redshift probability distribution function, suggesting that screening sources with high confidence can further reduce the outlier fraction.
Submitted 25 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
Authors:
Yunye Gong,
Robik Shrestha,
Jared Claypoole,
Michael Cogswell,
Arijit Ray,
Christopher Kanan,
Ajay Divakaran
Abstract:
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoretical grounding, we collect multiple-choice samples based on picture stories that reflect different levels of comprehension, as laid out in Bloom's Taxonomy, a classic framework for learning assessment widely adopted in education research. Our data maps to a novel hierarchical graph representation which enables automatic data augmentation and novel measures characterizing model consistency. We perform graded evaluation and reliability analysis on recent multi-modal models. In comparison to low-level tasks, we observe decreased performance on tasks requiring advanced comprehension and cognitive skills with up to 38.0\% drop in VQA accuracy. In comparison to earlier models, GPT-4V demonstrates improved accuracy over all comprehension levels and shows a tendency of bypassing visual inputs especially for higher-level tasks. Current models also show consistency patterns misaligned with human comprehension in various scenarios, demonstrating the need for improvement based on theoretically-grounded criteria.
Submitted 10 June, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V
Authors:
Dingning Liu,
Xiaomeng Dong,
Renrui Zhang,
Xu Luo,
Peng Gao,
Xiaoshui Huang,
Yongshun Gong,
Zhihui Wang
Abstract:
In this work, we present a new visual prompting method called 3DAxiesPrompts (3DAP) to unleash the capabilities of GPT-4V in performing 3D spatial tasks. Our investigation reveals that while GPT-4V exhibits proficiency in discerning the position and interrelations of 2D entities through current visual prompting techniques, its abilities in handling 3D spatial tasks have yet to be explored. In our approach, we create a 3D coordinate system tailored to 3D imagery, complete with annotated scale information. By presenting images infused with the 3DAP visual prompt as inputs, we empower GPT-4V to ascertain the spatial positioning information of the given 3D target image with a high degree of precision. Through experiments, we identified three tasks that could be stably completed using the 3DAP method, namely, 2D to 3D Point Reconstruction, 2D to 3D Point Matching, and 3D Object Detection. We perform experiments on our proposed dataset, 3DAP-Data; the results of these experiments validate the efficacy of 3DAP-enhanced GPT-4V inputs, marking a significant stride in 3D spatial task execution.
Submitted 15 December, 2023;
originally announced December 2023.
-
Urban Region Embedding via Multi-View Contrastive Prediction
Authors:
Zechen Li,
Weiming Huang,
Kai Zhao,
Min Yang,
Yongshun Gong,
Meng Chen
Abstract:
Recently, learning urban region representations utilizing multi-modal data (information views) has become increasingly popular for deep understanding of the distributions of various socioeconomic features in cities. However, previous methods usually blend multi-view information in a posterior stage, falling short in learning coherent and consistent representations across different views. In this paper, we form a new pipeline to learn consistent representations across varying views, and propose the multi-view Contrastive Prediction model for urban Region embedding (ReCP), which leverages the multiple information views from point-of-interest (POI) and human mobility data. Specifically, ReCP comprises two major modules, namely an intra-view learning module utilizing contrastive learning and feature reconstruction to capture the unique information from each single view, and an inter-view learning module that perceives the consistency between the two views using a contrastive prediction learning scheme. We conduct thorough experiments on two downstream tasks to assess the proposed model, i.e., land use clustering and region popularity prediction. The experimental results demonstrate that our model outperforms state-of-the-art baseline methods significantly in urban region representation learning.
Submitted 15 December, 2023;
originally announced December 2023.
-
Learning a Low-Rank Feature Representation: Achieving Better Trade-Off between Stability and Plasticity in Continual Learning
Authors:
Zhenrong Liu,
Yang Li,
Yi Gong,
Yik-Chung Wu
Abstract:
In continual learning, networks confront a trade-off between stability and plasticity when trained on a sequence of tasks. To bolster plasticity without sacrificing stability, we propose a novel training algorithm called LRFR. This approach optimizes network parameters in the null space of the past tasks' feature representation matrix to guarantee the stability. Concurrently, we judiciously select only a subset of neurons in each layer of the network while training individual tasks to learn the past tasks' feature representation matrix in low-rank. This increases the null space dimension when designing network parameters for subsequent tasks, thereby enhancing the plasticity. Using CIFAR-100 and TinyImageNet as benchmark datasets for continual learning, the proposed approach consistently outperforms state-of-the-art methods.
Submitted 14 December, 2023;
originally announced December 2023.
-
Uncertainty Propagation through Trained Deep Neural Networks Using Factor Graphs
Authors:
Angel Daruna,
Yunye Gong,
Abhinav Rajvanshi,
Han-Pang Chiu,
Yi Yao
Abstract:
Predictive uncertainty estimation remains a challenging problem precluding the use of deep neural networks as subsystems within safety-critical applications. Aleatoric uncertainty is a component of predictive uncertainty that cannot be reduced through model improvements. Uncertainty propagation seeks to estimate aleatoric uncertainty by propagating input uncertainties to network predictions. Existing uncertainty propagation techniques use one-way information flows, propagating uncertainties layer-by-layer or across the entire neural network while relying either on sampling or analytical techniques for propagation. Motivated by the complex information flows within deep neural networks (e.g., skip connections), we developed and evaluated a novel approach by posing uncertainty propagation as a non-linear optimization problem using factor graphs. We observed statistically significant improvements in performance over prior work when using factor graphs across most of our experiments that included three datasets and two neural network architectures. Our implementation balances the benefits of sampling and analytical propagation techniques, which, we believe, is a key factor in achieving performance improvements.
Submitted 10 December, 2023;
originally announced December 2023.
-
Competition-Level Problems are Effective LLM Evaluators
Authors:
Yiming Huang,
Zhenghao Lin,
Xiao Liu,
Yeyun Gong,
Shuai Lu,
Fangyu Lei,
Yaobo Liang,
Yelong Shen,
Chen Lin,
Nan Duan,
Weizhu Chen
Abstract:
Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and, more recently, about the potential data contamination problem. This paper aims to evaluate the reasoning capacities of LLMs, specifically in solving recent competition-level programming problems in Codeforces, which are expert-crafted and unique, requiring deep understanding and robust reasoning skills. We first provide a comprehensive evaluation of GPT-4's perceived zero-shot performance on this task, considering various aspects such as problems' release time, difficulties, and types of errors encountered. Surprisingly, the perceived performance of GPT-4 has experienced a cliff-like decline in problems after September 2021 consistently across all the difficulties and types of problems, which shows the potential data contamination, as well as the challenges for any existing LLM to solve unseen complex reasoning problems. We further explore various approaches such as fine-tuning, Chain-of-Thought prompting and problem description simplification; unfortunately, none of them is able to consistently mitigate the challenges. Through our work, we emphasize the importance of this excellent data source for assessing the genuine reasoning capabilities of LLMs, and foster the development of LLMs with stronger reasoning abilities and better generalization in the future.
Submitted 4 June, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Highly sensitive magnetic properties and large linear magnetoresistance in antiferromagnetic Cr$_x$Se ($0.875 \le x \le 1$) single crystals
Authors:
Yuqing Bai,
Shuang Pan,
Ziqian Lu,
Yuanyuan Gong,
Guizhou Xu,
Feng Xu
Abstract:
Cr$_x$Se ($x \le 1$) is a class of quasi-layered binary compounds with potential applications in spintronics due to its intriguing antiferromagnetic properties. In this work, Cr$_x$Se single crystals with high Cr content ($x = 0.87$, 0.91 and 0.95) were grown, and their magnetic and transport properties were investigated in detail. It is found that with a small increase of Cr content, the Néel temperature ($T_N$) of the samples can dramatically increase from 147 K to 257 K, accompanied by obvious changes in the magnetic anisotropy and hysteresis. The phenomena of field-induced spin-flop transitions were unveiled in these alloys, indicating their comparatively low anisotropy. The magnetoresistance (MR) of the three compounds showed positive dependence at low temperatures and, in particular, non-saturated linear positive MR was observed in Cr$_{0.91}$Se and Cr$_{0.95}$Se, with a large value of 16.2% achieved in Cr$_{0.91}$Se (10 K, 9 T). The calculated Fermi surface and MR showed that the quasi-linear MR is a product of carrier compensation. Our work revealed highly sensitive magnetic and transport properties in the Cr-Se compounds, which can lay a foundation for constructing further antiferromagnetic spintronic devices based on them.
Submitted 30 November, 2023;
originally announced November 2023.
-
Anisotropic Neural Representation Learning for High-Quality Neural Rendering
Authors:
Y. Wang,
J. Xu,
Y. Zeng,
Y. Gong
Abstract:
Neural radiance fields (NeRFs) have achieved impressive view synthesis results by learning an implicit volumetric representation from multi-view images. To project the implicit representation into an image, NeRF employs volume rendering that approximates the continuous integrals of rays as an accumulation of the colors and densities of the sampled points. Although this approximation enables efficient rendering, it ignores the direction information in point intervals, resulting in ambiguous features and limited reconstruction quality. In this paper, we propose an anisotropic neural representation learning method that utilizes learnable view-dependent features to improve scene representation and reconstruction. We model the volumetric function as spherical harmonic (SH)-guided anisotropic features, parameterized by multilayer perceptrons, facilitating ambiguity elimination while preserving the rendering efficiency. To achieve robust scene reconstruction without anisotropy overfitting, we regularize the energy of the anisotropic features during training. Our method is flexible and can be plugged into NeRF-based frameworks. Extensive experiments show that the proposed representation can boost the rendering quality of various NeRFs and achieve state-of-the-art rendering performance on both synthetic and real-world scenes.
Submitted 10 March, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Forecasting the BAO Measurements of the CSST galaxy and AGN Spectroscopic Surveys
Authors:
Haitao Miao,
Yan Gong,
Xuelei Chen,
Zhiqi Huang,
Xiao-Dong Li,
Hu Zhan
Abstract:
The spectroscopic survey of China's Space Survey Telescope (CSST) is expected to obtain a huge number of slitless spectra, including more than one hundred million galaxy spectra and millions of active galactic nuclei (AGN) spectra. By making use of these spectra, we can measure the Baryon Acoustic Oscillation (BAO) signals over large redshift ranges with excellent precisions. In this work, we predict the CSST measurements of the post-reconstruction galaxy power spectra at $0<z<1.2$ and pre-reconstruction AGN power spectra at $0<z<4$, and derive the BAO signals at different redshift bins by constraining the BAO scaling parameters using the Markov Chain Monte Carlo method. Our result shows that the CSST spectroscopic survey can provide accurate BAO measurements with precisions higher than 1\% and 3\% for the galaxy and AGN surveys, respectively. By comparing with current measurements in the same range at low redshifts, this can improve the precisions by a factor of $2\sim3$, and similar precisions can be obtained in the pessimistic case. We also investigate the constraints on the cosmological parameters using the measured BAO data by the CSST, and obtain stringent constraint results for the energy density of dark matter, Hubble constant, and equation of state of dark energy.
Submitted 29 May, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Point Cloud Pre-training with Diffusion Models
Authors:
Xiao Zheng,
Xiaoshui Huang,
Guofeng Mei,
Yuenan Hou,
Zhaoyang Lyu,
Bo Dai,
Wanli Ouyang,
Yongshun Gong
Abstract:
Pre-training a model and then fine-tuning it on downstream tasks has demonstrated significant success in the 2D image and NLP domains. However, due to the unordered and non-uniform density characteristics of point clouds, it is non-trivial to explore the prior knowledge of point clouds and pre-train a point cloud backbone. In this paper, we propose a novel pre-training method called Point cloud Diffusion pre-training (PointDif). We consider the point cloud pre-training task as a conditional point-to-point generation problem and introduce a conditional point generator. This generator aggregates the features extracted by the backbone and employs them as the condition to guide the point-to-point recovery from the noisy point cloud, thereby assisting the backbone in capturing both local and global geometric priors as well as the global point density distribution of the object. We also present a recurrent uniform sampling optimization strategy, which enables the model to uniformly recover from various noise levels and learn from balanced supervision. Our PointDif achieves substantial improvement across various real-world datasets for diverse downstream tasks such as classification, segmentation and detection. Specifically, PointDif attains 70.0% mIoU on S3DIS Area 5 for the segmentation task and achieves an average improvement of 2.4% on ScanObjectNN for the classification task compared to TAP. Furthermore, our pre-training framework can be flexibly applied to diverse point cloud backbones and bring considerable gains.
Submitted 25 November, 2023;
originally announced November 2023.
-
Hyb-NeRF: A Multiresolution Hybrid Encoding for Neural Radiance Fields
Authors:
Yifan Wang,
Yi Gong,
Yuan Zeng
Abstract:
Recent advances in neural radiance fields (NeRF) have enabled high-fidelity scene reconstruction for novel view synthesis. However, NeRF requires hundreds of network evaluations per pixel to approximate a volume rendering integral, making it slow to train. Caching NeRFs into explicit data structures can effectively enhance rendering speed but at the cost of higher memory usage. To address these issues, we present Hyb-NeRF, a novel neural radiance field with a multi-resolution hybrid encoding that achieves efficient neural modeling and fast rendering, which also allows for high-quality novel view synthesis. The key idea of Hyb-NeRF is to represent the scene using different encoding strategies from coarse-to-fine resolution levels. Hyb-NeRF exploits memory-efficient learnable positional features at coarse resolutions and the fast optimization speed and local details of hash-based feature grids at fine resolutions. In addition, to further boost performance, we embed cone tracing-based features in our learnable positional encoding that eliminates encoding ambiguity and reduces aliasing artifacts. Extensive experiments on both synthetic and real-world datasets show that Hyb-NeRF achieves faster rendering speed with better rendering quality and even a lower memory footprint in comparison to previous state-of-the-art methods.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
The first Ka-band (26.1-35 GHz) blind line survey towards Orion KL
Authors:
Xunchuan Liu,
Tie Liu,
Zhiqiang Shen,
Sheng-Li Qin,
Qiuyi Luo,
Yan Gong,
Yu Cheng,
Christian Henkel,
Qilao Gu,
Fengyao Zhu,
Tianwei Zhang,
Rongbing Zhao,
Yajun Wu,
Bin Li,
Juan Li,
Zhang Zhao,
**qing Wang,
Weiye Zhong,
Qinghui Liu,
Bo Xia,
Li Fu,
Zhen Yan,
Chao Zhang,
Lingling Wang,
Qian Ye
, et al. (9 additional authors not shown)
Abstract:
We conducted a Ka-band (26.1--35 GHz) line survey towards Orion KL using the TianMa 65-m Radio Telescope (TMRT). It is the first blind line survey in the Ka band, and achieves a sensitivity of mK level (1--3 mK at a spectral resolution of $\sim$1 km s$^{-1}$). In total, 592 Gaussian features are extracted. Among them, 257 radio recombination lines (RRLs) are identified. The maximum $Δn$ of RRLs of H, He and C are 20, 15, and 5, respectively. Through stacking, we have detected the $β$ lines of ion RRLs (RRLs of C$^+$ with possible contribution of other ions like O$^+$) for the first time, and tentative signal of the $γ$ lines of ion RRLs can also be seen on the stacked spectrum. Besides, 318 other line features were assigned to 37 molecular species, and ten of these species were not detected in the Q-band survey of TMRT. The vibrationally excited states of nine species were also detected. Emission of most species can be modeled under LTE. A number of transitions of E-CH3OH ($J_2-J_1$) display maser effects, which are confirmed by our modeling, and besides the bumping peak at $J\sim 6$ there is another peak at $J\sim 13$. Methylcyanoacetylene (CH$_3$C$_3$N) is detected in Orion KL for the first time. This work emphasizes that the Ka band, which was long-ignored for spectral line surveys, is very useful for surveying RRLs and molecular lines simultaneously.
Submitted 20 November, 2023;
originally announced November 2023.
-
Analytical models of supermassive black holes in galaxies surrounded by dark matter halos
Authors:
Zibo Shen,
Anzhong Wang,
Yungui Gong,
Shaoyu Yin
Abstract:
In this Letter, we present five analytical models in closed forms, each representing a supermassive black hole (SMBH) located at the center of a galaxy surrounded by a dark matter (DM) halo. The density profile of the halo vanishes inside twice the Schwarzschild radius of the hole and satisfies the weak, strong, and dominant energy conditions. The spacetimes are asymptotically flat, and the difference among the models lies in the slopes of the density profiles in the spike and regions far from the center of the galaxy. Three of them represent cusp models, whereas the other two represent core models. With the well-known (generalized) Newman-Janis algorithm, rotating SMBHs with DM halos can be easily constructed from these models.
Submitted 19 June, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Modern extreme value statistics for Utopian extremes
Authors:
Jordan Richards,
Noura Alotaibi,
Daniela Cisneros,
Yan Gong,
Matheus B. Guerrero,
Paolo Redondo,
Xuanjie Shao
Abstract:
Capturing the extremal behaviour of data often requires bespoke marginal and dependence models which are grounded in rigorous asymptotic theory, and hence provide reliable extrapolation into the upper tails of the data-generating distribution. We present a toolbox of four methodological frameworks, motivated by modern extreme value theory, that can be used to accurately estimate extreme exceedance probabilities or the corresponding level in either a univariate or multivariate setting. Our frameworks were used to facilitate the winning contribution of Team Yalla to the EVA (2023) Conference Data Challenge, which was organised for the 13$^\text{th}$ International Conference on Extreme Value Analysis. This competition comprised seven teams competing across four separate sub-challenges, with each requiring the modelling of data simulated from known, yet highly complex, statistical distributions, and extrapolation far beyond the range of the available samples in order to predict probabilities of extreme events. Data were constructed to be representative of real environmental data, sampled from the fantasy country of "Utopia".
Submitted 1 May, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
Authors:
**g Yao,
Xiaoyuan Yi,
Xiting Wang,
Yifan Gong,
Xing Xie
Abstract:
The rapid advancement of Large Language Models (LLMs) has attracted much attention to value alignment for their responsible development. However, how to define values in this context remains a largely unexplored question. Existing work mainly follows the Helpful, Honest, Harmless principle and specifies values as risk criteria formulated in the AI community, e.g., fairness and privacy protection, suffering from poor clarity, adaptability and transparency. Inspired by basic values in humanity and social science across cultures, this work proposes a novel basic value alignment paradigm and introduces a value space spanned by basic value dimensions. All LLMs' behaviors can be mapped into the space by identifying the underlying values, possessing the potential to address the three challenges. To foster future research, we apply the representative Schwartz's Theory of Basic Values as an initialized example and construct FULCRA, a dataset consisting of 5k (LLM output, value vector) pairs. Our extensive analysis of FULCRA reveals the underlying relation between basic values and LLMs' behaviors, demonstrating that our approach not only covers existing mainstream risks but also anticipates possibly unidentified ones. Additionally, we present an initial implementation of the basic value evaluation and alignment, paving the way for future research in this line.
Submitted 15 November, 2023;
originally announced November 2023.
-
Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation
Authors:
Tianyu Liu,
Changsheng You,
Zeyang Hu,
Chenyu Wu,
Yi Gong,
Kaibin Huang
Abstract:
Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challenge, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver for assisting text transmission from a resource-abundant base station (BS) to a resource-constrained mobile device. Specifically, the SemRelay first decodes the semantic information sent by the BS (with a semantic transmitter) and then forwards it to the user by adopting conventional bit transmission, hence effectively improving the text transmission efficiency. We formulate an optimization problem to maximize the achievable (effective) bit rate by jointly designing the SemRelay placement and bandwidth allocation. Although this problem is non-convex and generally difficult to solve, we propose an efficient penalty-based algorithm to obtain a high-quality suboptimal solution. Numerical results show the close-to-optimal performance of the proposed algorithm as well as significant rate performance gain of the proposed SemRelay over conventional decode-and-forward relay.
Submitted 16 November, 2023;
originally announced November 2023.
-
New advancements, challenges and opportunities of nanophotonics for neuromorphic computing: A state-of-the-art review
Authors:
Renjie Li,
Yuanhao Gong,
Hai Huang,
Yuze Zhou,
Sixuan Mao,
Connie Chang-Hasnain,
Zhaoyu Zhang
Abstract:
The expansion of optoelectronic devices on photonic integration platforms has led to significant growth in the field of photonic computing. Photonic integrated circuits have facilitated the creation of ultrafast artificial neural networks, forming the basis for a novel category of information processing devices. Their application extends to diverse domains such as medical diagnosis, language models, telecommunications, quantum computing, and the metaverse, addressing the escalating demands of machine learning and artificial intelligence (AI). In contrast, conventional electronics faces challenges in latency, crosstalk, and energy consumption. Neuromorphic photonics emerges as a compelling solution, featuring sub-nanosecond latencies, minimal heat dissipation, and high parallelism, expanding the scope of AI and Optical Neural Networks. This review explores recent advances in integrated photonic neuromorphic systems, focusing on materials and device engineering breakthroughs needed to overcome existing challenges. Examining various technologies in AI accelerators, from traditional optics to PICs, we assess energy efficiency through operations per joule and compute density in operations per squared millimeter per second. A comparative analysis highlights crucial technical aspects, emphasizing nanophotonic components like VCSEL lasers, optical interconnects, nanocavity resonators, and frequency microcombs. These components showcase recent breakthroughs in photonic engineering and materials science, enabling the creation of customized neuromorphic systems for AI tasks. Despite progress, current technologies face obstacles in achieving photonic AI accelerators with computing speed and energy efficiencies reaching the petaOPS range. The review explores potential future approaches in new devices, fabrication, materials, scalability, and integration to enhance critical performance metrics.
Submitted 16 November, 2023;
originally announced November 2023.
-
Integrating Sensing, Communication, and Power Transfer: From Theory to Practice
Authors:
Xiaoyang Li,
Zidong Han,
Guangxu Zhu,
Yuanming Shi,
Jie Xu,
Yi Gong,
Qinyu Zhang,
Kaibin Huang,
Khaled B. Letaief
Abstract:
To support the development of internet-of-things applications, an enormous population of low-power devices are expected to be incorporated in wireless networks performing sensing and communication tasks. As a key technology for improving the data collection efficiency, integrated sensing and communication (ISAC) enables simultaneous data transmission and radar sensing by reusing the same radio signals. In addition to information carriers, wireless signals can also serve as energy delivers, which enables simultaneous wireless information and power transfer (SWIPT). To improve the energy and spectrum efficiency, the advantages of ISAC and SWIPT are expected to be exploited, leading to the emerging technology of integrating sensing, communication, and power transfer (ISCPT). In this article, a timely overview of ISCPT is provided with the description of the fundamentals, the characterization of the theoretical boundary, the discussion on the key technologies, and the demonstration of the implementation platform.
Submitted 18 February, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
Authors:
Lei Lin,
Jiayi Fu,
Pengli Liu,
Qingyang Li,
Yan Gong,
Junchen Wan,
Fuzheng Zhang,
Zhongyuan Wang,
Di Zhang,
Kun Gai
Abstract:
Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality. To address this shortcoming, ensemble-optimization tries to obtain multiple reasoning paths to get the final answer assembly. However, current ensemble-optimization methods either simply employ rule-based post-processing such as \textit{self-consistency}, or train an additional model based on several task-related human annotations to select the best one among multiple reasoning paths, yet fail to generalize to realistic settings where the type of input questions is unknown or the answer format of reasoning paths is unknown. To avoid their limitations, we propose \textbf{Self-Agreement}, a generalizable ensemble-optimization method applicable in almost all scenarios where the type of input questions and the answer format of reasoning paths may be known or unknown. Self-agreement first samples from the language model's decoder to generate a \textit{diverse} set of reasoning paths, and subsequently prompts the language model \textit{one more time} to determine the optimal answer by selecting the most \textit{agreed} answer among the sampled reasoning paths. Self-agreement simultaneously achieves remarkable performance on six public reasoning benchmarks and superior generalization capabilities.
Submitted 24 May, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
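The two-stage procedure described in the Self-Agreement abstract above (sample diverse reasoning paths, then prompt the model once more to pick the most agreed answer) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `build_selection_prompt`, `self_agreement`, `fake_sampler`, and `fake_model` are hypothetical names, the prompt wording is invented, and the stand-in model simply counts final tokens so that the example runs without any real language model.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def build_selection_prompt(question: str, paths: List[str]) -> str:
    """Format the sampled reasoning paths into one follow-up prompt (wording is an assumption)."""
    numbered = "\n".join(f"Path {i + 1}: {p}" for i, p in enumerate(paths))
    return (
        f"Question: {question}\n{numbered}\n"
        "Which final answer do most paths agree on? Reply with the answer only."
    )


def self_agreement(
    question: str,
    sample_path: Callable[[str], str],
    ask_model: Callable[[str], str],
    n_paths: int = 5,
) -> str:
    """Stage 1: sample several reasoning paths. Stage 2: ask the model one more time."""
    paths = [sample_path(question) for _ in range(n_paths)]
    return ask_model(build_selection_prompt(question, paths))


# Hypothetical stand-ins for a real language model, used only to make the sketch runnable.
_canned = cycle([
    "... so the answer is 18",
    "... so the answer is 20",
    "... so the answer is 18",
])


def fake_sampler(question: str) -> str:
    """Pretend to sample one reasoning path from the decoder."""
    return next(_canned)


def fake_model(prompt: str) -> str:
    """Pretend selection call: return the most frequent last token among the listed paths."""
    finals = [
        line.rsplit(" ", 1)[-1]
        for line in prompt.splitlines()
        if line.startswith("Path ")
    ]
    return Counter(finals).most_common(1)[0][0]


answer = self_agreement("How many eggs are sold?", fake_sampler, fake_model, n_paths=5)
print(answer)
```

In a real setting the second call would go to the same language model that produced the paths, which is what lets the method avoid a task-specific answer parser; the counting stub here only mimics that behaviour for fixed-format answers.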
-
FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts
Authors:
Yichen Gong,
Delong Ran,
**yuan Liu,
Conglei Wang,
Tianshuo Cong,
Anyu Wang,
Sisi Duan,
Xiaoyun Wang
Abstract:
Ensuring the safety of artificial intelligence-generated content (AIGC) is a longstanding topic in the artificial intelligence (AI) community, and the safety concerns associated with Large Language Models (LLMs) have been widely investigated. Recently, large vision-language models (VLMs) represent an unprecedented revolution, as they are built upon LLMs but can incorporate additional modalities (e.g., images). However, the safety of VLMs lacks systematic evaluation, and there may be an overconfidence in the safety guarantees provided by their underlying LLMs. In this paper, to demonstrate that introducing additional modality modules leads to unforeseen AI safety issues, we propose FigStep, a straightforward yet effective jailbreaking algorithm against VLMs. Instead of feeding textual harmful instructions directly, FigStep converts the harmful content into images through typography to bypass the safety alignment within the textual module of the VLMs, inducing VLMs to output unsafe responses that violate common AI safety policies. In our evaluation, we manually review 46,500 model responses generated by 3 families of the promising open-source VLMs, i.e., LLaVA, MiniGPT4, and CogVLM (a total of 6 VLMs). The experimental results show that FigStep can achieve an average attack success rate of 82.50% on 500 harmful queries in 10 topics. Moreover, we demonstrate that the methodology of FigStep can even jailbreak GPT-4V, which already leverages an OCR detector to filter harmful queries. Above all, our work reveals that VLMs are vulnerable to jailbreaking attacks, which highlights the necessity of novel safety alignments between visual and textual modalities.
Submitted 13 December, 2023; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Noisy Pair Corrector for Dense Retrieval
Authors:
Hang Zhang,
Yeyun Gong,
Xingwei He,
Dayiheng Liu,
Daya Guo,
Jiancheng Lv,
Jian Guo
Abstract:
Most dense retrieval models contain an implicit assumption: the training query-document pairs are exactly matched. Since it is expensive to annotate the corpus manually, training pairs in real-world applications are usually collected automatically, which inevitably introduces mismatched-pair noise. In this paper, we explore an interesting and challenging problem in dense retrieval, how to train an effective model with mismatched-pair noise. To solve this problem, we propose a novel approach called Noisy Pair Corrector (NPC), which consists of a detection module and a correction module. The detection module estimates noise pairs by calculating the perplexity between annotated positive and easy negative documents. The correction module utilizes an exponential moving average (EMA) model to provide a soft supervised signal, aiding in mitigating the effects of noise. We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS. Experimental results show that NPC achieves excellent performance in handling both synthetic and realistic noise.
Submitted 7 November, 2023;
originally announced November 2023.
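The exponential moving average (EMA) model mentioned in the Noisy Pair Corrector abstract above follows the generic update teacher = decay * teacher + (1 - decay) * student. The sketch below shows only that generic update on plain lists of floats; the function name `ema_update`, the decay value, and the toy weights are assumptions for illustration and are not taken from the paper.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Move each teacher weight a fraction (1 - decay) of the way toward the student weight."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy weights; decay=0.5 is chosen only so the numbers are easy to follow by hand.
teacher = [0.0, 1.0]
student = [1.0, 3.0]
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)

print(teacher)  # [0.875, 2.75]
```

Because the teacher changes slowly, its predictions are less sensitive to any single mislabeled pair, which is the usual motivation for using an EMA model as a source of soft supervision.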
-
Maser Investigation toward Off-Plane Stars (MIOPS): detection of SiO masers in the Galactic thick disk and halo
Authors:
Wen** Yang,
Yuanwei Wu,
Yan Gong,
Nicolas Mauron,
Bo Zhang,
Karl M. Menten,
Xiaofeng Mai,
Dejian Liu,
Juan Li,
**g**g Li
Abstract:
Studying stars that are located off the Galactic plane is important for understanding the formation history of the Milky Way. We searched for SiO masers toward off-plane O-rich asymptotic giant branch (AGB) stars from the catalog presented by Mauron et al. (2019) in order to shed light on the origin of these objects. A total of 102 stars were observed in the SiO $J$=1-0, $v=1$ and 2 transitions with the Effelsberg-100 m and Tianma-65 m telescopes. SiO masers were discovered in eight stars, all first detections. The measured maser velocities allow the first estimates of the host AGB stars' radial velocities. We find that the radial velocities of three stars (namely G068.881-24.615, G070.384-24.886, and G084.453-21.863) significantly deviate from the values expected from Galactic circular motion. The updated distances and 3D motions indicate that G068.881$-$24.615 is likely located in the Galactic halo, while G160.648-08.846 is probably located in the Galactic thin disk, and the other six stars are probably part of the Galactic thick disk.
Submitted 20 October, 2023;
originally announced October 2023.
-
Constraining Ultralight Axions with CSST Weak Gravitational Lensing and Galaxy Clustering Photometric Surveys
Authors:
Hengjie Lin,
Furen Deng,
Yan Gong,
Xuelei Chen
Abstract:
Ultralight axion (ULA) can be one of the potential candidates for dark matter. The extremely low mass of the ULA can lead to a de Broglie wavelength the size of galaxies which results in a suppression of the growth of structure on small scales. In this work, we forecast the constraint on the ULA particle mass $m_{\text{a}}$ and relative fraction to dark matter $f_{\text{a}} = Ω_{\text{a}}/Ω_{\text{d}}$ for the forthcoming Stage IV space-based optical survey equipment $\it{CSST}$ (China Space Station Telescope). We focus on the $\it{CSST}$ cosmic shear and galaxy clustering photometric surveys, and forecast the measurements of shear, galaxy, and galaxy-galaxy lensing power spectra (i.e. 3$\times$2pt). The effects of neutrino, baryonic feedback, and uncertainties of intrinsic alignment, shear calibration, galaxy bias, and photometric redshift are also included in the analysis. After performing a joint constraint on all the cosmological and systematical parameters based on the simulated data from the theoretical prediction, we obtain a lower limit of the ULA particle mass $\text{log}_{10}(m_{\text{a}}/\text{eV}) \geqslant -22.5$ and an upper limit of the ULA fraction $f_{\text{a}} \leqslant 0.83$ at 95\% confidence level, and $\text{log}_{10}(m_{\text{a}}/\text{eV}) \geqslant -21.9$ with $f_{\text{a}} \leqslant 0.77$ when ignoring the baryonic feedback. We find that the CSST photometric surveys can improve the constraint on the ULA mass by about one order of magnitude, compared to the current constraints using the same kind of observational data.
Submitted 28 February, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
A global view on star formation: The GLOSTAR Galactic plane survey. IX. Radio Source Catalog III: 2<l<28, 36<l<40, 56<l<60 and |b|<1, VLA B-configuration
Authors:
A. Y. Yang,
S. A. Dzib,
J. S. Urquhart,
A. Brunthaler,
S. -N. X. Medina,
K. M. Menten,
F. Wyrowski,
G. N. Ortiz-León,
W. D. Cotton,
Y. Gong,
R. Dokara,
M. R. Rugel,
H. Beuther,
J. D. Pandian,
T. Csengeri,
V. S. Veena,
N. Roy,
H. Nguyen,
B. Winkel,
J. Ott,
C. Carrasco-Gonzalez,
S. Khan,
A. Cheema
Abstract:
As part of the GLOSTAR (GLObal view of STAR formation in the Milky Way) survey, we present the high-resolution continuum source catalog for the regions (l = 2-28, 36-40, 56-60, &|b|<1.0), observed with the Karl G. Jansky Very Large Array (VLA) in its B-configuration. The continuum images are optimized to detect compact sources on angular scales up to 4", and have a typical noise level of 1sigma ~ 0.08mJy/beam for an angular resolution of 1", which makes GLOSTAR currently the highest resolution as well as the most sensitive radio survey of the northern Galactic plane at 4-8GHz. We extracted 13354 sources above a threshold of 5sigma and 5437 sources above 7sigma that represent the high-reliability catalog. We determined the in-band spectral index (alpha) for the sources in the 7sigma-threshold catalog. The mean value is alpha=-0.6, which indicates that the catalog is dominated by sources emitting non-thermal radio emission. We identified the most common source types detected in radio surveys: 251 HII region candidates (113 new), 282 planetary nebulae (PNe) candidates (127 new), 784 radio star candidates (581 new), and 4080 extragalactic radio source candidates (2175 new). A significant fraction of HII regions and PNe candidates have alpha<-0.1 indicating that these candidates could contain radio jets, winds or outflows from high-mass and low-mass stellar objects. We identified 245 variable radio sources by comparing the flux densities of compact sources from the GLOSTAR survey and the Co-Ordinated Radio `N' Infrared Survey for High-mass star formation (CORNISH), and find that most of them are infrared quiet. The catalog is typically 95% complete for point sources at a flux density of 0.6 mJy (i.e. typical 7sigma level) and the systematic positional uncertainty is <= 0.1". The GLOSTAR data and catalogs are available online at https://glostar.mpifr-bonn.mpg.de.
Submitted 23 October, 2023; v1 submitted 15 October, 2023;
originally announced October 2023.
-
MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning
Authors:
Zeyuan Ma,
Hongshu Guo,
Jiacheng Chen,
Zhenrui Li,
Guojun Peng,
Yue-Jiao Gong,
Yining Ma,
Zhiguang Cao
Abstract:
Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
Submitted 27 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
RSMS: Towards Reliable and Secure Metaverse Service Provision
Authors:
Yanwei Gong,
Xiaolin Chang,
Jelena Mišić,
Vojislav B. Mišić,
Yingying Yao
Abstract:
Establishing and sustaining Metaverse service necessitates an unprecedented scale of resources. This paper considers the deployment of Metaverse service in a cloud-edge resource architecture, which can satisfy the escalating demand for Metaverse service resources while ensuring both high bandwidth and low latency. We propose a novel mechanism, named Reliable and Secure Metaverse Service (RSMS), to ensure Metaverse service reliability and security without sacrificing performance. RSMS consists of two protocols: (1) One is a blockchain-based lightweight mutual authentication protocol concerning heterogeneous Metaverse service resource nodes (RNs) dynamically joining a Metaverse service resource pool while guaranteeing their trustworthiness, which guarantees the security of Metaverse service. (2) The other is a group authentication protocol used to form and maintain a stable and secure Metaverse service group composed of RNs, which ensures the reliability and enhances the security of Metaverse service. The reliability and security of Metaverse service under RSMS are thoroughly discussed, and informal and formal security analyses are conducted. Additionally, we study the impact of RSMS on Metaverse service throughput, demonstrating its lightweight feature.
Submitted 8 October, 2023;
originally announced October 2023.
-
Adapting LLM Agents with Universal Feedback in Communication
Authors:
Kuan Wang,
Yadong Lu,
Michael Santacroce,
Yeyun Gong,
Chao Zhang,
Yelong Shen
Abstract:
Recent advances in large language models (LLMs) have demonstrated potential for LLM agents. To facilitate the training for these agents with both linguistic feedback and non-linguistic reward signals, we introduce Learning through Communication (LTC). We design a universal buffer to store all the feedback, and an iterative pipeline to enable an LLM agent to explore and update its policy in a given environment. To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments. We evaluate the efficacy of our LTC approach on four diverse datasets: ALFWorld (single-agent), HotpotQA (multi-agent collaboration), Chameleon (multi-agent competition), and GSM8k (multi-agent teacher-student). On these data sets, LTC outperforms the supervised instruction fine-tuning baselines by 3.6% to 12%. These results highlight the versatility and efficiency of LTC in facilitating online adaptation for LLM agents.
Submitted 13 April, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Near-field Integrated Sensing and Communication: Opportunities and Challenges
Authors:
Jiayi Cong,
Changsheng You,
Jiapeng Li,
Li Chen,
Beixiong Zheng,
Yuanwei Liu,
Wen Wu,
Yi Gong,
Shi Jin,
Rui Zhang
Abstract:
With extremely large-scale arrays (XL-arrays) deployed in future wireless systems, wireless communication and sensing are expected to operate in the radiative near-field region, which needs to be characterized by spherical rather than planar wavefronts. Unlike most existing works that considered far-field integrated sensing and communication (ISAC), we study in this article the new near-field ISAC, which integrates both functions of sensing and communication in the near-field region. To this end, we first discuss the appealing advantages of near-field communication and sensing over their respective far-field counterparts. Then, we introduce three approaches for near-field ISAC: joint near-field communication and sensing, sensing-assisted near-field communication, and communication-assisted near-field sensing. We discuss their individual research opportunities and new design issues, and propose promising solutions. Finally, several important directions in near-field ISAC are highlighted to motivate future work.
Submitted 17 October, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Authors:
Zhibin Gou,
Zhihong Shao,
Yeyun Gong,
Yelong Shen,
Yujiu Yang,
Minlie Huang,
Nan Duan,
Weizhu Chen
Abstract:
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model that achieves an accuracy exceeding 50% on MATH, which significantly outperforms GPT-4's CoT result, and is competitive with GPT-4 solving problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.
Submitted 21 February, 2024; v1 submitted 29 September, 2023;
originally announced September 2023.
-
P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments
Authors:
Xujie Kang,
Kanglin Liu,
Jiang Duan,
Yuanhao Gong,
Guoping Qiu
Abstract:
Given a new $6DoF$ camera pose in an indoor environment, we study the challenging problem of predicting the view from that pose based on a set of reference RGBD views. Existing explicit or implicit 3D geometry construction methods are computationally expensive, while those based on learning have predominantly focused on isolated views of object categories with regular geometric structure. Differing from the traditional \textit{render-inpaint} approach to new view synthesis in the real indoor environment, we propose a conditional generative adversarial neural network (P2I-NET) to directly predict the new view from the given pose. P2I-NET learns the conditional distribution of the images of the environment for establishing the correspondence between the camera pose and its view of the environment, and achieves this through a number of innovative designs in its architecture and training loss function. Two auxiliary discriminator constraints are introduced for enforcing the consistency between the pose of the generated image and that of the corresponding real-world image in both the latent feature space and the real-world pose space. Additionally, a deep convolutional neural network (CNN) is introduced to further reinforce this consistency in the pixel space. We have performed extensive new view synthesis experiments on real indoor datasets. Results show that P2I-NET has superior performance against a number of strong NeRF-based baseline models. In particular, we show that P2I-NET is 40 to 100 times faster than these competitor techniques while synthesising images of similar quality. Furthermore, we contribute a new publicly available indoor environment dataset containing 22 high-resolution RGBD videos where each frame also has accurate camera pose parameters.
Submitted 27 September, 2023;
originally announced September 2023.
-
Sulfur isotope ratios in the Large Magellanic Cloud
Authors:
Y. Gong,
C. Henkel,
K. M. Menten,
C. -H. R. Chen,
Z. Y. Zhang,
Y. T. Yan,
A. Weiss,
N. Langer,
J. Z. Wang,
R. Q. Mao,
X. D. Tang,
W. Yang,
Y. P. Ao,
M. Wang
Abstract:
Sulfur isotope ratios have emerged as a promising tool for tracing stellar nucleosynthesis, quantifying stellar populations, and investigating the chemical evolution of galaxies. While extensively studied in the Milky Way, in extragalactic environments they remain largely unexplored. We focus on investigating the sulfur isotope ratios in the Large Magellanic Cloud (LMC) to gain insights into sulfur enrichment in this nearby system and to establish benchmarks for such ratios in metal-poor galaxies. We conducted pointed observations of CS and its isotopologues toward N113, one of the most prominent star-formation regions in the LMC, utilizing the Atacama Pathfinder EXperiment 12~m telescope. We present the first robust detection of C$^{33}$S in the LMC by successfully identifying two C$^{33}$S transitions on a large scale of $\sim$5 pc. Our measurements result in an accurate determination of the $^{34}$S/$^{33}$S isotope ratio, which is 2.0$\pm$0.2. Our comparative analysis indicates that the $^{32}$S/$^{33}$S and $^{34}$S/$^{33}$S isotope ratios are about a factor of 2 lower in the LMC than in the Milky Way. Our findings suggest that the low $^{34}$S/$^{33}$S isotope ratio in the LMC can be attributed to a combination of the age effect, low metallicity, and star formation history.
Submitted 18 October, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation
Authors:
Shih-Ying Yeh,
Yu-Guan Hsieh,
Zhidong Gao,
Bernard B W Yang,
Giyeong Oh,
Yanmin Gong
Abstract:
Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading open-source model in this fast-growing field. However, the intricacies of fine-tuning these models pose multiple challenges from new methodology integration to systematic evaluation. Addressing these issues, this paper introduces LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. Furthermore, we present a thorough framework for the systematic assessment of varied fine-tuning techniques. This framework employs a diverse suite of metrics and delves into multiple facets of fine-tuning, including hyperparameter adjustments and the evaluation with different prompt types across various concept categories. Through this comprehensive approach, our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.
Submitted 11 March, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
PLMM: Personal Large Language Models on Mobile Devices
Authors:
Yuanhao Gong
Abstract:
Inspired by Federated Learning, in this paper, we propose personal large models that are distilled from traditional large language models but are more adaptive to local users' personal information, such as education background and hobbies. We classify large language models into three levels: the personal level, the expert level, and the traditional level. The personal-level models are adaptive to users' personal information. They encrypt the users' input and protect their privacy. The expert-level models focus on merging specific knowledge such as finance, IT, and art. The traditional models focus on universal knowledge discovery and on upgrading the expert models. In this classification, the personal models directly interact with the user. For the whole system, the personal models hold users' (encrypted) personal information. Moreover, such models must be small enough to run on personal computers or mobile devices. Finally, they also have to respond in real time for a better user experience and produce high-quality results. The proposed personal large models can be applied in a wide range of applications such as language and vision tasks.
Submitted 4 May, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Joint Audio and Speech Understanding
Authors:
Yuan Gong,
Alexander H. Liu,
Hongyin Luo,
Leonid Karlinsky,
James Glass
Abstract:
Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perception and advanced reasoning ability. Specifically, by integrating Whisper as a perception module and LLaMA as a reasoning module, LTU-AS can simultaneously recognize and jointly understand spoken text, speech paralinguistics, and non-speech audio events - almost everything perceivable from audio signals.
Submitted 10 December, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
REWAFL: Residual Energy and Wireless Aware Participant Selection for Efficient Federated Learning over Mobile Devices
Authors:
Y. Li,
X. Qin,
J. Geng,
R. Chen,
Y. Hou,
Y. Gong,
M. Pan,
P. Zhang
Abstract:
Participant selection (PS) helps to accelerate federated learning (FL) convergence, which is essential for the practical deployment of FL over mobile devices. However, most existing PS approaches focus on improving training accuracy and efficiency rather than on the residual energy of mobile devices, which fundamentally determines whether the selected devices can participate. Meanwhile, the impacts of mobile devices' heterogeneous wireless transmission rates on PS and FL training efficiency are largely ignored. Moreover, PS causes the staleness issue. Prior research exploits isolated functions to force long-neglected devices to participate, which is decoupled from original PS designs. In this paper, we propose a residual energy and wireless aware PS design for efficient FL training over mobile devices (REWAFL). REWAFL introduces a novel PS utility function that jointly considers global FL training utilities and local energy utility, which integrates energy consumption and residual battery energy of candidate mobile devices. Under the proposed PS utility function framework, REWAFL further presents a residual energy and wireless aware local computing policy. Besides, REWAFL embeds the staleness solution into its utility function and local computing policy. The experimental results show that REWAFL is effective in improving training accuracy and efficiency, while avoiding "flat battery" of mobile devices.
Submitted 24 September, 2023;
originally announced September 2023.
-
Confidence Calibration for Systems with Cascaded Predictive Modules
Authors:
Yunye Gong,
Yi Yao,
Xiao Lin,
Ajay Divakaran,
Melinda Gervasio
Abstract:
Existing conformal prediction algorithms estimate prediction intervals at target confidence levels to characterize the performance of a regression model on new test samples. However, considering an autonomous system consisting of multiple modules, prediction intervals constructed for individual modules fall short of accommodating uncertainty propagation over different modules and thus cannot provide reliable predictions on system behavior. We address this limitation and present novel solutions based on conformal prediction to provide prediction intervals calibrated for a predictive system consisting of cascaded modules (e.g., an upstream feature extraction module and a downstream regression module). Our key idea is to leverage module-level validation data to characterize the system-level error distribution without direct access to end-to-end validation data. We provide theoretical justification and empirical experimental results to demonstrate the effectiveness of proposed solutions. In comparison to prediction intervals calibrated for individual modules, our solutions generate improved intervals with more accurate performance guarantees for system predictions, which are demonstrated on both synthetic systems and real-world systems performing overlap prediction for indoor navigation using the Matterport3D dataset.
Submitted 21 September, 2023;
originally announced September 2023.
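
For readers unfamiliar with the building block named in the abstract above, the following is a minimal sketch of standard split conformal prediction for a single regression module. It is not the cascaded, system-level method the paper proposes. The toy regressor `model`, the synthetic data, and the helper names `conformal_quantile` and `predict_interval` are all assumptions made for illustration.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the finite-sample-corrected (1 - alpha) quantile of the scores."""
    n = len(scores)
    # Rank of the order statistic used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to guarantee coverage at this alpha.
        return math.inf
    return sorted(scores)[k - 1]


def predict_interval(model, x, q):
    """Symmetric interval around the point prediction with half-width q."""
    y_hat = model(x)
    return y_hat - q, y_hat + q


def model(x):
    # Hypothetical pre-trained regressor; stands in for any fitted model.
    return 2.0 * x


def sample(n, rng):
    # Synthetic data: y = 2x + Gaussian noise (assumed for this demo only).
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


rng = random.Random(0)
calibration = sample(500, rng)
test_set = sample(2000, rng)

# Nonconformity score: absolute residual on held-out calibration data.
scores = [abs(y - model(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.1)

covered = 0
for x, y in test_set:
    lo, hi = predict_interval(model, x, q)
    covered += lo <= y <= hi
coverage = covered / len(test_set)
print(f"half-width={q:.3f}, empirical coverage={coverage:.3f}")
```

With exchangeable calibration and test data, the empirical coverage should land near the 90% target. The abstract's point is that intervals calibrated this way for one module do not account for how uncertainty propagates when modules are cascaded.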
-
Beamforming Design for RIS-Aided THz Wideband Communication Systems
Authors:
Yihang Jiang,
Ziqin Zhou,
Xiaoyang Li,
Yi Gong
Abstract:
Benefiting from tens of GHz of bandwidth, terahertz (THz) communications has become a promising technology for future 6G networks. However, the conventional hybrid beamforming architecture based on frequency-independent phase-shifters is not able to cope with the beam split effect (BSE) in THz massive multiple-input multiple-output (MIMO) systems. Despite some work introducing the frequency-dependent phase shifts via the time delay network to mitigate the beam splitting in THz wideband communications, the corresponding issue in reconfigurable intelligent surface (RIS)-aided communications has not been well investigated. In this paper, the BSE in THz massive MIMO is quantified by analyzing the array gain loss. A new beamforming architecture has been proposed to mitigate this effect under RIS-aided communications scenarios. Simulations are performed to evaluate the effectiveness of the proposed system architecture in combating the array gain loss.
Submitted 21 September, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning
Authors:
Tianhua Zhang,
Jiaxin Ge,
Hongyin Luo,
Yung-Sung Chuang,
Mingye Gao,
Yuan Gong,
Xixin Wu,
Yoon Kim,
Helen Meng,
James Glass
Abstract:
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output. Despite using a task-general prompt, we find that this approach can improve upon strong baselines across a range of different tasks including math and symbolic reasoning, text classification, question answering, and instruction following. We found that the generated programs are interpretable since they outline the exact reasoning process followed by the program interpreter.
Submitted 28 March, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
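
To make the idea in the abstract above concrete, here is a hypothetical sketch of the kind of program a language model might be prompted to generate: a data structure holding natural-language knowledge, functions defined over it, and a printed answer. The question, the records, and the function names are illustrative assumptions, not output or prompts from the paper.

```python
# Question (assumed for illustration):
# "Which of these computing pioneers were born before 1910?"

# Step 1: structured knowledge with natural-language descriptions as values.
people = {
    "Ada Lovelace": {
        "birth_year": 1815,
        "description": "English mathematician who wrote notes on the Analytical Engine",
    },
    "Alan Turing": {
        "birth_year": 1912,
        "description": "English mathematician and computer scientist",
    },
    "Grace Hopper": {
        "birth_year": 1906,
        "description": "American computer scientist and US Navy rear admiral",
    },
}


# Step 2: a function that performs the symbolic part of the reasoning.
def born_before(records, year):
    """Return the names of people born strictly before `year`, sorted."""
    return sorted(
        name for name, info in records.items() if info["birth_year"] < year
    )


# Step 3: turn the computed result back into a natural-language answer.
def answer(records, year):
    names = born_before(records, year)
    if not names:
        return f"None of them were born before {year}."
    details = "; ".join(f"{n} ({records[n]['description']})" for n in names)
    return f"Born before {year}: {details}."


print(answer(people, 1910))
```

Because the comparison is executed by the interpreter rather than inferred by the model, each step of the reasoning can be inspected directly in the code, which is the interpretability property the abstract describes.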
-
An Unified Search and Recommendation Foundation Model for Cold-Start Scenario
Authors:
Yuqi Gong,
Xichen Ding,
Yehui Su,
Kaiming Shen,
Zhongyi Liu,
Guannan Zhang
Abstract:
In modern commercial search engines and recommendation systems, data from multiple domains is available to jointly train the multi-domain model. Traditional methods train multi-domain models in the multi-task setting, with shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of features, labels, and sample distributions of individual tasks. With the development of large language models (LLMs), an LLM can extract global domain-invariant text features that serve both search and recommendation tasks. We propose a novel framework called S&R Multi-Domain Foundation, which uses an LLM to extract domain-invariant features, and Aspect Gating Fusion to merge the ID feature, domain-invariant text features, and task-specific heterogeneous sparse features to obtain the representations of query and item. Additionally, samples from multiple search and recommendation scenarios are trained jointly with a Domain Adaptive Multi-Task module to obtain the multi-domain foundation model. We apply the S&R Multi-Domain Foundation model to cold-start scenarios in the pretrain-finetune manner, which achieves better performance than other SOTA transfer learning methods. The S&R Multi-Domain Foundation model has been successfully deployed in Alipay Mobile Application's online services, such as content query recommendation and service card recommendation.
Submitted 16 September, 2023;
originally announced September 2023.
-
Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation
Authors:
Shaoshi Ling,
Guoli Ye,
Rui Zhao,
Yifan Gong
Abstract:
The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years. However, the joint optimization of the acoustic model and language model in an end-to-end manner has created challenges for text adaptation. In particular, effectively, quickly, and inexpensively adapting text has become a primary concern for deploying AED systems in industry. To address this issue, we propose a novel model, the hybrid attention-based encoder-decoder (HAED) speech recognition model, which preserves the modularity of conventional hybrid automatic speech recognition systems. Our HAED model separates the acoustic and language models, allowing for the use of conventional text-based language model adaptation techniques. We demonstrate that the proposed HAED model yields a 21% relative Word Error Rate (WER) improvement when out-of-domain text data is used for language model adaptation, with only a minor degradation in WER on a general test set compared with the conventional AED model.
Submitted 13 September, 2023;
originally announced September 2023.
-
Gradient Domain Diffusion Models for Image Synthesis
Authors:
Yuanhao Gong
Abstract:
Diffusion models are getting popular in generative image and video synthesis. However, due to the diffusion process, they require a large number of steps to converge. To tackle this issue, in this paper, we propose to perform the diffusion process in the gradient domain, where the convergence becomes faster. There are two reasons. First, thanks to the Poisson equation, the gradient domain is mathematically equivalent to the original image domain. Therefore, each diffusion step in the image domain has a unique corresponding gradient domain representation. Second, the gradient domain is much sparser than the image domain. As a result, gradient domain diffusion models converge faster. Several numerical experiments confirm that the gradient domain diffusion models are more efficient than the original diffusion models. The proposed method can be applied in a wide range of applications such as image processing, computer vision and machine learning tasks.
Submitted 4 September, 2023;
originally announced September 2023.
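
The two properties the abstract above relies on, sparsity of the gradient domain and recoverability of the original signal from it, can be illustrated with a one-dimensional toy example. This is only a sketch under simplifying assumptions: the piecewise-constant signal is made up, and in 1-D a cumulative sum plays the role that solving the Poisson equation plays for images. It does not implement the paper's diffusion model.

```python
from itertools import accumulate


def forward_difference(signal):
    """Discrete gradient: differences between neighbouring samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def integrate(gradient, first_value):
    """Recover the signal from its gradient and one boundary value."""
    return list(accumulate(gradient, initial=first_value))


def count_nonzero(values):
    return sum(v != 0 for v in values)


# Assumed toy signal: three flat regions, like a row of a cartoon image.
signal = [5] * 4 + [9] * 6 + [2] * 5

gradient = forward_difference(signal)
reconstructed = integrate(gradient, signal[0])

print("non-zero samples in signal:  ", count_nonzero(signal))
print("non-zero samples in gradient:", count_nonzero(gradient))
print("exact reconstruction:        ", reconstructed == signal)
```

The gradient is non-zero only at the two jumps between flat regions, while every sample of the signal itself is non-zero, and the signal is recovered exactly from the gradient plus its first value.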
-
On the improved dynamics approach in loop quantum black holes
Authors:
Hongchao Zhang,
Wen-Cong Gan,
Yungui Gong,
Anzhong Wang
Abstract:
In this paper, we consider the Böhmer-Vandersloot (BV) model of loop quantum black holes obtained from the improved dynamics approach. We adopt the Saini-Singh gauge, in which it was found analytically that the BV spacetime is geodesically complete. We show that black/white hole horizons do not exist in this geodesically complete spacetime. Instead, there exists only an infinite number of transition surfaces, which always separate trapped regions from anti-trapped ones. Comments on the improved dynamics approach adopted in other models of loop quantum black holes are also given.
Submitted 6 March, 2024; v1 submitted 29 August, 2023;
originally announced August 2023.
-
Protonated hydrogen cyanide as a tracer of pristine molecular gas
Authors:
Y. Gong,
F. J. Du,
C. Henkel,
A. M. Jacob,
A. Belloche,
J. Z. Wang,
K. M. Menten,
W. Yang,
D. H. Quan,
C. T. Bop,
G. N. Ortiz-León,
X. D. Tang,
M. R. Rugel,
S. Liu
Abstract:
Protonated hydrogen cyanide, HCNH$^{+}$, plays a fundamental role in astrochemistry because it is an intermediary in gas-phase ion-neutral reactions within cold molecular clouds. However, the impact of the environment on the chemistry of HCNH$^{+}$ remains poorly understood. With the IRAM-30 m and APEX-12 m observations, we report the first robust distribution of HCNH$^{+}$ in the Serpens filament and in Serpens South. Our data suggest that HCNH$^{+}$ is abundant in cold and quiescent regions, but is deficient in active star-forming regions. The observed HCNH$^{+}$ fractional abundances relative to H$_{2}$ range from $3.1\times 10^{-11}$ in protostellar cores to $5.9\times 10^{-10}$ in prestellar cores, and the HCNH$^{+}$ abundance generally decreases with increasing H$_{2}$ column density, which suggests that HCNH$^{+}$ coevolves with cloud cores. Our observations and modeling results suggest that the abundance of HCNH$^{+}$ in cold molecular clouds is strongly dependent on the H$_{2}$ number density. The decrease in the abundance of HCNH$^{+}$ is caused by the fact that its main precursors (e.g., HCN and HNC) undergo freeze-out as the number density of H$_{2}$ increases. However, current chemical models cannot explain other observed trends, such as the fact that the abundance of HCNH$^{+}$ shows an anti-correlation with that of HCN and HNC, but a positive correlation with that of N$_{2}$H$^{+}$ in the southern part of the Serpens South northern clump. This indicates that additional chemical pathways have to be invoked for the formation of HCNH$^{+}$ via molecules like N$_{2}$ in regions in which HCN and HNC freeze out. Both the fact that HCNH$^{+}$ is most abundant in molecular cores prior to gravitational collapse and the fact that low-$J$ HCNH$^{+}$ transitions have very low H$_{2}$ critical densities make this molecular ion an excellent probe of pristine molecular gas.
Submitted 29 August, 2023;
originally announced August 2023.
-
Including higher harmonics in gravitational-wave parameter estimation and cosmological implications for LISA
Authors:
Yi Gong,
Zhoujian Cao,
Junjie Zhao,
Lijing Shao
Abstract:
Massive black holes (MBHs) are crucial in shaping their host galaxies. How the MBH co-evolves with its host galaxy is a pressing problem in astrophysics and cosmology. The valuable information carried by the binary MBH is encoded in the gravitational waves (GWs), which will be detectable by the space-borne GW detector LISA. In GW data analysis, usually only the dominant $(2,2)$ mode of the GW signal is considered in the parameter estimation for LISA. However, including the higher harmonics in parameter estimation can break the degeneracy between the parameters, especially for the inclination angle and luminosity distance. This may enable the identification of GW signals without electromagnetic counterparts, known as "dark sirens". Thus, incorporating higher harmonics will be beneficial for resolving the Hubble tension and constraining the cosmological model. In this paper, we investigate the role of higher harmonics in the parameter estimation for GWs emitted by binary MBHs. We demonstrate that including the $(3,3)$ mode can lead to a $10^3$-times improvement in angular resolution and a $10^4$-times improvement in luminosity distance. Meanwhile, our results indicate that considering higher harmonics increases the probability of identifying host galaxies to over 70% at a $10^{-2}\,\rm{Gpc}^3$ cosmological volume threshold (corresponding to $10^5$ host galaxies), whereas the probability is less than 8% with only the $(2,2)$ mode. Thus, our results underscore the importance of including higher modes in the GW signal from binary MBHs, at least the $(3,3)$ mode for LISA.
Submitted 25 August, 2023;
originally announced August 2023.