
doi 10.1016/j.cma.2024.116883

Hutchinson Trace Estimation for High-Dimensional and High-Order Physics-Informed Neural Networks

Authors: Zheyuan Hu, Zekun Shi, George Em Karniadakis, Kenji Kawaguchi

Abstract: Physics-Informed Neural Networks (PINNs) have proven effective in solving partial differential equations (PDEs), especially when some data are available by seamlessly blending data and physics. However, extending PINNs to high-dimensional and even high-order PDEs encounters significant challenges due to the computational cost associated with automatic differentiation in the residual loss. Herein, we address the limitations of PINNs in handling high-dimensional and high-order PDEs by introducing Hutchinson Trace Estimation (HTE). Starting with the second-order high-dimensional PDEs ubiquitous in scientific computing, HTE transforms the calculation of the entire Hessian matrix into a Hessian-vector product (HVP). This approach alleviates the computational bottleneck via Taylor-mode automatic differentiation and significantly reduces memory consumption from the Hessian matrix to HVP. We further showcase HTE's convergence to the original PINN loss and its unbiased behavior under specific conditions. Comparisons with Stochastic Dimension Gradient Descent (SDGD) highlight the distinct advantages of HTE, particularly in scenarios with significant variance among dimensions. We further extend HTE to higher-order and higher-dimensional PDEs, specifically addressing the biharmonic equation. By employing tensor-vector products (TVP), HTE efficiently computes the colossal tensor associated with the fourth-order high-dimensional biharmonic equation, saving memory and enabling rapid computation.
The effectiveness of HTE is illustrated through experimental setups, demonstrating convergence rates comparable to SDGD under memory and speed constraints. Additionally, HTE proves valuable in accelerating the Gradient-Enhanced PINN (gPINN) version as well as the solution of the biharmonic equation. Overall, HTE opens up a new capability in scientific machine learning for tackling high-order and high-dimensional PDEs.

Submitted 3 March, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

Comments: Published in Computer Methods in Applied Mechanics and Engineering

MSC Class: 14J60

Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 424, 1 May 2024, 116883
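
The identity behind HTE is Hutchinson's estimator: for a probe vector v with independent Rademacher (±1) entries, E[vᵀHv] = tr(H), so a Laplacian (the trace of a Hessian) can be estimated from Hessian-vector products alone, without forming H. The sketch below illustrates only that identity in plain Python. The `hvp` callable is a hypothetical stand-in built from an explicit matrix; in a PINN it would instead come from automatic differentiation (the paper uses Taylor-mode AD). This is not the authors' implementation.

```python
import random
from typing import Callable, List

Vector = List[float]


def rademacher(dim: int, rng: random.Random) -> Vector:
    """Draw a probe vector whose entries are independently +1 or -1."""
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def hutchinson_trace(hvp: Callable[[Vector], Vector], dim: int,
                     num_samples: int, seed: int = 0) -> float:
    """Estimate tr(H) as the sample mean of v^T (H v) over Rademacher probes v.

    Only Hessian-vector products are requested; H itself is never formed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rademacher(dim, rng)
        hv = hvp(v)
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_samples


def matrix_hvp(matrix: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical HVP oracle for an explicit symmetric matrix (stand-in for autodiff)."""
    def hvp(v: Vector) -> Vector:
        return [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return hvp


if __name__ == "__main__":
    # Diagonal Hessian: every probe gives the exact trace because v_i^2 = 1.
    diag = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, -1.0]]
    print(hutchinson_trace(matrix_hvp(diag), dim=3, num_samples=4))  # prints 4.0
    # Off-diagonal entries add zero-mean noise, so more probes are needed.
    dense = [[1.0, 2.0], [2.0, 3.0]]
    print(hutchinson_trace(matrix_hvp(dense), dim=2, num_samples=10_000))
```

With Rademacher probes the variance of a single sample vᵀHv is 2·Σ_{i≠j} H_ij², so it comes only from off-diagonal entries; this is why the diagonal example is exact while the dense one only converges as the number of probes grows.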

arXiv:2312.13574 [pdf, ps, other]

Bulk reconstruction in flat holography

Authors: Bin Chen, Zezhou Hu

Abstract: In this note, we discuss the bulk reconstruction of massless free fields in flat space from the highest-weight representation of boundary Carrollian conformal field theory (CCFT). We expand the bulk field as a sum of infinite descendants of a primary state defined in the boundary CCFT, and discuss the Lorentz invariant bulk-boundary propagator in detail for the BMS_3/CCFT_2 case. In our calculation, it is necessary to introduce a nonzero mass at the beginning and take it as vanishing at the end. The framework we proposed has the potential to probe local bulk physics from the boundary CCFT.

Submitted 11 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 20 pages, references added; matched version in JHEP

arXiv:2312.13154 [pdf, other]

Joint Range-Velocity-Azimuth Estimation for OFDM-Based Integrated Sensing and Communication

Authors: Zelin Hu, Qibin Ye, Yixuan Huang, Su Hu, Gang Yang

Abstract: Orthogonal frequency division multiplexing (OFDM)-based integrated sensing and communication (ISAC) is promising for future sixth-generation mobile communication systems. Existing works focus on the joint estimation of the targets' range and velocity for OFDM-based ISAC systems. In contrast, this paper studies the three-dimensional joint estimation (3DJE) of range, velocity, and azimuth for OFDM-based ISAC systems with multiple receive antennas. First, we establish the signal model and derive the Cramer-Rao bounds (CRBs) on the 3DJE. Furthermore, an auto-paired super-resolution 3DJE algorithm is proposed by exploiting the reconstructed observation sub-signal's translational invariance property in the time, frequency, and space domains. Finally, with the 5G New Radio parameter setup, simulation results show that the proposed algorithm achieves better estimation performance and its root mean square error is closer to the root of CRBs than existing methods.

Submitted 20 December, 2023; originally announced December 2023.

Comments: This manuscript was submitted to an IEEE journal on 09-Aug-2023

arXiv:2312.12907 [pdf, ps, other]

doi 10.1103/PhysRevD.109.092001

Solar neutrino measurements using the full data period of Super-Kamiokande-IV

Authors: Super-Kamiokande Collaboration, K. Abe, C. Bronner, Y. Hayato, K. Hiraide, K. Hosokawa, K. Ieki, M. Ikeda, S. Imaizumi, K. Iyogi, J. Kameda, Y. Kanemura, R. Kaneshima, Y. Kashiwagi, Y. Kataoka, Y. Kato, Y. Kishimoto, S. Miki, S. Mine, M. Miura, T. Mochizuki, S. Moriyama, Y. Nagao, M. Nakahata, et al. (305 additional authors not shown)

Abstract: An analysis of solar neutrino data from the fourth phase of Super-Kamiokande (SK-IV) from October 2008 to May 2018 is performed and the results are presented. The observation time of the data set of SK-IV corresponds to 2970 days and the total live time for all four phases is 5805 days. For more precise solar neutrino measurements, several improvements are applied in this analysis: lowering the data acquisition threshold in May 2015, further reduction of the spallation background using neutron clustering events, and precise energy reconstruction considering the time variation of the PMT gain. The observed number of solar neutrino events in the 3.49–19.49 MeV electron kinetic energy region during SK-IV is $65{,}443^{+390}_{-388}\,(\mathrm{stat.}) \pm 925\,(\mathrm{syst.})$ events. The corresponding $^{8}\mathrm{B}$ solar neutrino flux is $(2.314 \pm 0.014\,(\mathrm{stat.}) \pm 0.040\,(\mathrm{syst.})) \times 10^{6}\,\mathrm{cm^{-2}\,s^{-1}}$, assuming a pure electron-neutrino flavor component without neutrino oscillations. The flux combined with all SK phases up to SK-IV is $(2.336 \pm 0.011\,(\mathrm{stat.}) \pm 0.043\,(\mathrm{syst.})) \times 10^{6}\,\mathrm{cm^{-2}\,s^{-1}}$. Based on the neutrino oscillation analysis from all solar experiments, including the SK 5805-day data set, the best-fit neutrino oscillation parameters are $\sin^{2}\theta_{12,\,\mathrm{solar}} = 0.306 \pm 0.013$ and $\Delta m^{2}_{21,\,\mathrm{solar}} = (6.10^{+0.95}_{-0.81}) \times 10^{-5}\,\mathrm{eV}^{2}$, with a deviation of about $1.5\sigma$ from the $\Delta m^{2}_{21}$ parameter obtained by KamLAND.
The best-fit neutrino oscillation parameters obtained from all solar experiments and KamLAND are $\sin^{2}\theta_{12,\,\mathrm{global}} = 0.307 \pm 0.012$ and $\Delta m^{2}_{21,\,\mathrm{global}} = (7.50^{+0.19}_{-0.18}) \times 10^{-5}\,\mathrm{eV}^{2}$.

Submitted 20 February, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 47 pages, 61 figures

Journal ref: Phys. Rev. D 109, 092001 (2024)

arXiv:2312.12602 [pdf]

Magnetism of noncolinear amorphous DyCo3 and TbCo3 thin films

Authors: Zexiang Hu, Ajay Jha, Katarzyna Siewierska, Ross Smith, Karsten Rode, Plamen Stamenov, J. M. D. Coey

Abstract: The magnetization of amorphous DyCo3 and TbCo3 is studied by magnetometry, anomalous Hall effect and magneto-optic Kerr effect to understand the temperature-dependent magnetic structure. A square magnetic hysteresis loop with perpendicular magnetic anisotropy and coercivity that reaches 3.5 T in the vicinity of the compensation temperature is seen in thin films. An anhysteretic soft component, seen in the magnetization of some films but not in their Hall or Kerr loops, is an artefact due to sputter-deposition on the sides of the substrate. The temperature dependence of the net rare-earth moment from 4 to 300 K is deduced, using the cobalt moment in amorphous YxCo1-x. The single-ion anisotropy of the quadrupole moments of the 4f atoms in the randomly-oriented local electrostatic field gradient overcomes their exchange coupling to the cobalt subnetwork, resulting in a sperimagnetic ground state where spins of the noncollinear rare-earth subnetwork are modelled by a distribution of rare-earth moments within a cone whose axis is antiparallel to the ferromagnetic axis z of the cobalt subnetwork. The reduced magnetization (Jz)/J at T=0 is calculated from an atomic Hamiltonian as a function of the ratio of anisotropy to exchange energy per rare-earth atom for a range of angles between the local anisotropy axis and -z and then averaged over all directions in a hemisphere. The experimental and calculated values of (J-z)/J are close to 0.7 at low temperature for both Dy and Tb.
On increasing temperature, the magnitude of the rare-earth moment and the local random anisotropy that creates the cone are reduced; the cone closes and the structure approaches collinear ferrimagnetism well above ambient temperature. An asymmetric spin flop of the exchange-coupled subnetworks appears in the vicinity of the magnetization compensation temperatures of 175 K for amorphous Dy0.25Co0.75 and 200 K for amorphous TbCo3.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 23 pages, 12 figures

arXiv:2312.12090 [pdf, other]

GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction

Authors: Haodong Yan, Zhiming Hu, Syn Schmitt, Andreas Bulling

Abstract: Human motion prediction is important for virtual reality (VR) applications, e.g., for realistic avatar animation. Existing methods have synthesised body motion only from observed past motion, despite the fact that human gaze is known to correlate strongly with body movements and is readily available in recent VR headsets. We present GazeMoDiff, a novel gaze-guided denoising diffusion model to generate stochastic human motions. Our method first uses a graph attention network to learn the spatio-temporal correlations between eye gaze and human movements and to fuse them into cross-modal gaze-motion features. These cross-modal features are injected into a noise prediction network via a cross-attention mechanism and progressively denoised to generate realistic human full-body motions. Experimental results on the MoGaze and GIMO datasets demonstrate that our method outperforms the state-of-the-art methods by a large margin in terms of average displacement error (15.03% on MoGaze and 9.20% on GIMO). We further conducted an online user study to compare our method with state-of-the-art methods, and the responses from 23 participants validate that the motions generated by our method are more realistic than those from other methods. Taken together, our work takes an important first step towards gaze-guided stochastic human motion prediction and guides future work on this important topic in VR research.

Submitted 19 December, 2023; originally announced December 2023.
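
Average displacement error (ADE) is commonly defined as the mean Euclidean distance between predicted and ground-truth joint positions over all predicted frames; for stochastic predictors, benchmarks often report the best (minimum) ADE among the sampled futures. The sketch below follows that common convention as an assumption; the abstract does not state the exact protocol used on MoGaze and GIMO (number of samples, joint set, units), and it may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]   # one 3D joint position (x, y, z)
Pose = Sequence[Point]    # all joints in one frame
Motion = Sequence[Pose]   # a sequence of frames


def ade(pred: Motion, gt: Motion) -> float:
    """Mean joint-wise Euclidean distance over all frames and joints."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same length")
    total, count = 0.0, 0
    for pred_pose, gt_pose in zip(pred, gt):
        for p, g in zip(pred_pose, gt_pose):
            total += math.dist(p, g)
            count += 1
    return total / count


def best_of_n_ade(samples: List[Motion], gt: Motion) -> float:
    """Assumed stochastic-prediction convention: ADE of the sample closest to gt."""
    return min(ade(s, gt) for s in samples)
```

Taking the minimum over samples rewards a model for covering the true future with at least one of its samples, which is why this variant is typical for diverse, stochastic predictors.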

arXiv:2312.12042 [pdf, other]

Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses

Authors: Zhiming Hu, Jiahui Xu, Syn Schmitt, Andreas Bulling

Abstract: Human eye gaze plays a significant role in many virtual and augmented reality (VR/AR) applications, such as gaze-contingent rendering, gaze-based interaction, or eye-based activity recognition. However, prior works on gaze analysis and prediction have only explored eye-head coordination and were limited to human-object interactions. We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities based on four public datasets collected in real-world (MoGaze), VR (ADT), as well as AR (GIMO and EgoBody) environments. We show that in human-object interactions, e.g. pick and place, eye gaze exhibits strong correlations with full-body motion while in human-human interactions, e.g. chat and teach, a person's gaze direction is correlated with the body orientation towards the interaction partner. Informed by these analyses we then present Pose2Gaze, a novel eye-body coordination model that uses a convolutional neural network and a spatio-temporal graph convolutional neural network to extract features from head direction and full-body poses, respectively, and then uses a convolutional neural network to predict eye gaze. We compare our method with state-of-the-art methods that predict eye gaze only from head movements and show that Pose2Gaze outperforms these baselines with an average improvement of 24.0% on MoGaze, 10.1% on ADT, 21.3% on GIMO, and 28.6% on EgoBody in mean angular error, respectively.
We also show that our method significantly outperforms prior methods in the sample downstream task of eye-based activity recognition. These results underline the significant information content available in eye-body coordination during daily activities and open up a new direction for gaze prediction.

Submitted 10 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted at TVCG 2024, code available at https://zhiminghu.net/hu24_pose2gaze.html
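
Mean angular error, the metric quoted above, is conventionally the angle between predicted and ground-truth 3D gaze direction vectors, averaged over samples. The sketch below assumes that convention, normalises the vectors internally (so they need not be unit length), and reports degrees; the abstract does not give the paper's exact evaluation details.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def angular_error_deg(pred: Vec3, gt: Vec3) -> float:
    """Angle in degrees between two (not necessarily unit) direction vectors."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norms = math.hypot(*pred) * math.hypot(*gt)
    if norms == 0.0:
        raise ValueError("gaze directions must be non-zero vectors")
    # Clamp to [-1, 1] so rounding cannot push the cosine outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cosine))


def mean_angular_error(preds: Sequence[Vec3], gts: Sequence[Vec3]) -> float:
    """Average angular error over paired predicted and ground-truth directions."""
    errors = [angular_error_deg(p, g) for p, g in zip(preds, gts)]
    return sum(errors) / len(errors)
```

The clamp matters in practice: two nearly identical unit vectors can produce a dot product like 1.0000000000000002, which would make `math.acos` raise.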

arXiv:2312.10047 [pdf]

doi 10.5815/ijmecs.2023.06.03

Clustering Students According to their Academic Achievement Using Fuzzy Logic

Authors: Serhiy Balovsyak, Oleksandr Derevyanchuk, Hanna Kravchenko, Yuriy Ushenko, Zhengbing Hu

Abstract: Software for clustering students according to their educational achievements using fuzzy logic was developed in Python using the Google Colab cloud service. Analyzing educational data is a Data Mining problem, since only some characteristics of the educational process are extracted from a large sample of data. Data clustering was performed using the classic K-Means method, which is characterized by simplicity and high speed. Cluster analysis was performed in the space of two features using the machine learning library scikit-learn (Python). The obtained clusters are described by fuzzy triangular membership functions, which made it possible to correctly determine the membership of each student in a certain cluster. The fuzzy membership functions are created using the scikit-fuzzy library. Developing fuzzy membership functions for clusters is also useful for educational purposes, as it gives a better understanding of the principles of using fuzzy logic. Processing test educational data with the developed software gave correct results. It is shown that the use of fuzzy membership functions makes it possible to correctly determine the membership of students in certain clusters, even if such clusters are not clearly separated. As a result, the recommended level of task difficulty for each student can be determined more accurately from their previous evaluations.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 13 pages, 9 figures, IJMECS

Journal ref: International Journal of Modern Education and Computer Science (IJMECS), Vol. 15, No. 6, pp. 31-43, 2023
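
The abstract describes two steps: K-Means clustering, then describing each cluster with a triangular fuzzy membership function so that students lying between clusters receive graded membership. The paper uses scikit-learn and scikit-fuzzy on two features; the sketch below reimplements both ideas in plain Python on a single hypothetical feature (an average grade). The evenly spaced initialisation and the placement of each triangle (peak at its cluster center, zero at the neighbouring centers) are assumptions for illustration, not the authors' choices.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, max_iter: int = 100) -> List[float]:
    """Lloyd's K-Means on one feature; returns the cluster centers in ascending order.

    Centers start evenly spaced between min and max (assumed init; requires k >= 2).
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous center instead of dividing by zero.
        new_centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def memberships(x: float, centers: Sequence[float]) -> List[float]:
    """Triangular memberships for ascending centers: each triangle peaks at its
    center and reaches 0 at the neighbouring centers, so memberships sum to 1.
    Values beyond the outermost centers belong fully to the edge cluster."""
    mu = [0.0] * len(centers)
    if x <= centers[0]:
        mu[0] = 1.0
        return mu
    if x >= centers[-1]:
        mu[-1] = 1.0
        return mu
    for i in range(len(centers) - 1):
        left, right = centers[i], centers[i + 1]
        if left <= x <= right:
            if right == left:
                mu[i] = 1.0
            else:
                t = (x - left) / (right - left)
                mu[i], mu[i + 1] = 1.0 - t, t
            return mu
    return mu
```

For example, with grades `[55, 58, 60, 72, 75, 78, 90, 92, 95]` and `k=3`, the centers settle at about 57.7, 75 and 92.3, and a grade of 80 belongs mostly to the middle cluster with a smaller membership in the top one, which is the kind of graded assignment the abstract uses to pick task difficulty.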

arXiv:2312.09926 [pdf, other]

FuXi-S2S: An accurate machine learning model for global subseasonal forecasts

Authors: Lei Chen, Xiaohui Zhong, Jie Wu, Deliang Chen, Qingchen Chao, Chensen Lin, Zixin Hu, Bo Lu, Hao Li, Yuan Qi

Abstract: Skillful subseasonal forecasts beyond 2 weeks are crucial for a wide range of applications across various sectors of society. Recently, state-of-the-art machine learning based weather forecasting models have made significant advancements, outperforming the high-resolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts (ECMWF). However, the full potential of machine learning models in subseasonal forecasts has yet to be fully explored. In this study, we introduce FuXi Subseasonal-to-Seasonal (FuXi-S2S), a machine learning based subseasonal forecasting model that provides global daily mean forecasts up to 42 days, covering 5 upper-air atmospheric variables at 13 pressure levels and 11 surface variables. FuXi-S2S integrates an enhanced FuXi base model with a perturbation module for flow-dependent perturbations in hidden features, and incorporates Perlin noise to perturb initial conditions. The model is developed using 72 years of daily statistics from ECMWF ERA5 reanalysis data. When compared to the ECMWF Subseasonal-to-Seasonal (S2S) reforecasts, the FuXi-S2S forecasts demonstrate superior deterministic and ensemble forecasts for total precipitation (TP), outgoing longwave radiation (OLR), and geopotential at 500 hPa (Z500). Although it shows slightly inferior performance in predicting 2-meter temperature (T2M), it has clear advantages over land areas. Regarding the extreme forecasts, FuXi-S2S outperforms ECMWF S2S globally for TP.
Furthermore, FuXi-S2S forecasts surpass the ECMWF S2S reforecasts in predicting the Madden Julian Oscillation (MJO), a key source of subseasonal predictability. They extend the skillful prediction of MJO from 30 days to 36 days.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.09641 [pdf, other]

Ins-HOI: Instance Aware Human-Object Interactions Recovery

Authors: Jiajun Zhang, Yuxiang Zhang, Hongwen Zhang, Xiao Zhou, Boyao Zhou, Ruizhi Shao, Zonghai Hu, Yebin Liu

Abstract: Accurately modeling detailed interactions between human/hand and object is an appealing yet challenging task. Current multi-view capture systems are only capable of reconstructing multiple subjects into a single, unified mesh, which fails to model the states of each instance individually during interactions. To address this, previous methods use template-based representations to track human/hand and object. However, the quality of the reconstructions is limited by the descriptive capabilities of the templates, so these methods inherently struggle with geometry details, pressing deformations and invisible contact surfaces. In this work, we propose an end-to-end Instance-aware Human-Object Interactions recovery (Ins-HOI) framework by introducing an instance-level occupancy field representation. However, the real-captured data is presented as a holistic mesh, unable to provide instance-level supervision. To address this, we further propose a complementary training strategy that leverages synthetic data to introduce instance-level shape priors, enabling the disentanglement of occupancy fields for different instances. Specifically, synthetic data, created by randomly combining individual scans of humans/hands and objects, guides the network to learn a coarse prior of instances. Meanwhile, real-captured data helps in learning the overall geometry and restricting interpenetration in contact areas.
As demonstrated in experiments, our method Ins-HOI supports instance-level reconstruction and provides reasonable and realistic invisible contact surfaces even in cases of extremely close interaction. To facilitate the research of this task, we collect a large-scale, high-fidelity 3D scan dataset, including 5.2k high-quality scans with real-world human-chair and hand-object interactions. The code and data will be public for research purposes.

Submitted 21 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Project Page: https://jiajunzhang16.github.io/ins-hoi/ , Code and Dataset Page: https://github.com/jiajunzhang16/ins-hoi

arXiv:2312.07813 [pdf, other]

On a Foundation Model for Operating Systems

Authors: Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, Gang Wang

Abstract: This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components such as CPU, memory, and network subsystems are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in varying environments and workloads. We discuss a wide range of possibilities that then arise, from employing foundation models as policy agents to utilizing them as generators and predictors to assist traditional OS control algorithms. Our hope is that this paper spurs further research into OS foundation models and creating the next generation of operating systems for the evolving computing landscape.

Submitted 12 December, 2023; originally announced December 2023.

Comments: Machine Learning for Systems Workshop at 37th NeurIPS Conference, 2023, New Orleans, LA, USA

arXiv:2312.06950 [pdf, other]

READ-PVLA: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling

Authors: Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Khoi Le, Zhiyuan Hu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Abstract: Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited training data, such a full fine-tuning approach leads to costly model storage and unstable training. To overcome these shortcomings, we introduce lightweight adapters to the pre-trained model and only update them at fine-tuning time. However, existing adapters fail to capture intrinsic temporal relations among video frames or textual words. Moreover, they neglect the preservation of critical task-related information that flows from the raw video-language input into the adapter's low-dimensional space. To address these issues, we first propose a novel REcurrent ADapter (READ) that employs recurrent computation to enable temporal modeling capability. Second, we propose a Partial Video-Language Alignment (PVLA) objective via the use of partial optimal transport to maintain task-related information flowing into our READ modules. We validate our READ-PVLA framework through extensive experiments where READ-PVLA significantly outperforms all existing fine-tuning strategies on multiple low-resource temporal language grounding and video-language summarization benchmarks.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted at AAAI 2024

arXiv:2312.06706 [pdf, other]

UNeR3D: Versatile and Scalable 3D RGB Point Cloud Generation from 2D Images in Unsupervised Reconstruction

Authors: Hongbin Lin, Juangui Xu, Qingfeng Xu, Zhengyu Hu, Handing Xu, Yunzhi Chen, Yongjun Hu, Zhenguo Nie

Abstract: In the realm of 3D reconstruction from 2D images, a persisting challenge is to achieve high-precision reconstructions devoid of 3D Ground Truth data reliance. We present UNeR3D, a pioneering unsupervised methodology that sets a new standard for generating detailed 3D reconstructions solely from 2D views. Our model significantly cuts down the training costs tied to supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. Employing an inverse distance weighting technique for color rendering, UNeR3D ensures seamless color transitions, enhancing visual fidelity. Our model's flexible architecture supports training with any number of views, and uniquely, it is not constrained by the number of views used during training when performing reconstructions. It can infer with an arbitrary count of views during inference, offering unparalleled versatility. Additionally, the model's continuous spatial input domain allows the generation of point clouds at any desired resolution, empowering the creation of high-resolution 3D RGB point clouds. We solidify the reconstruction process with a novel multi-view geometric loss and color loss, demonstrating that our model excels with single-view inputs and beyond, thus reshaping the paradigm of unsupervised learning in 3D vision. Our contributions signal a substantial leap forward in 3D vision, offering new horizons for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 17 pages
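
Inverse distance weighting (IDW, Shepard's method) assigns a query point the weighted average of known sample values, with weights proportional to 1/d^p. The abstract does not say how UNeR3D selects neighbouring points or which power p it uses, so the sketch below assumes every supplied colored sample contributes and p = 2; it illustrates only the interpolation rule, not the model.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
RGB = Tuple[float, float, float]


def idw_color(query: Point3, points: Sequence[Point3], colors: Sequence[RGB],
              power: float = 2.0) -> RGB:
    """Blend sample colors with weights 1 / distance**power (assumed p = 2)."""
    weights: List[float] = []
    for p, c in zip(points, colors):
        d = math.dist(query, p)
        if d == 0.0:
            # The query coincides with a sample: return that sample's color exactly.
            return tuple(c)
        weights.append(1.0 / d ** power)
    total = sum(weights)
    return tuple(
        sum(w * c[ch] for w, c in zip(weights, colors)) / total for ch in range(3)
    )
```

Because the weights vary continuously with the query position, the interpolated color changes smoothly between samples, which matches the "seamless color transitions" the abstract attributes to the technique.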

arXiv:2312.06656 [pdf, ps, other]

Schatten class Hankel operators on doubling Fock spaces and the Berger-Coburn phenomenon

Authors: Ghazaleh Asghari, Jani A. Virtanen, Zhangjian Hu

Abstract: Using the notion of integral distance to analytic functions, we give a characterization of Schatten class Hankel operators acting on doubling Fock spaces on the complex plane and use it to show that for $f\in L^{\infty}$, if $H_{f}$ is Hilbert-Schmidt, then so is $H_{\bar{f}}$. This property is known as the Berger-Coburn phenomenon. When $0<p\le 1$, we show that the Berger-Coburn phenomenon fails for a large class of doubling Fock spaces. Along the way, we illustrate our results for the canonical weights $|z|^m$ when $m>0$.

Submitted 11 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: To appear in Journal of Mathematical Analysis and Applications
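
For reference, the objects in this abstract are usually set up as below. This is a sketch of the standard setting, not taken from the paper: the normalisation of the weight ($e^{-2\varphi}$ versus $e^{-\varphi}$) and the exact class of admissible $\varphi$ vary between authors and are assumptions here. The canonical example mentioned above corresponds to $\varphi(z) = |z|^m$.

```latex
% Doubling Fock space: \varphi subharmonic, with \Delta\varphi\,dA a doubling measure
F^2_\varphi = \Big\{ g \text{ entire} : \|g\|^2 = \int_{\mathbb{C}} |g(z)|^2 e^{-2\varphi(z)}\, dA(z) < \infty \Big\}

% P : L^2_\varphi \to F^2_\varphi is the orthogonal projection; for f \in L^\infty,
H_f\, g = (I - P)(f g), \qquad g \in F^2_\varphi .

% Schatten class S_p via the singular values s_n(H_f); p = 2 is the Hilbert--Schmidt class
H_f \in S_p \iff \sum_{n \ge 1} s_n(H_f)^p < \infty .

% Berger--Coburn phenomenon in S_p
H_f \in S_p \implies H_{\bar f} \in S_p .
```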

arXiv:2312.06550 [pdf, other]

LLM360: Towards Fully Transparent Open-Source LLMs

Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder progress in the field by degrading transparency into the training of LLMs and forcing teams to rediscover many details in the training process. We present LLM360, an initiative to fully open-source LLMs, which advocates for all training code and data, model checkpoints, and intermediate results to be made available to the community. The goal of LLM360 is to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible by everyone. As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses (at https://www.llm360.ai). We are committed to continually pushing the boundaries of LLMs through this open-source effort. More large-scale and stronger models are underway and will be released in the future.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.05230 [pdf, other]

Language Models, Agent Models, and World Models: The LAW for Machine Reasoning and Planning

Authors: Zhiting Hu, Tianmin Shu

Abstract: Despite their tremendous success in many applications, large language models often fall short of consistent reasoning and planning in various (language, embodied, and social) scenarios, due to inherent limitations in their inference, learning, and modeling capabilities. In this position paper, we present a new perspective of machine reasoning, LAW, that connects the concepts of Language models, Agent models, and World models, for more robust and versatile reasoning capabilities. In particular, we propose that world and agent models are a better abstraction of reasoning, one that introduces the crucial elements of deliberate human-like reasoning, including beliefs about the world and other agents, anticipation of consequences, goals/rewards, and strategic planning. Crucially, language models in LAW serve as a backend to implement the system or its elements and hence provide the computational power and adaptability. We review the recent studies that have made relevant progress and discuss future research directions towards operationalizing the LAW framework.

Submitted 8 December, 2023; originally announced December 2023.

Comments: Position paper. Accompanying NeurIPS2023 Tutorial: https://sites.google.com/view/neurips2023law/home

arXiv:2312.04819 [pdf, other]

Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

Authors: Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang

Abstract: Real-world multi-agent tasks usually involve dynamic team composition with the emergence of roles, which should also be a key to efficient cooperation in multi-agent reinforcement learning (MARL). Drawing inspiration from the correlation between roles and agents' behavior patterns, we propose a novel framework of **A**ttention-guided **CO**ntrastive **R**ole representation learning for **M**ARL (**ACORM**) to promote behavior heterogeneity, knowledge transfer, and skillful coordination across agents. First, we introduce mutual information maximization to formalize role representation learning, derive a contrastive learning objective, and concisely approximate the distribution of negative pairs. Second, we leverage an attention mechanism to prompt the global state to attend to learned role representations in value decomposition, implicitly guiding agent coordination in a skillful role space to yield more expressive credit assignment. Experiments on challenging StarCraft II micromanagement and Google Research Football tasks demonstrate the state-of-the-art performance of our method and its advantages over existing approaches. Our code is available at [https://github.com/NJU-RL/ACORM](https://github.com/NJU-RL/ACORM).
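Contrastive objectives derived from mutual information maximization are commonly of the InfoNCE form. The sketch below shows a generic InfoNCE loss for a single anchor embedding; it is not ACORM's exact objective, and the cosine similarity, the temperature value and the hypothetical role embeddings are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax probability of the positive.

    Minimizing it pulls the anchor toward the positive and away from the
    negatives; log(N) minus this loss lower-bounds the mutual information
    between anchor and positive (N = number of candidates).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Hypothetical role embeddings: an anchor agent and an agent sharing its role.
anchor, same_role = [1.0, 0.0], [0.9, 0.1]
other_roles = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(anchor, same_role, other_roles))                        # small loss
print(info_nce(anchor, other_roles[0], [same_role, other_roles[1]]))   # larger loss
```

How positives and negatives are chosen (for example, agents in the same versus different role clusters) is the part a real role-learning method has to specify; here they are simply given.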

Submitted 2 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04233 [pdf]

Fine-tuning vision foundation model for crack segmentation in civil infrastructures

Authors: Kang Ge, Chen Wang, Yutao Guo, Yansong Tang, Zhenzhong Hu, Hongbing Chen

Abstract: Large-scale foundation models have become the mainstream deep learning method, while in civil engineering, the scale of AI models is strictly limited. In this work, a vision foundation model is introduced for crack segmentation. Two parameter-efficient fine-tuning methods, adapter and low-rank adaptation, are adopted to fine-tune this foundation model, the Segment Anything Model (SAM), for semantic segmentation. The fine-tuned CrackSAM shows excellent performance on different scenes and materials. To test the zero-shot performance of the proposed method, two unique datasets related to road and exterior wall cracks are collected, annotated and open-sourced, for a total of 810 images. Comparative experiments are conducted with twelve mature semantic segmentation models. On datasets with artificial noise and previously unseen datasets, the performance of CrackSAM far exceeds that of all state-of-the-art models. CrackSAM exhibits remarkable superiority, particularly under challenging conditions such as dim lighting, shadows, road markings, construction joints, and other interference factors. These cross-scenario results demonstrate the outstanding zero-shot capability of foundation models and provide new ideas for developing vision models in civil engineering.
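Low-rank adaptation, one of the two fine-tuning methods named above, keeps a pretrained weight frozen and trains only a low-rank update. A minimal pure-Python sketch of one adapted linear layer, assuming the common formulation W + (alpha / r) · B · A with B initialised to zero; how CrackSAM places such layers inside SAM is not described here, and `LoRALinear` is a hypothetical name.

```python
import random


def matvec(M, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scale * B @ A."""

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(W), len(W[0])
        self.W = W                      # frozen pretrained weight, shape (out, in)
        self.scale = alpha / r
        # A (r x in) gets a small random init; B (out x r) starts at zero,
        # so the adapted layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(out_dim)]

    def trainable_params(self):
        """Only A and B are trained; W stays frozen."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def merged_weight(self):
        """Fold the update into W so inference costs the same as the original layer."""
        r = len(self.A)
        return [
            [w + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(r))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.W)
        ]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: identical to the frozen layer at init
```

For a 512 x 512 layer with r = 8 this trains 8 · (512 + 512) = 8192 parameters instead of 262,144, which is what makes the method parameter-efficient.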

Submitted 23 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.04050 [pdf, other]

X-Ray Constraints on the Hot Gaseous Corona of Edge-on Late-type Galaxies in Virgo

Authors: Meicun Hou, Lin He, Zhensong Hu, Zhiyuan Li, Christine Jones, William Forman, Yuanyuan Su, Jing Wang, Luis C. Ho

Abstract: We present a systematic study of the putative hot gas corona around late-type galaxies (LTGs) residing in the Virgo cluster, based on archival Chandra observations. Our sample consists of 21 nearly edge-on galaxies spanning a star formation rate (SFR) range of $(0.2-3)\rm~M_\odot~yr^{-1}$ and a stellar mass ($M_*$) range of $(0.2-10) \times 10^{10}\rm~M_{\odot}$, the majority of which have not been explored with high-sensitivity X-ray observations so far. Significant extraplanar diffuse X-ray (0.5-2 keV) emission is detected in only three LTGs, which are also the three galaxies with the highest SFR. A stacking analysis is performed for the remaining galaxies without individual detections, dividing the whole sample into two subsets based on SFR, stellar mass, or specific SFR. Only the high-SFR bin yields a significant detection, with a value of $L\rm_X \sim3\times10^{38}\rm~erg~s^{-1}$ per galaxy. The stacked extraplanar X-ray signals of the Virgo LTGs are consistent with the empirical $L\rm_X - SFR$ and $L\rm_X - M_*$ relations found among highly inclined disk galaxies in the field, but appear to be systematically lower than those of a comparison sample of simulated cluster star-forming galaxies identified from the Illustris-TNG100 simulation. The apparent paucity of hot gas coronae in the sampled Virgo LTGs might be understood as the net outcome of the long-lasting effect of ram pressure stripping exerted by the hot intra-cluster medium and in-disk star-forming activity acting on shorter timescales.
A better understanding of the roles of environmental effects in regulating the hot gas content of cluster galaxies invites sensitive X-ray observations of a large galaxy sample.

Submitted 7 December, 2023; originally announced December 2023.

Comments: 18 pages, 7 figures. Accepted for publication in ApJ. Comments welcome

arXiv:2312.03284 [pdf]

Adaptive Multi-band Modulation for Robust and Low-complexity Faster-than-Nyquist Non-Orthogonal FDM IM-DD System

Authors: Peiji Song, Zhouyi Hu, Yizhan Dai, Yuan Liu, Chao Gao, Chun-Kit Chan

Abstract: Faster-than-Nyquist non-orthogonal frequency-division multiplexing (FTN-NOFDM) is robust against steep frequency roll-off by saving signal bandwidth. Among the FTN-NOFDM techniques, the non-orthogonal matrix precoding (NOM-p) based FTN has high compatibility with the conventional orthogonal frequency division multiplexing (OFDM), in terms of the advanced digital signal processing already used in OFDM. In this work, by dividing the single band into multiple sub-bands in the NOM-p-based FTN-NOFDM system, we propose a novel FTN-NOFDM scheme with adaptive multi-band modulation. The proposed scheme assigns different quadrature amplitude modulation (QAM) levels to different sub-bands, effectively utilizing the low-pass-like channel and reducing the complexity. The impacts of sub-band number and bandwidth compression factor on the bit-error-rate (BER) performance and implementation complexity are experimentally analyzed with a 32.23-Gb/s and 20-km intensity modulation-direct detection (IM-DD) optical transmission system. Results show that the proposed scheme with proper sub-band numbers can lower BER and greatly reduce the complexity compared to the conventional single-band approach.
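The abstract does not state how QAM levels are chosen per sub-band. One common rule of thumb, shown here only as a hedged sketch, is SNR-gap bit loading: each sub-band gets the largest constellation whose bits per symbol fit under log2(1 + SNR / Γ). The 6 dB gap, the allowed 4-QAM to 64-QAM set and the per-sub-band SNR values are all hypothetical.

```python
import math


def assign_qam_orders(subband_snr_db, gap_db=6.0, allowed_bits=(2, 3, 4, 5, 6)):
    """Return a QAM order (2**bits) per sub-band, or 0 to leave a sub-band unused.

    Uses the SNR-gap approximation b = log2(1 + SNR / Gamma) and rounds down
    to the largest allowed constellation (4-QAM ... 64-QAM by default).
    """
    gap = 10 ** (gap_db / 10)
    orders = []
    for snr_db in subband_snr_db:
        snr = 10 ** (snr_db / 10)
        supported_bits = math.log2(1 + snr / gap)
        feasible = [b for b in allowed_bits if b <= supported_bits]
        orders.append(2 ** max(feasible) if feasible else 0)
    return orders


# Hypothetical per-sub-band SNRs falling with frequency, as on a low-pass channel.
print(assign_qam_orders([22.0, 18.0, 14.0, 10.0]))  # [32, 16, 4, 0]
```

On a low-pass-like channel the high-frequency sub-bands receive smaller constellations (or none), which is the qualitative behaviour the adaptive multi-band scheme exploits.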

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.02486 [pdf, other]

doi 10.1088/1475-7516/2024/04/087

Probing the vector charge of Sagittarius A* with pulsar timing

Authors: Zexin Hu, Lijing Shao, Rui Xu, Dicong Liang, Zhan-Feng Mai

Abstract: Timing a pulsar orbiting around Sagittarius A* (Sgr A*) can provide us with a unique opportunity to test gravity theories. We investigate the detectability of a vector charge carried by the Sgr A* black hole (BH) in the bumblebee gravity model with simulated future pulsar timing observations. The spacetime of a bumblebee BH introduces characteristic changes to the orbital dynamics of the pulsar and the light propagation of radio signals. Assuming a timing precision of 1 ms, our simulation shows that a 5-yr observation of a pulsar with an orbital period $P_b\sim 0.5\,{\rm yr}$ and an orbital eccentricity $e\sim 0.8$ can probe a vector charge-to-mass ratio as small as $Q/M\sim 10^{-3}$, which is much more stringent than the current constraint from the Event Horizon Telescope (EHT) observations, and comparable to the prospective constraint from extreme mass-ratio inspirals with the Laser Interferometer Space Antenna (LISA).

Submitted 4 December, 2023; originally announced December 2023.

Comments: 18 pages, 6 figures

Journal ref: JCAP 04 (2024) 087

arXiv:2312.02353 [pdf, other]

doi 10.1109/IROS47612.2022.9981200

Efficient 2D Graph SLAM for Sparse Sensing

Authors: Hanzhi Zhou, Zichao Hu, Sihang Liu, Samira Khan

Abstract: Simultaneous localization and mapping (SLAM) plays a vital role in mapping unknown spaces and aiding autonomous navigation. Virtually all state-of-the-art solutions today for 2D SLAM are designed for dense and accurate sensors such as laser range-finders (LiDARs). However, these sensors are not suitable for resource-limited nano robots, which are becoming increasingly capable and ubiquitous, and these robots tend to mount economical and low-power sensors that can only provide sparse and noisy measurements. This introduces a challenging problem called SLAM with sparse sensing. This work addresses the problem by adopting the form of the state-of-the-art graph-based SLAM pipeline with a novel frontend and an improvement for loop closing in the backend, both of which are designed to work with sparse and uncertain range data. Experiments show that the maps constructed by our algorithm have superior quality compared to prior work on sparse sensing. Furthermore, our method is capable of running in real-time on a modern PC with an average processing time of 1/100th the input interval time.

Submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted for 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2312.01889 [pdf, other]

doi 10.1103/PhysRevD.108.123034

Prospects for probing small-scale dark matter models with pulsars around Sagittarius A*

Authors: Zexin Hu, Lijing Shao, Fupeng Zhang

Abstract: Future observations with next-generation large-area radio telescopes are expected to discover radio pulsars (PSRs) closely orbiting around Sagittarius A* (Sgr A*), the supermassive black hole (SMBH) dwelling at our Galactic Center (GC). Such a system can provide a unique laboratory for testing General Relativity (GR), as well as the astrophysics around the GC. In this paper, we provide a numerical timing model for PSR-SMBH systems based on the post-Newtonian (PN) equation of motion, and use it to explore the prospects of measuring the black hole (BH) properties with pulsar timing. We further consider the perturbation caused by the dark matter (DM) distribution around Sgr A*, and the possibility of constraining DM models with PSR-SMBH systems. Assuming a 5-year observation of a normal pulsar in an eccentric ($e=0.8$) orbit with an orbital period $P_b = 0.5\,$yr, we find that, with weekly recorded times of arrival (TOAs) and a timing precision of 1 ms, the power-law index of DM density distribution near the GC can be constrained to about 20%. Such a measurement is comparable to those measurements at the Galactic length scale but can reveal small-scale properties of the DM.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 12 pages, 13 figures, accepted by PRD

Journal ref: Phys. Rev. D 108 (2023) 123034

arXiv:2312.01619 [pdf, other]

How Many Validation Labels Do You Need? Exploring the Design Space of Label-Efficient Model Ranking

Authors: Zhengyu Hu, Jieyu Zhang, Yue Yu, Yuchen Zhuang, Hui Xiong

Abstract: This paper presents LEMR (Label-Efficient Model Ranking) and introduces the MoraBench Benchmark. LEMR is a novel framework that minimizes the need for costly annotations in model selection by strategically annotating instances from an unlabeled validation set. To evaluate LEMR, we leverage the MoraBench Benchmark, a comprehensive collection of model outputs across diverse scenarios. Our extensive evaluation across 23 different NLP tasks in semi-supervised learning, weak supervision, and prompt selection demonstrates LEMR's effectiveness in significantly reducing labeling costs. Key findings highlight the impact of suitable ensemble methods, uncertainty sampling strategies, and model committee selection in enhancing model ranking accuracy. LEMR, supported by the insights from MoraBench, provides a cost-effective and accurate solution for model selection, especially valuable in resource-constrained environments.
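The ingredients named above (an ensemble, uncertainty sampling, a model committee) can be combined into a simple label-efficient ranking loop. This is a generic sketch, not LEMR itself: it assumes majority-vote pseudo-labels as the ensemble method and vote entropy as the uncertainty score, and `rank_models` with its arguments is hypothetical rather than MoraBench's data format.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for a single instance."""
    n = len(votes)
    return -sum(c / n * math.log(c / n) for c in Counter(votes).values())


def rank_models(preds, oracle, budget):
    """Rank models on an unlabeled validation set using a small labeling budget.

    preds:  dict mapping model name -> list of predicted labels, one per instance
    oracle: callable mapping instance index -> true label (stands in for an annotator)
    budget: number of instances that may be sent for annotation
    """
    names = list(preds)
    n = len(preds[names[0]])
    votes = [[preds[m][i] for m in names] for i in range(n)]
    # Ensemble step: pseudo-label every instance by committee majority vote.
    labels = [Counter(v).most_common(1)[0][0] for v in votes]
    # Uncertainty sampling: annotate the instances the committee disagrees on most.
    by_uncertainty = sorted(range(n), key=lambda i: vote_entropy(votes[i]), reverse=True)
    for i in by_uncertainty[:budget]:
        labels[i] = oracle(i)
    accuracy = {m: sum(p == y for p, y in zip(preds[m], labels)) / n for m in names}
    return sorted(names, key=accuracy.get, reverse=True)


truth = [0, 1, 1, 0, 1, 0]
preds = {"a": [0, 1, 1, 0, 1, 0], "b": [0, 1, 0, 1, 1, 0], "c": [0, 1, 0, 1, 1, 0]}
print(rank_models(preds, lambda i: truth[i], budget=2))  # ['a', 'b', 'c']
```

With no budget, the majority-vote pseudo-labels favour the two agreeing models and rank the correct model last; spending just two labels on the contested instances fixes the ranking.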

Submitted 17 February, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

arXiv:2312.00738 [pdf, other]

SeaLLMs -- Large Language Models for Southeast Asia

Authors: Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Zhiqiang Hu, Chenhui Shen, Yew Ken Chia, Xingxuan Li, Jianyu Wang, Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Lidong Bing

Abstract: Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages. To address this imbalance, we introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages. SeaLLMs are built upon the Llama-2 model and further advanced through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning to better capture the intricacies of regional languages. This allows them to respect and reflect local cultural norms, customs, stylistic preferences, and legal considerations. Our comprehensive evaluation demonstrates that SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities relative to comparable open-source models. Moreover, they outperform ChatGPT-3.5 in non-Latin languages, such as Thai, Khmer, Lao, and Burmese, by large margins while remaining lightweight and cost-effective to operate.

Submitted 1 July, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Technical report, ACL 2024 DEMO TRACK

arXiv:2312.00667 [pdf, other]

Twist-3 Generalized Parton Distribution for the Proton from Basis Light-Front Quantization

Authors: Ziqi Zhang, Zhi Hu, Siqi Xu, Chandan Mondal, Xingbo Zhao, James P. Vary

Abstract: We investigate the twist-3 generalized parton distributions (GPDs) for the valence quarks of the proton within the basis light-front quantization (BLFQ) framework. We first solve for the mass spectra and light-front wave functions (LFWFs) in the leading Fock sector using an effective Hamiltonian. Using the LFWFs, we then calculate the twist-3 GPDs via the overlap representation. By taking the forward limit, we also obtain the twist-3 parton distribution functions (PDFs) and discuss their properties. Our prediction for the twist-3 scalar PDF agrees well with the CLAS experimental extractions.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2312.00375 [pdf, other]

Text-Guided 3D Face Synthesis -- From Generation to Editing

Authors: Yunjie Wu, Yapeng Meng, Zhipeng Hu, Lincheng Li, Haoqian Wu, Kun Zhou, Weiwei Xu, Xin Yu

Abstract: Text-guided 3D face synthesis has achieved remarkable results by leveraging text-to-image (T2I) diffusion models. However, most existing works focus solely on direct generation and ignore editing, which prevents them from synthesizing customized 3D faces through iterative adjustments. In this paper, we propose a unified text-guided framework from face generation to editing. In the generation stage, we propose a geometry-texture decoupled generation to mitigate the loss of geometric details caused by coupling. Moreover, decoupling enables us to utilize the generated geometry as a condition for texture generation, yielding highly geometry-texture aligned results. We further employ a fine-tuned texture diffusion model to enhance texture quality in both RGB and YUV space. In the editing stage, we first employ a pre-trained diffusion model to update facial geometry or texture based on the texts. To enable sequential editing, we introduce a UV domain consistency preservation regularization, preventing unintentional changes to irrelevant facial attributes. In addition, we propose a self-guided consistency weight strategy to improve editing efficacy while preserving consistency. Through comprehensive experiments, we showcase our method's superiority in face synthesis. Project page: https://faceg2e.github.io/.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2312.00294 [pdf, other]

aeons: approximating the end of nested sampling

Authors: Zixiao Hu, Artem Baryshnikov, Will Handley

Abstract: This paper presents analytic results on the anatomy of nested sampling, from which a technique is developed to estimate the run-time of the algorithm that works for any nested sampling implementation. We test these methods on both toy models and true cosmological nested sampling runs. The method gives an order-of-magnitude prediction of the end point at all times, forecasting the true endpoint within standard error around the halfway point.

Submitted 30 November, 2023; originally announced December 2023.

Comments: 11 pages, 14 figures

arXiv:2311.16488 [pdf, other]

Efficient Multimodal Diffusion Models Using Joint Data Infilling with Partially Shared U-Net

Authors: Zizhao Hu, Shaochong Jia, Mohammad Rostami

Abstract: Recently, diffusion models have been used successfully to fit distributions for cross-modal data translation and multimodal data generation. However, these methods rely on extensive scaling, overlooking the inefficiency and interference between modalities. We develop the Partially Shared U-Net (PS-U-Net), an efficient multimodal diffusion architecture that allows text and image inputs to pass through dedicated layers and skip-connections for preserving modality-specific fine-grained details. Inspired by image inpainting, we also propose a new efficient multimodal sampling method that introduces new scenarios for conditional generation while only requiring a simple joint distribution to be learned. Our empirical exploration on the MS-COCO dataset demonstrates that our method generates multimodal text and image data with higher quality compared to existing multimodal diffusion models while having a comparable size, faster training, faster multimodal sampling, and more flexible generation.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.15906 [pdf, other]

MetaDefa: Meta-learning based on Domain Enhancement and Feature Alignment for Single Domain Generalization

Authors: Can Sun, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng, Bo Xu

Abstract: Single domain generalization (SDG) based on meta-learning has emerged as an effective technique for solving the domain-shift problem. However, the inadequate match of data distributions between source and augmented domains and the difficulty of separating domain-invariant features from domain-related features make it hard for SDG models to generalize well. Therefore, a novel meta-learning method based on domain enhancement and feature alignment (MetaDefa) is proposed to improve the model generalization performance. First, background substitution and visual corruption techniques are used to generate diverse and effective augmented domains. Then, a multi-channel feature alignment module based on class activation maps and class-agnostic activation maps is designed to effectively extract adequate transferable knowledge. In this module, domain-invariant features can be fully explored by focusing on similar target regions between the source and augmented domain feature spaces and suppressing the feature representation of non-similar target regions. Extensive experiments on two publicly available datasets show that MetaDefa has significant generalization performance advantages in multiple unseen target domains.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 5 pages, 3 figures

arXiv:2311.15464 [pdf, ps, other]

Bounded, compact and Schatten class Hankel operators on Fock-type spaces

Authors: Zhicheng Zeng, Xiaofeng Wang, Zhangjian Hu

Abstract: In this paper, we consider Hankel operators, with locally integrable symbols, densely defined on a family of Fock-type spaces whose weights are $C^3$-logarithmic growth functions with mild smoothness conditions. It is shown that a Hankel operator is bounded on such a Fock space if and only if its symbol function has bounded distance to analytic functions (BDA), a notion initiated by Luecking (J. Funct. Anal. 110:247-271, 1992). We also characterize the compactness and Schatten class membership of Hankel operators. In addition, we give characterizations of the Schatten class membership of Toeplitz operators with positive measure symbols for the small exponent $0<p<1$. Our proofs depend strongly on the technique of Hörmander's $L^2$ estimates for the $\overline{\partial}$ operator and the decomposition theory of BDA spaces as well as integral estimates involving the reproducing kernel.

Submitted 26 November, 2023; originally announced November 2023.

MSC Class: 47B35; 30H20 (Primary) 32A36; 32A37 (Secondary)

arXiv:2311.15283 [pdf, other]

Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs

Authors: Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi

Abstract: While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochastic smoothing of the original neural net model, enabling Monte Carlo methods for derivative approximation and eliminating the need for costly auto-differentiation. Despite its computational efficiency in high dimensions, RS-PINN introduces biases in both loss and gradients, negatively impacting convergence, especially when coupled with stochastic gradient descent (SGD). We present a comprehensive analysis of biases in RS-PINN, attributing them to the nonlinearity of the Mean Squared Error (MSE) loss and the PDE nonlinearity. We propose tailored bias correction techniques based on the order of PDE nonlinearity. The unbiased RS-PINN allows for a detailed examination of its pros and cons compared to the biased version. Specifically, the biased version has a lower variance and runs faster than the unbiased version, but it is less accurate due to the bias. To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version. In addition, we present an enhanced implementation of RS-PINN.
Extensive experiments on diverse high-dimensional PDEs, including Fokker-Planck, HJB, viscous Burgers', Allen-Cahn, and Sine-Gordon equations, illustrate the bias-variance trade-off and highlight the effectiveness of the hybrid RS-PINN. Empirical guidelines are provided for selecting biased, unbiased, or hybrid versions, depending on the dimensionality and nonlinearity of the specific PDE problem.
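The derivative-free idea behind randomized smoothing can be illustrated with the standard Gaussian-smoothing identities: for $f_\sigma(x) = \mathbb{E}[f(x + \sigma\varepsilon)]$ with $\varepsilon \sim \mathcal{N}(0, I_d)$, the gradient is $\mathbb{E}[f(x+\sigma\varepsilon)\,\varepsilon]/\sigma$ and the Laplacian is $\mathbb{E}[f(x+\sigma\varepsilon)(\lVert\varepsilon\rVert^2 - d)]/\sigma^2$. The sketch below is a hedged illustration of those identities, not RS-PINN's exact estimator; the plain function `f` stands in for the network, and the antithetic pairing and baseline are variance-reduction choices made here.

```python
import random


def smoothed_grad_and_laplacian(f, x, sigma, n_samples, seed=0):
    """Monte Carlo gradient and Laplacian of f_sigma(x) = E[f(x + sigma*eps)].

    Uses only function evaluations (no autodiff):
      grad f_sigma(x) = E[f(x + sigma*eps) * eps] / sigma
      lap  f_sigma(x) = E[f(x + sigma*eps) * (|eps|^2 - d)] / sigma^2
    Antithetic pairs (eps, -eps) and the baseline f(x) leave both
    expectations unchanged but reduce variance.
    """
    rng = random.Random(seed)
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    lap = 0.0
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([xi + sigma * e for xi, e in zip(x, eps)])
        f_minus = f([xi - sigma * e for xi, e in zip(x, eps)])
        odd = (f_plus - f_minus) / (2 * sigma)          # odd part -> gradient term
        even = ((f_plus + f_minus) / 2 - fx) / sigma ** 2  # even part -> Laplacian term
        sq_norm = sum(e * e for e in eps)
        for i in range(d):
            grad[i] += odd * eps[i]
        lap += even * (sq_norm - d)
    return [g / n_samples for g in grad], lap / n_samples


# f(x) = |x|^2 has smoothed gradient 2x and smoothed Laplacian 2d.
g, lap = smoothed_grad_and_laplacian(lambda v: sum(t * t for t in v), [0.5, -1.0],
                                     sigma=0.1, n_samples=40000)
print(g, lap)  # close to [1.0, -2.0] and 4.0
```

Each estimate is unbiased, but once it enters a PDE residual that is squared in an MSE loss, the expectation of the squared estimate differs from the square of its expectation; that gap is the loss bias the abstract analyses.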

Submitted 26 November, 2023; originally announced November 2023.

Comments: 21 pages, 5 figures

MSC Class: 14J60

arXiv:2311.15263 [pdf, ps, other]

Strong limit theorems for step-reinforced random walks

Authors: Zhishui Hu, Yiting Zhang

Abstract: A step-reinforced random walk is a discrete-time non-Markovian process with long-range memory. At each step, with a fixed probability p, the positively step-reinforced random walk repeats one of its preceding steps chosen uniformly at random, and with complementary probability 1-p, it has an independent increment. The negatively step-reinforced random walk follows the same reinforcement algorithm, but when a step is repeated, its sign is also changed. Strong laws of large numbers and strong invariance principles are established for positively and negatively step-reinforced random walks in this work. Our approach relies on two general theorems on the invariance principle for martingale difference sequences and a truncation argument. As by-products of our main results, the law of the iterated logarithm and the functional central limit theorem are also obtained for step-reinforced random walks.
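The reinforcement rule described above is straightforward to simulate. A minimal sketch, assuming ±1 innovations (the abstract allows a general increment distribution) and treating the first step as always independent, since no preceding step exists yet; `step_reinforced_walk` is a hypothetical helper name.

```python
import random


def step_reinforced_walk(n, p, negative=False, seed=None):
    """Simulate n steps of a step-reinforced random walk with +/-1 innovations.

    With probability p the walk repeats a uniformly chosen earlier step
    (with its sign flipped if negative=True); otherwise it draws a fresh,
    independent step. Returns the positions S_1, ..., S_n.
    """
    rng = random.Random(seed)
    steps = []
    positions = []
    total = 0
    for k in range(n):
        if k > 0 and rng.random() < p:
            step = rng.choice(steps)        # uniform over all preceding steps
            if negative:
                step = -step                # negative reinforcement flips the sign
        else:
            step = rng.choice((-1, 1))      # independent increment
        steps.append(step)
        total += step
        positions.append(total)
    return positions


path = step_reinforced_walk(1000, p=0.7, seed=42)
print(path[-1] / len(path))  # empirical S_n / n for one sample path
```

Setting p = 0 recovers the simple random walk, while p = 1 with positive reinforcement repeats the first step forever, so the long-range memory is controlled entirely by p.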

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.15231 [pdf, other]

Double Reverse Regularization Network Based on Self-Knowledge Distillation for SAR Object Classification

Authors: Bo Xu, Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

Abstract: In current synthetic aperture radar (SAR) object classification, one of the major challenges is the severe overfitting issue due to the limited dataset (few-shot) and noisy data. Considering the advantages of knowledge distillation as a learned label smoothing regularization, this paper proposes a novel Double Reverse Regularization Network based on Self-Knowledge Distillation (DRRNet-SKD). Specifically, through exploring the effect of distillation weight on the process of distillation, we are inspired to adopt the double reverse thought to implement an effective regularization network by combining offline and online distillation in a complementary way. Then, the Adaptive Weight Assignment (AWA) module is designed to adaptively assign two reverse-changing weights based on the network performance, allowing the student network to better benefit from both teachers. The experimental results on OpenSARShip and FUSAR-Ship demonstrate that DRRNet-SKD exhibits remarkable performance improvement on classical CNNs, outperforming state-of-the-art self-knowledge distillation methods.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 6 pages, 8 figures

arXiv:2311.15202 [pdf, other]

Dual-stream contrastive predictive network with joint handcrafted feature view for SAR ship classification

Authors: Xianting Feng, Hao zheng, Zhigang Hu, Liu Yang, Meiguang Zheng

Abstract: Most existing synthetic aperture radar (SAR) ship classification technologies heavily rely on correctly labeled data, ignoring the discriminative features of unlabeled SAR ship images. Even though researchers try to enrich CNN-based features by introducing traditional handcrafted features, existing methods easily cause information redundancy and fail to capture the interaction between them. To address these issues, we propose a novel dual-stream contrastive predictive network (DCPNet), which consists of two asymmetric task designs and the false negative sample elimination module. The first task is to construct positive sample pairs, guiding the core encoder to learn more general representations. The second task is to encourage adaptive capture of the correspondence between deep features and handcrafted features, achieving knowledge transfer within the model, and effectively reducing the redundancy caused by the feature fusion. To increase the separability between clusters, we also design a cluster-level task. The experimental results on OpenSARShip and FUSAR-Ship datasets demonstrate the improvement in classification accuracy of supervised models and confirm the capability of learning effective representations of DCPNet.

Submitted 30 November, 2023; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: 6 pages, 3 figures, ICASSP2024

arXiv:2311.14756 [pdf, other]

Task-Distributionally Robust Data-Free Meta-Learning

Authors: Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao

Abstract: Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset, which is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS leads to a biased meta-learner because of the skewed task distribution towards newly generated tasks. TDC occurs when untrusted models characterized by misleading labels or poor quality pollute the task distribution. To tackle these issues, we introduce a robust DFML framework that ensures task distributional robustness. We propose to meta-learn from a pseudo task distribution, diversified through task interpolation within a compact task-memory buffer. This approach reduces the meta-learner's overreliance on newly generated tasks by maintaining consistent performance across a broader range of interpolated memory tasks, thus ensuring its generalization for unseen tasks. Additionally, our framework seamlessly incorporates an automated model selection mechanism into the meta-training phase, parameterizing each model's reliability as a learnable weight. This is optimized with a policy gradient algorithm inspired by reinforcement learning, effectively addressing the non-differentiable challenge posed by model selection.
Comprehensive experiments across various datasets demonstrate the framework's effectiveness in mitigating TDS and TDC, underscoring its potential to improve DFML in real-world scenarios.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.14277 [pdf, ps, other]

Anisotropy-induced Coulomb phase and quasiparticle zoo in the atomic monopole-spin hybrid system

Authors: Shao-Jun Li, Xiang Gao, Xue-Ting Fang, Lushuai Cao, Peter Schmelcher, Zhong-Kun Hu

Abstract: Quantum simulation of a monopole-spin hybrid system is performed on the basis of a dipolar ultracold gas in a ladder lattice. The site-occupation states of the dipolar ladder lattice gas can spontaneously emulate both the monopole and spin excitations. The hopping of the atoms induces a particle conversion process between spin and monopole pairs, and the dipole-dipole interaction determines the spin-spin, spin-monopole and monopole-monopole interactions. The anisotropic nature of the dipole-dipole interaction allows hereby for a flexible engineering of the designed hybrid system, and for a significant tunability of the interaction strengths. As a result, we encounter a rich phase diagram, and specifically a self-assembled Coulomb phase arises, in which monopoles and spins coexist and are orderly arranged according to the local Gauss's law. The Coulomb phase hosts a zoo of different types of quasiparticles, and provides the possibility to simulate various phenomena in particle physics, such as a degenerate vacuum, particle decay and conversion processes. Our work provides a significant extension of the scope of quantum simulations based on the anisotropy of dipolar interactions.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 11 pages, 8 figures

arXiv:2311.13181 [pdf, other]

doi 10.1103/PhysRevB.109.155102

Fractional quantum Hall interface induced by geometric singularity

Authors: Qi Li, Yi Yang, Zhou Li, Hao Wang, Zi-Xiang Hu

Abstract: The geometric response of quantum Hall liquids is an important aspect to understand their topological characteristics in addition to the electromagnetic response. According to the Wen-Zee theory, the topological spin is coupled to the curvature of the space in which the electrons reside. The presence of conical geometry provides a local isolated geometric singularity, making it suitable for exploring the geometric response. In the context of two-dimensional electrons in a perpendicular magnetic field, each Landau orbit occupies the same area. The cone geometry naturally provides a structure in which the distances between two adjacent orbits gradually change and can be easily adjusted by altering the tip angle. The presence of a cone tip introduces a geometric singularity that affects the electron density and interacts with the motion of electrons, which has been extensively studied. Furthermore, this type of geometry can automatically create a smooth interface or crossover between the crystalline charge-density-wave state and the liquid-like fractional quantum Hall state. In this work, the properties of this interface are studied from multiple perspectives, shedding light on the behavior of quantum Hall liquids in such geometric configurations.

Submitted 22 November, 2023; originally announced November 2023.

Journal ref: Phys. Rev. B 109, 155102 (2024)

arXiv:2311.12122 [pdf, ps, other]

The $K$-theory of the moduli stacks $\mathcal{M}_2$ and $\overline{\mathcal{M}}_2$

Authors: Dan Edidin, Zhengning Hu

Abstract: We compute the integral Grothendieck rings of the moduli stacks, $\mathcal{M}_2$, $\overline{\mathcal{M}}_2$ of smooth and stable curves of genus two respectively. We compute $K_0(\mathcal{M}_2)$ by using the presentation of $\mathcal{M}_2$ as a global quotient stack given by Vistoli. To compute the Grothendieck ring $K_0(\overline{\mathcal{M}}_2)$ we decompose $\overline{\mathcal{M}}_2$ as $\Delta_1$ and its complement $\overline{\mathcal{M}}_2 \setminus \Delta_1$ and use their presentations as quotient stacks given by Larson to compute their Grothendieck rings. We show that they are torsion-free and this, together with the Riemann-Roch isomorphism, allows us to ultimately give a presentation for the integral Grothendieck ring $K_0(\overline{\mathcal{M}}_2)$.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 19 pages

MSC Class: 14H10; 14C35

arXiv:2311.11509 [pdf, other]

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information

Authors: Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, Viswanathan Swaminathan

Abstract: In recent years, Large Language Models (LLM) have emerged as pivotal tools in various applications. However, these models are susceptible to adversarial prompt attacks, where attackers can carefully curate input strings that mislead LLMs into generating incorrect or undesired outputs. Previous work has revealed that with relatively simple yet effective attacks based on discrete optimization, it is possible to generate adversarial prompts that bypass moderation and alignment of the models. This vulnerability to adversarial prompts underscores a significant concern regarding the robustness and reliability of LLMs. Our work aims to address this concern by introducing a novel approach to detecting adversarial prompts at a token level, leveraging the LLM's capability to predict the next token's probability. We measure the degree of the model's perplexity, where tokens predicted with high probability are considered normal, and those exhibiting high perplexity are flagged as adversarial. Additionally, our method integrates context understanding by incorporating neighboring token information to encourage the detection of contiguous adversarial prompt sequences. To this end, we design two algorithms for adversarial prompt detection: one based on optimization techniques and another on Probabilistic Graphical Models (PGM). Both methods are equipped with efficient solving methods, ensuring efficient adversarial prompt detection.
Our token-level detection result can be visualized as heatmap overlays on the text sequence, allowing for a clearer and more intuitive representation of which part of the text may contain adversarial prompts.
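Combining per-token perplexity with neighboring-token context can be illustrated with a simplified stand-in. It is not either of the paper's two algorithms, and it rests on these assumptions:

- Per-token surprisal values −log p(token | prefix) are already available from some language model.
- A token labeled normal costs its surprisal; a flagged token costs a fixed, hypothetical `threshold`.
- Each label change between neighbors costs a hypothetical `smoothness` penalty.

The exact minimum is then found by dynamic programming over the two labels.

```python
import numpy as np


def detect_adversarial_tokens(surprisal, threshold, smoothness):
    """Label each token 0 (normal) or 1 (flagged) by minimizing a chain energy.

    Cost of labeling token i normal: surprisal[i]  (i.e. -log p(token_i | prefix)).
    Cost of flagging token i:        threshold     (hypothetical constant).
    Each label change between neighbours costs `smoothness`, which favours
    contiguous flagged spans. Solved exactly by dynamic programming (Viterbi).
    """
    s = np.asarray(surprisal, dtype=float)
    n = s.shape[0]
    unary = np.stack([s, np.full(n, float(threshold))], axis=1)  # shape (n, 2)
    label_ids = np.arange(2)
    cost = unary[0].copy()                  # best total cost ending in each label
    back = np.zeros((n, 2), dtype=int)      # best previous label for each (i, label)
    for i in range(1, n):
        new_cost = np.empty(2)
        for y in (0, 1):
            cand = cost + smoothness * (label_ids != y)
            back[i, y] = int(np.argmin(cand))
            new_cost[y] = cand[back[i, y]] + unary[i, y]
        cost = new_cost
    labels = np.empty(n, dtype=int)
    labels[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):           # trace the optimal path backwards
        labels[i - 1] = back[i, labels[i]]
    return labels
```

With smoothness = 0 this reduces to flagging each token whose surprisal exceeds the threshold. A larger smoothness favors contiguous flagged spans. For example, surprisals [1, 1, 12, 3, 12, 1, 1] with threshold 5 give [0, 0, 1, 0, 1, 0, 0] at smoothness 0 but [0, 0, 1, 1, 1, 0, 0] at smoothness 4.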

Submitted 18 February, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.11183 [pdf, other]

doi 10.1109/LRA.2024.3360020

Deploying and Evaluating LLMs to Program Service Mobile Robots

Authors: Zichao Hu, Francesca Lucchetti, Claire Schlesinger, Yash Saxena, Anders Freeman, Sadanand Modak, Arjun Guha, Joydeep Biswas

Abstract: Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs' capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of generated programs by checking execution traces starting with multiple initial states, and checking whether the traces satisfy temporal logic properties that encode correctness for each task. RoboEval also includes multiple prompts per task to test for the robustness of program generation. We evaluate several popular state-of-the-art LLMs with the RoboEval benchmark, and perform a thorough analysis of the modes of failures, resulting in a taxonomy that highlights common pitfalls of LLMs at generating robot programs. We release our code and benchmark at https://amrl.cs.utexas.edu/codebotler/.

Submitted 21 February, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: 8 pages, Accepted at IEEE Robotics and Automation Letters (RA-L)

Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2853-2860, March 2024

arXiv:2311.10003 [pdf, ps, other]

Suppression of Chemotactic Singularity via Viscous Flow with Large Buoyancy

Authors: Zhongtian Hu

Abstract: In this work, we study the Keller-Segel-Navier-Stokes equation with low Reynolds number and subject to large buoyancy force. We show that for initial cell density with arbitrarily large mass (i.e. the $L^1$ norm), the solution remains regular for all times in the regime of sufficiently large buoyancy and viscosity. The major blowup suppression mechanism is a norm-stabilizing property possessed by a ``static problem,'' where the full problem can be seen as a perturbation of this quasi-stationary model.

Submitted 12 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: 42 pages. Major revisions: removed the section on Keller-Segel-Stokes equation (previously Section 5) due to large overlaps with that on Keller-Segel-Navier-Stokes equation (previously Section 6). Added a new section (current Section 6) of discussion on Keller-Segel-Stokes equation and an open problem. Added some technical details in Section 5 to enhance the readability per referees' suggestions

arXiv:2311.09850 [pdf, other]

Semantic-Relay-Aided Text Transmission: Placement Optimization and Bandwidth Allocation

Authors: Tianyu Liu, Changsheng You, Zeyang Hu, Chenyu Wu, Yi Gong, Kaibin Huang

Abstract: Semantic communication has emerged as a promising technology to break the Shannon limit by extracting the meaning of source data and sending relevant semantic information only. However, some mobile devices may have limited computation and storage resources, which renders it difficult to deploy and implement the resource-demanding deep learning based semantic encoder/decoder. To tackle this challenge, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver for assisting text transmission from a resource-abundant base station (BS) to a resource-constrained mobile device. Specifically, the SemRelay first decodes the semantic information sent by the BS (with a semantic transmitter) and then forwards it to the user by adopting conventional bit transmission, hence effectively improving the text transmission efficiency. We formulate an optimization problem to maximize the achievable (effective) bit rate by jointly designing the SemRelay placement and bandwidth allocation. Although this problem is non-convex and generally difficult to solve, we propose an efficient penalty-based algorithm to obtain a high-quality suboptimal solution. Numerical results show the close-to-optimal performance of the proposed algorithm as well as significant rate performance gain of the proposed SemRelay over conventional decode-and-forward relay.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 6 pages, 4 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop

arXiv:2311.09804 [pdf, ps, other]

Average Jaccard Index of Random Graphs

Authors: Qunqiang Feng, Shuai Guo, Zhishui Hu

Abstract: The asymptotic behavior of the Jaccard index in $G(n,p)$, the classical Erdös-Rényi random graphs model, is studied in this paper, as $n$ goes to infinity. We first derive the asymptotic distribution of the Jaccard index of any pair of distinct vertices, as well as the first two moments of this index. Then the average of the Jaccard indices over all vertex pairs in $G(n,p)$ is shown to be asymptotically normal under an additional mild condition that $np\to\infty$ and $n^2(1-p)\to\infty$.
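A quick numerical check of the average Jaccard index in $G(n,p)$ follows. It assumes $J(u,v) = |N(u) \cap N(v)| / |N(u) \cup N(v)|$ with open neighborhoods, and $J = 0$ when the union is empty; the paper's exact conventions may differ, and the function names are illustrative.

For fixed $p$, the law of large numbers gives about $(n-2)p^2$ common neighbors and a union of about $(n-2)(2p - p^2)$. The average should therefore sit near $p/(2-p)$.

```python
import numpy as np


def gnp_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdos-Renyi G(n, p) graph (no self-loops)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(np.int64)


def average_jaccard(adj):
    """Mean Jaccard index |N(u) & N(v)| / |N(u) | N(v)| over all pairs u < v."""
    n = adj.shape[0]
    common = adj @ adj                       # entry (u, v) = number of common neighbours
    deg = adj.sum(axis=1)
    union = deg[:, None] + deg[None, :] - common
    with np.errstate(divide="ignore", invalid="ignore"):
        jac = np.where(union > 0, common / union, 0.0)  # convention: J = 0 for an empty union
    iu = np.triu_indices(n, k=1)
    return float(jac[iu].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 0.3
    print(average_jaccard(gnp_adjacency(400, p, rng)), p / (2 - p))
```

On the path 0-1-2, only the pair (0, 2) shares a neighbor, so the average is 1/3.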

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.08562 [pdf, other]

MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

Authors: Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See Kiong Ng, Jiashi Feng

Abstract: Large Language Models (LLMs) have marked a significant advancement in the field of natural language processing, demonstrating exceptional capabilities in reasoning, tool usage, and memory. As their applications extend into multi-agent environments, a need has arisen for a comprehensive evaluation framework that captures their abilities in reasoning, planning, collaboration, and more. This work introduces a novel benchmarking framework specifically tailored to assess LLMs within multi-agent settings, providing quantitative metrics to evaluate their judgment, reasoning, deception, self-awareness, cooperation, coordination, and rationality. We utilize games such as Chameleon and Undercover, alongside game theory scenarios like Cost Sharing, Multi-player Prisoner's Dilemma, and Public Good, to create diverse testing environments. Our framework is fortified with the Probabilistic Graphical Modeling (PGM) method, enhancing the LLMs' capabilities in navigating complex social and cognitive dimensions. The benchmark evaluates seven multi-agent systems powered by different LLMs, quantitatively highlighting a significant capability gap over threefold between the strongest, GPT-4, and the weakest, Llama-2-70B. It also confirms that our PGM enhancement boosts the inherent abilities of all selected models by 50% on average. Our codes are released here https://github.com/cathyxl/MAgIC.

Submitted 16 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: work in progress

arXiv:2311.07634 [pdf, other]

ActiveDC: Distribution Calibration for Active Finetuning

Authors: Wenshuai Xu, Zhenghui Hu, Yu Lu, **zhou Meng, Qingjie Liu, Yunhong Wang

Abstract: The pretraining-finetuning paradigm has gained popularity in various computer vision tasks. In this paradigm, active finetuning has emerged due to the abundance of large-scale data and costly annotation requirements. Active finetuning involves selecting a subset of data from an unlabeled pool for annotation, facilitating subsequent finetuning. However, the use of a limited number of training samples can lead to a biased distribution, potentially resulting in model overfitting. In this paper, we propose a new method called ActiveDC for the active finetuning tasks. Firstly, we select samples for annotation by optimizing the distribution similarity between the subset to be selected and the entire unlabeled pool in continuous space. Secondly, we calibrate the distribution of the selected samples by exploiting implicit category information in the unlabeled pool. The feature visualization provides an intuitive sense of the effectiveness of our approach to distribution calibration. We conducted extensive experiments on three image classification datasets with different sampling ratios. The results indicate that ActiveDC consistently outperforms the baseline performance in all image classification tasks. The improvement is particularly significant when the sampling ratio is low, with performance gains of up to 10%. Our code will be released.

Submitted 27 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: CVPR 2024 Accept

arXiv:2311.06877 [pdf, ps, other]

A study on the negative binomial distribution motivated by Chvátal's theorem

Authors: Zheng-Yan Guo, Ze-Yu Tao, Ze-Chun Hu

Abstract: Let $B(n,p)$ denote a binomial random variable with parameters $n$ and $p$. Chvátal's theorem says that for any fixed $n\geq 2$, as $m$ ranges over $\{0,\ldots,n\}$, the probability $q_m:=P(B(n,m/n)\leq m)$ is the smallest when $m$ is closest to $\frac{2n}{3}$. Motivated by this theorem, in this note we consider the infimum value of the probability $P(X\leq E[X])$, where $X$ is a negative binomial random variable. As a consequence, we give an affirmative answer to the conjecture posed in [Statistics and Probability Letters, 200 (2023) 109871].
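Chvátal's theorem can be checked exactly for small $n$ with rational arithmetic. The sketch below computes two quantities; the function names are illustrative, and it says nothing about the infimum studied in the note.

- $q_m = P(B(n, m/n) \leq m)$.
- $P(X \leq E[X])$ for a negative binomial $X$. This assumes $X$ counts failures before the $r$-th success with success probability $p$, so $E[X] = r(1-p)/p$; the note's parameterization may differ.

```python
from fractions import Fraction
from math import comb, floor


def chvatal_q(n, m):
    """Exact q_m = P(B(n, m/n) <= m) as a Fraction."""
    p = Fraction(m, n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))


def chvatal_argmin(n):
    """The m in {0, ..., n} that minimizes q_m."""
    return min(range(n + 1), key=lambda m: chvatal_q(n, m))


def negbin_prob_at_most_mean(r, p):
    """Exact P(X <= E[X]) for X = failures before the r-th success (success probability p)."""
    p = Fraction(p)
    mean = r * (1 - p) / p
    return sum(comb(k + r - 1, k) * p**r * (1 - p) ** k for k in range(floor(mean) + 1))


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, chvatal_argmin(n), round(2 * n / 3))  # the two columns should agree
```

For example, with $n = 2$ the three values are $q_0 = 1$, $q_1 = 3/4$ and $q_2 = 1$. The minimum is at $m = 1$, the integer closest to $4/3$.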

Submitted 12 November, 2023; originally announced November 2023.

Comments: 10 pages

arXiv:2311.06854 [pdf, ps, other]

Multiuser Resource Allocation for Semantic-Relay-Aided Text Transmissions

Authors: Zeyang Hu, Tianyu Liu, Changsheng You, Zhaohui Yang, Mingzhe Chen

Abstract: Semantic communication (SemCom) is an emerging technology that extracts useful meaning from data and sends only relevant semantic information. Thus, it has the great potential to improve the spectrum efficiency of conventional wireless systems with bit transmissions, especially in low signal-to-noise ratio (SNR) and small bandwidth regions. However, the existing works have mostly overlooked the constraints of mobile devices, which may not have sufficient capabilities to implement resource-demanding semantic encoder/decoder based on deep learning. To address this issue, we propose in this paper a new semantic relay (SemRelay), which is equipped with a semantic receiver to assist multiuser text transmissions. Specifically, the SemRelay decodes semantic information from a base station and forwards it to the users using conventional bit transmission, hence effectively improving text transmission efficiency. To study the multiuser resource allocation, we formulate an optimization problem to maximize the multiuser weighted sum-rate by jointly designing the SemRelay transmit power allocation and system bandwidth allocation. Although this problem is non-convex and hence challenging to solve, we propose an efficient algorithm to obtain its high-quality suboptimal solution by using the block coordinate descent method. Last, numerical results show the effectiveness of the proposed algorithm as well as superior performance of the proposed SemRelay over the conventional decode-and-forward (DF) relay, especially in small bandwidth region.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 6 pages, 3 figures, accepted for IEEE Global Communication Conference (GLOBECOM) 2023 Workshop on Semantic Communication for 6G

arXiv:2311.06720 [pdf, other]

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer

Authors: Bowen Tan, Yun Zhu, Lijuan Liu, Eric Xing, Zhiting Hu, **dong Chen

Abstract: Large language models (LLMs) such as T0, FLAN, and OPT-IML excel in multi-tasking under a unified instruction-following paradigm, where they also exhibit remarkable generalization abilities to unseen tasks. Despite their impressive performance, these LLMs, with sizes ranging from several billion to hundreds of billions of parameters, demand substantial computational resources, making their training and inference expensive and inefficient. Furthermore, adapting these models to downstream applications, particularly complex tasks, is often unfeasible due to the extensive hardware requirements for finetuning, even when utilizing parameter-efficient approaches such as prompt tuning. Additionally, the most powerful multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B, are not publicly accessible, severely limiting their customization potential. To address these challenges, we introduce a pretrained small scorer, Cappy, designed to enhance the performance and efficiency of multi-task LLMs. With merely 360 million parameters, Cappy functions either independently on classification tasks or serves as an auxiliary component for LLMs, boosting their performance. Moreover, Cappy enables efficiently integrating downstream supervision without requiring LLM finetuning or access to their parameters. Our experiments demonstrate that, when working independently on 11 language understanding tasks from PromptSource, Cappy outperforms LLMs that are several orders of magnitude larger.
Besides, on 45 complex tasks from BIG-Bench, Cappy boosts the performance of the advanced multi-task LLM, FLAN-T5, by a large margin. Furthermore, Cappy can flexibly cooperate with other LLM adaptations, including finetuning and in-context learning, offering additional performance enhancement.

Submitted 11 November, 2023; originally announced November 2023.

Comments: In proceedings of NeurIPS 2023; Code and model available at https://github.com/tanyuqian/cappy and https://huggingface.co/btan2/cappy-large, respectively

arXiv:2311.05374 [pdf, other]

TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs

Authors: Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng **, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, **g Nie, Yuhong Liu

Abstract: Large language models (LLMs) have shown impressive capabilities across various natural language tasks. However, evaluating their alignment with human preferences remains a challenge. To this end, we propose a comprehensive human evaluation framework to assess LLMs' proficiency in following instructions on diverse real-world tasks. We construct a hierarchical task tree encompassing 7 major areas, over 200 categories, and over 800 tasks, covering diverse capabilities such as question answering, reasoning, multiturn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner. We also design detailed evaluation standards and processes to facilitate consistent, unbiased judgments from human evaluators. A test set of over 3,000 instances is released, spanning different difficulty levels and knowledge domains. Our work provides a standardized methodology to evaluate human alignment in LLMs for both English and Chinese. We also analyze the feasibility of automating parts of the evaluation with a strong LLM (GPT-4). Our framework supports a thorough assessment of LLMs as they are integrated into real-world applications. We have publicly released the task tree, the TencentLLMEval dataset, and the evaluation methodology, which have been demonstrated to be effective in assessing the performance of Tencent Hunyuan LLMs. By doing so, we aim to facilitate the benchmarking of advances in the development of safe and human-aligned LLMs.

Submitted 9 November, 2023; originally announced November 2023.

Showing 201–250 of 1,580 results for author: Hu, Z