Search | arXiv e-print repository

FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation

Authors: Yuxi Liu, Guibo Luo, Yuesheng Zhu

Abstract: Medical image segmentation is crucial for clinical diagnosis. The Segment Anything Model (SAM) serves as a powerful foundation model for visual segmentation and can be adapted for medical image segmentation. However, medical imaging data typically contain privacy-sensitive information, making it challenging to train foundation models with centralized storage and sharing. To date, there are few foundation models tailored for medical image deployment within the federated learning framework, and the segmentation performance, as well as the efficiency of communication and training, remain unexplored. In response to these issues, we developed Federated Foundation models for Medical image Segmentation (FedFMS), which includes the Federated SAM (FedSAM) and a communication and training-efficient Federated SAM with Medical SAM Adapter (FedMSA). Comprehensive experiments on diverse datasets are conducted to investigate the performance disparities between centralized training and federated learning across various configurations of FedFMS. The experiments revealed that FedFMS could achieve performance comparable to models trained via centralized training methods while maintaining privacy. Furthermore, FedMSA demonstrated the potential to enhance communication and training efficiency. Our model implementation code is available at https://github.com/LIU-YUXI/FedFMS.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Medical image segmentation, Federated learning and Foundation model

ACM Class: I.4.6; I.2.11

arXiv:2403.02084 [pdf, other]

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Authors: Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu

Abstract: Recent advances in text-to-image models (e.g., Stable Diffusion) and corresponding personalized technologies (e.g., DreamBooth and LoRA) enable individuals to generate high-quality and imaginative images. However, they often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter), a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods that process images of static resolution with complex post-process operations, ResAdapter directly generates images with dynamic resolution. In particular, after learning pure resolution priors, ResAdapter, trained on a general dataset, generates resolution-free images with personalized diffusion models while preserving their original style domain. Comprehensive experiments demonstrate that ResAdapter with only 0.5M can process images with flexible resolutions for arbitrary diffusion models. More extended experiments demonstrate that ResAdapter is compatible with other modules (e.g., ControlNet, IP-Adapter and LCM-LoRA) for image generation across a broad range of resolutions, and can be integrated into other multi-resolution models (e.g., ElasticDiffusion) for efficiently generating higher-resolution images. Project link is https://res-adapter.github.io

Submitted 4 March, 2024; originally announced March 2024.

Comments: 21 pages, 16 figures

arXiv:2403.00628 [pdf, other]

Region-Adaptive Transform with Segmentation Prior for Image Compression

Authors: Yuxi Liu, Wenhan Yang, Huihui Bai, Yunchao Wei, Yao Zhao

Abstract: Learned Image Compression (LIC) has shown remarkable progress in recent years. Existing works commonly employ CNN-based or self-attention-based modules as transform methods for compression. However, there is no prior research on neural transform that focuses on specific regions. In response, we introduce the class-agnostic segmentation masks (i.e. semantic masks without category labels) for extracting region-adaptive contextual information. Our proposed module, Region-Adaptive Transform, applies adaptive convolutions on different regions guided by the masks. Additionally, we introduce a plug-and-play module named Scale Affine Layer to incorporate rich contexts from various regions. While there have been prior image compression efforts that involve segmentation masks as additional intermediate inputs, our approach differs significantly from them. Our advantages lie in that, to avoid extra bitrate overhead, we treat these masks as privilege information, which is accessible during the model training stage but not required during the inference phase. To the best of our knowledge, we are the first to employ class-agnostic masks as privilege information and achieve superior performance in pixel-fidelity metrics, such as Peak Signal to Noise Ratio (PSNR). The experimental results demonstrate our improvement compared to previously well-performing methods, with about 8.2% bitrate saving compared to VTM-17.0. The source code is available at https://github.com/GityuxiLiu/SegPIC-for-Image-Compression.

Submitted 9 July, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted to ECCV 2024

arXiv:2403.00491 [pdf, ps, other]

Analyzing Divergence for Nondeterministic Probabilistic Models

Authors: Hao Wu, Yuxi Fu, Huan Long, Xian Xu, Wenbo Zhang

Abstract: Branching and weak probabilistic bisimilarities are two well-known notions capturing behavioral equivalence between nondeterministic probabilistic systems. For probabilistic systems, divergence is of major concern. Recently several divergence-sensitive refinements of branching and weak probabilistic bisimilarities have been proposed in the literature. Both the definitions of these equivalences and the techniques to investigate them differ significantly. This paper presents a comprehensive comparative study on divergence-sensitive behavioral equivalence relations that refine the branching and weak probabilistic bisimilarities. Additionally, these equivalence relations are shown to have efficient checking algorithms. The techniques of this paper might be of independent interest in a more general setting.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.16895 [pdf, other]

Field-Aligned Current Structures during the Terrestrial Magnetosphere's Transformation into Alfven Wings and Recovery

Authors: Jason M. H. Beedle, Li-Jen Chen, Jason R. Shuster, Harsha Gurram, Dan J. Gershman, Yuxi Chen, Rachel C. Rice, Brandon L. Burkholder, Akhtar S. Ardakani, Kevin J. Genestreti, Roy B. Torbert

Abstract: On April 24th, 2023, a CME event caused the solar wind to become sub-Alfvenic, leading to the development of an Alfven Wing configuration in the Earth's Magnetosphere. Alfven Wings have previously been observed as cavities of low flow in Jupiter's magnetosphere, but the observing satellites did not have the ability to directly measure the Alfven Wings' current structures. Through in situ measurements made by the Magnetospheric Multiscale (MMS) spacecraft, the April 24th event provides us with the first direct measurements of current structures during an Alfven Wing configuration. We have found two distinct types of current structures associated with the Alfven Wing transformation as well as the magnetosphere recovery. These structures are observed to be significantly more anti-field-aligned and electron-driven than typical magnetopause currents, indicating the disruptions caused to the magnetosphere current system by the Alfven Wing formation.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.14954 [pdf, other]

Methods for the detection of stellar rotation periods in individual TESS sectors and results from the Prime mission

Authors: Isabel L. Colman, Ruth Angus, Trevor David, Jason Curtis, Soichiro Hattori, Yuxi Lucy Lu

Abstract: For ongoing studies of the role of rotation in stellar evolution, we require large catalogs of rotation periods for testing and refining gyrochronology. While there is a wealth of data from the Kepler and K2 missions, TESS presents both an opportunity and a challenge: despite its all-sky coverage, rotation periods remain hard to detect. We analyzed individual TESS sectors to detect short-period stellar rotation, using only parameters measured from light curves for a robust and unbiased method of evaluating detections. We used random forest classifiers for vetting, trained on a large corpus of period measurements in KELT data from the Oelkers et al. (2018) catalog and using TESS full-frame image light curves generated by eleanor (Feinstein et al. 2019). Finally, using data from the first 26 sectors of TESS, we analyzed 432,704 2-minute cadence single-sector light curves for FGKM dwarfs. We detected 16,800 periods in individual sector light curves, covering 10,909 distinct targets, and we present a catalog of the median period for each target as measured by a Lomb-Scargle periodogram.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 23 pages, 15 figures, accepted for publication in The Astronomical Journal

arXiv:2402.13471 [pdf]

Thermal transport in a 2D amorphous material

Authors: Yuxi Wang, Xingxing Zhang, Wujuan Yan, Nianjie Liang, Haiyu He, Xinwei Tao, Ang Li, Fuwei Yang, Buxuan Li, Te-Huan Liu, Jia Zhu, Wu Zhou, Wei Wang, Lin Zhou, Bai Song

Abstract: Two-dimensional (2D) crystals proved revolutionary soon after graphene was discovered in 2004. However, 2D amorphous materials only became accessible in 2020 and remain largely unexplored. In particular, the thermophysical properties of amorphous materials are of great interest upon transition from 3D to 2D. Here, we probe thermal transport in 2D amorphous carbon. A cross-plane thermal conductivity ($\kappa$) down to 0.079 $\mathrm{W\,m^{-1}\,K^{-1}}$ is measured for van der Waals stacked multilayers at room temperature, which is among the lowest reported to date. Meanwhile, an unexpectedly high in-plane $\kappa$ is obtained for freestanding monolayers which is a few times larger than what is predicted by conventional wisdom for 3D amorphous carbon with similar $\mathrm{sp}^{2}$ fraction. Our molecular dynamics simulations reveal the role of disorder and highlight the impact of dimensionality. Amorphous materials at the 2D limit open up new avenues for understanding and manipulating heat at the atomic scale.

Submitted 22 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.09433 [pdf, other]

Electrical Behavior Association Mining for Household Short-Term Energy Consumption Forecasting

Authors: Heyang Yu, Yuxi Sun, Yintao Liu, Guangchao Geng, Quanyuan Jiang

Abstract: Accurate household short-term energy consumption forecasting (STECF) is crucial for home energy management, but it is technically challenging due to the highly random behaviors of individual residential users. To improve the accuracy of STECF on a day-ahead scale, this paper proposes a novel STECF methodology that leverages association mining in electrical behaviors. First, a probabilistic association quantifying and discovering method is proposed to model pairwise behavior associations and generate associated clusters. Then, a convolutional neural network-gated recurrent unit (CNN-GRU) based forecasting method is provided to explore the temporal correlation and enhance accuracy. The testing results demonstrate that this methodology yields a significant enhancement in STECF.

Submitted 25 January, 2024; originally announced February 2024.

Comments: 3 figures and 4 tables; This manuscript is submitted for possible publication

arXiv:2402.08091 [pdf]

Earth's Alfvén wings driven by the April 2023 Coronal Mass Ejection

Authors: Li-Jen Chen, Daniel Gershman, Brandon Burkholder, Yuxi Chen, Menelaos Sarantos, Lan Jian, James Drake, Chuanfei Dong, Harsha Gurram, Jason Shuster, Daniel Graham, Olivier Le Contel, Steven Schwartz, Stephen Fuselier, Hadi Madanian, Craig Pollock, Haoming Liang, Matthew Argall, Richard Denton, Rachel Rice, Jason Beedle, Kevin Genestreti, Akhtar Ardakani, Adam Stanier, Ari Le, et al. (11 additional authors not shown)

Abstract: We report a rare regime of Earth's magnetosphere interaction with sub-Alfvénic solar wind in which the windsock-like magnetosphere transforms into one with Alfvén wings. In the magnetic cloud of a Coronal Mass Ejection (CME) on April 24, 2023, NASA's Magnetospheric Multiscale mission distinguishes the following features: (1) unshocked and accelerated cold CME plasma coming directly against Earth's dayside magnetosphere; (2) dynamical wing filaments representing new channels of magnetic connection between the magnetosphere and foot points of the Sun's erupted flux rope; (3) cold CME ions observed with energized counter-streaming electrons, evidence of CME plasma captured due to reconnection between magnetic-cloud and Alfvén-wing field lines. The reported measurements advance our knowledge of CME interaction with planetary magnetospheres, and open new opportunities to understand how sub-Alfvénic plasma flows impact astrophysical bodies such as Mercury, moons of Jupiter, and exoplanets close to their host stars.

Submitted 3 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 14 pages, including 4 figures, Under review in Geophys. Res. Lett

arXiv:2402.06953 [pdf, other]

The Star Formation History in Local Group Galaxies. I. Ten Dwarf Galaxies

Authors: Yi Ren, Biwei Jiang, Yuxi Wang, Ming Yang, Zhiqiang Yan

Abstract: The star formation histories (SFHs) of galaxies provide valuable insights into galaxy evolution and stellar physics. Understanding the SFHs enables the study of chemical enrichment of galaxies, star formation triggered by interactions, and the behavior of various stellar populations. This work investigates the SFHs of ten dwarf galaxies in the Local Group (LG), which span a wide range of types, masses, luminosities, and metallicities. The analysis is based on our new sample of the member stars in the LG after removing the foreground dwarf stars by the near-infrared color-color diagram and the Gaia astrometric information. The samples include the most complete and pure red supergiants and asymptotic giant branch stars to gain valuable insights into the recent SFHs of the galaxies. The CMD fitting method is introduced to measure the SFH. The Padova isochrones are used to generate initial model CMDs, accounting for photometric errors and completeness through star field simulations to match the completeness and error distributions of the observed CMDs. Subsequently, the SFHs, distance modulus, and metallicity of the ten dwarf galaxies are determined by fitting the CMDs. The results indicate that the star formation rates (SFRs) of dwarf irregulars show a gradual increase, while those of dwarf ellipticals exhibit a gradual decrease from the past to the present. Furthermore, this work shows that the star formation activity in dwarf ellipticals persisted up to 30 Myr ago. A significant increasing feature in the SFH of NGC 6822 reveals star formation activity triggered by an interaction event.

Submitted 10 February, 2024; originally announced February 2024.

Comments: 21 pages, 11 figures, 5 tables, accepted for publication in ApJ

arXiv:2402.05746 [pdf, other]

Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents

Authors: Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang

Abstract: Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in terms of user interaction efficiency, multi-camera photo-realistic rendering and external digital assets integration. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulations via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent assets' rendering. Our experiments on Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.

Submitted 26 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: CVPR 2024(Highlight)

arXiv:2402.04282 [pdf, other]

Interplanetary magnetic field $B_y$ controlled Alfvén wings at Earth during encounter of a coronal mass ejection

Authors: Yuxi Chen, Chuanfei Dong, Li-Jen Chen, Menelaos Sarantos, Brandon L. Burkholder

Abstract: In the vicinity of Earth's orbit, the typical solar wind Alfvén Mach number exceeds 5, and the super-Alfvénic solar wind drives a conventional magnetosphere configuration. However, at the ejecta phase of an interplanetary coronal mass ejection (ICME) event, the Alfvén Mach number may experience a significant reduction due to the intensified interplanetary magnetic field (IMF) strength and decreased density. On 24 April 2023, an ICME reached Earth's orbit. The solar wind density dropped to as low as 0.3 amu/cc while the IMF strength was about 25 nT. As a result, the solar wind flow transitioned to a sub-Alfvénic state with an Alfvén Mach number of 0.4, providing opportunities to investigate the interaction of planetary magnetospheres with low Mach number solar wind. We carry out global simulations to investigate the responses of Earth's magnetosphere to the sub-Alfvénic ICME ejecta. The global magnetohydrodynamic (MHD) simulation results show the formation of Alfvén wings as the solar wind becomes sub-Alfvénic. Furthermore, the sub-Alfvénic period was characterized by the dominance of the IMF $B_y$ component, causing the Alfvén wings to extend towards the dawn and dusk sides. In this paper, we present the structures of the magnetic field, plasma flow, and current system around the Alfvén wings. The global magnetospheric convection under the sub-Alfvénic solar wind condition is discussed in depth. Our results achieve a new level of understanding about the interaction between a magnetized body and sub-Alfvénic upstream conditions, and provide guidance for future observations.
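
Note: the figures quoted in this abstract can be cross-checked with the standard Alfvén speed formula, v_A = B / sqrt(mu_0 * rho). The sketch below is only an illustration and is not taken from the paper: the function name is hypothetical, and the solar wind speed is not stated in the abstract, so the value printed here is merely the speed implied by the quoted Mach number of 0.4.

```python
import math

MU_0 = 4.0e-7 * math.pi      # vacuum permeability, H/m
AMU = 1.66053906660e-27      # atomic mass unit, kg


def alfven_speed(b_tesla, mass_density_amu_per_cc):
    """Alfven speed in m/s for a field in tesla and a mass density in amu/cm^3.

    Hypothetical helper for illustration; not part of any library.
    """
    rho = mass_density_amu_per_cc * 1.0e6 * AMU  # convert amu/cm^3 to kg/m^3
    return b_tesla / math.sqrt(MU_0 * rho)


# Values quoted in the abstract: |B| of about 25 nT, density of about 0.3 amu/cc.
v_a = alfven_speed(25e-9, 0.3)

# Flow speed implied by the quoted Alfven Mach number of 0.4 (an inference,
# not a number given in the abstract).
implied_flow = 0.4 * v_a

print(f"Alfven speed: {v_a / 1e3:.0f} km/s")
print(f"Implied solar wind speed at M_A = 0.4: {implied_flow / 1e3:.0f} km/s")
```

With these inputs the Alfvén speed comes out at roughly 1000 km/s, so a Mach number of 0.4 corresponds to a flow of roughly 400 km/s.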

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.00464 [pdf, ps, other]

Normalized solutions for a fractional Schrödinger-Poisson system with critical growth

Authors: Xiaoming He, Yuxi Meng, Marco Squassina

Abstract: In this paper, we study the fractional critical Schrödinger-Poisson system \[\begin{cases} (-\Delta)^s u + \lambda\phi u = \alpha u + \mu|u|^{q-2}u + |u|^{2^*_s-2}u, & \text{in } \mathbb{R}^3,\\ (-\Delta)^t \phi = u^2, & \text{in } \mathbb{R}^3, \end{cases}\] with prescribed mass \[\int_{\mathbb{R}^3} |u|^2\,dx = a^2,\] where $s, t \in (0, 1)$ satisfy $2s+2t > 3$, $q \in (2, 2^*_s)$, $a > 0$ and $\lambda, \mu > 0$ are parameters, and $\alpha \in \mathbb{R}$ is an undetermined parameter. Under the $L^2$-subcritical perturbation $q \in (2, 2+\frac{4s}{3})$, we derive the existence of multiple normalized solutions by means of the truncation technique, the concentration-compactness principle and the genus theory. For the $L^2$-supercritical perturbation $q \in (2+\frac{4s}{3}, 2^*_s)$, by applying constrained variational methods and the mountain pass theorem, we show the existence of positive normalized ground state solutions.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 43 pages

MSC Class: 35J62; 35J50; 35B65

arXiv:2401.17857 [pdf, other]

SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition

Authors: Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang

Abstract: 3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed. However, the 3D Gaussians learned by 3D-GS have ambiguous structures without any geometry constraints. This inherent issue in 3D-GS leads to a rough boundary when segmenting individual objects. To remedy these problems, we propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS to improve segmentation accuracy while preserving segmentation speed. Specifically, we introduce a Gaussian Decomposition scheme, which ingeniously utilizes the special structure of 3D Gaussian, finds out, and then decomposes the boundary Gaussians. Moreover, to achieve fast interactive 3D segmentation, we introduce a novel training-free pipeline by lifting a 2D foundation model to 3D-GS. Extensive experiments demonstrate that our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.

Submitted 17 May, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.15071 [pdf, other]

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Authors: Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, et al. (11 additional authors not shown)

Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities, i.e., text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (i.e., 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.

Submitted 29 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.13325 [pdf, other]

Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery

Authors: Yuanpeng Tu, Zhun Zhong, Yuxi Li, Hengshuang Zhao

Abstract: Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning, where only part of the category labels are assigned to certain training samples. Previous methods generally employ naive contrastive learning or unsupervised clustering schemes for all the samples. Nevertheless, they usually ignore the inherent critical information within the historical predictions of the model being trained. Specifically, we empirically reveal that a significant number of salient unlabeled samples yield consistent historical predictions corresponding to their ground truth category. From this observation, we propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL). In this framework, we introduce two memory banks to record historical predictions of unlabeled data, which are exploited to measure the credibility of each sample in terms of its prediction consistency. With the guidance of credibility, we can design a divide-and-conquer learning strategy to fully utilize the discriminative information of unlabeled data while alleviating the negative influence of noisy labels. Extensive experimental results on multiple benchmarks demonstrate the generality and superiority of our method, where our method outperforms state-of-the-art models by a large margin on both seen and unseen classes of the generic image recognition and challenging semantic shift settings (i.e., with +8.4% gain on CUB and +8.1% on Stanford Cars).

Submitted 31 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.10061 [pdf, other]

DiffusionGPT: LLM-Driven Text-to-Image Generation System

Authors: Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen

Abstract: Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms. However, a major challenge persists: current text-to-image systems are often unable to handle diverse inputs, or are limited to single model results. Current unified attempts often fall into two orthogonal aspects: i) parsing diverse prompts in the input stage; ii) activating an expert model for output. To combine the best of both worlds, we propose DiffusionGPT, which leverages Large Language Models (LLM) to offer a unified generation system capable of seamlessly accommodating various types of prompts and integrating domain-expert models. DiffusionGPT constructs domain-specific Trees for various generative models based on prior knowledge. When provided with an input, the LLM parses the prompt and employs the Trees-of-Thought to guide the selection of an appropriate model, thereby relaxing input constraints and ensuring exceptional performance across diverse domains. Moreover, we introduce Advantage Databases, where the Tree-of-Thought is enriched with human feedback, aligning the model selection process with human preferences. Through extensive experiments and comparisons, we demonstrate the effectiveness of DiffusionGPT, showcasing its potential for pushing the boundaries of image synthesis in diverse domains.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.09655 [pdf]

Cavity-enhanced narrowband spectral filters using rare-earth ions doped in thin-film lithium niobate

Authors: Yuqi Zhao, Dylan Renaud, Demitry Farfurnik, Yuxi Jiang, Subhojit Dutta, Neil Sinclair, Marko Loncar, Edo Waks

Abstract: On-chip optical filters are fundamental components in optical signal processing. While rare-earth ion-doped crystals offer ultra-narrow optical filtering via spectral hole burning, their applications have primarily been limited to those using bulk crystals, restricting their utility. In this work, we demonstrate cavity-enhanced spectral filtering based on rare-earth ions in an integrated nonlinear optical platform. We incorporate rare-earth ions into high quality-factor ring resonators patterned in thin-film lithium niobate. By spectral hole burning at 4K in a critically coupled resonance mode, we achieve bandpass filters ranging from 7 MHz linewidth, with 13.0 dB of extinction, to 24 MHz linewidth, with 20.4 dB of extinction. By reducing the temperature to 100 mK to eliminate phonon broadening, we achieve an even narrower linewidth of 681 kHz, which is comparable to the narrowest filter linewidth demonstrated in an integrated photonic device, while only requiring a small device footprint. Moreover, the cavity enables reconfigurable filtering by varying the cavity coupling rate. For instance, as opposed to the bandpass filter, we demonstrate a bandstop filter utilizing an under-coupled ring resonator. Such versatile integrated spectral filters with high extinction ratio and narrow linewidth could serve as fundamental components for optical signal processing and optical memories on-a-chip.

Submitted 30 May, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2401.04408 [pdf, other]

Fine-Grained Embedding Dimension Optimization During Training for Recommender Systems

Authors: Qinyi Luo, Penghan Wang, Wei Zhang, Fan Lai, Jiachen Mao, Xiaohan Wei, Jun Song, Wei-Yu Tsai, Shuai Yang, Yuxi Hu, Xuehai Qian

Abstract: Huge embedding tables in modern Deep Learning Recommender Models (DLRM) require prohibitively large memory during training and inference. Aiming to reduce the memory footprint of training, this paper proposes FIne-grained In-Training Embedding Dimension optimization (FIITED). Given the observation that embedding vectors are not equally important, FIITED adjusts the dimension of each individual embedding vector continuously during training, assigning longer dimensions to more important embeddings while adapting to dynamic changes in data. A novel embedding storage system based on virtually-hashed physically-indexed hash tables is designed to efficiently implement the embedding dimension adjustment and effectively enable memory saving. Experiments on two industry models show that FIITED is able to reduce the size of embeddings by more than 65% while maintaining the trained model's quality, saving significantly more memory than a state-of-the-art in-training embedding pruning method. On public click-through rate prediction datasets, FIITED is able to prune up to 93.75%-99.75% embeddings without significant accuracy loss.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 16 pages, 9 figures

ACM Class: I.2.6; H.3.3

arXiv:2401.03470 [pdf, other]

FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes

Authors: Genghao Zhang, Yuxi Wang, Chuanchen Luo, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng

Abstract: Indoor scene generation has attracted significant attention recently as it is crucial for applications of gaming, virtual reality, and interior design. Current indoor scene generation methods can produce reasonable room layouts but often lack diversity and realism. This is primarily due to the limited coverage of existing datasets, including only large furniture without tiny furnishings in daily life. To address these challenges, we propose FurniScene, a large-scale 3D room dataset with intricate furnishing scenes from interior design professionals. Specifically, the FurniScene consists of 11,698 rooms and 39,691 unique furniture CAD models with 89 different types, covering things from large beds to small teacups on the coffee table. To better suit fine-grained indoor scene layout generation, we introduce a novel Two-Stage Diffusion Scene Model (TSDSM) and conduct an evaluation benchmark for various indoor scene generation based on FurniScene. Quantitative and qualitative evaluations demonstrate the capability of our method to generate highly realistic indoor scenes. Our dataset and code will be publicly available soon.

Submitted 6 May, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

arXiv:2401.03145 [pdf, other]

Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection

Authors: Yuanpeng Tu, Boshen Zhang, Liang Liu, Yuxi Li, Xuhai Chen, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cai Rong Zhao

Abstract: Industrial anomaly detection is generally addressed as an unsupervised task that aims at locating defects with only normal training samples. Recently, numerous 2D anomaly detection methods have been proposed and have achieved promising results; however, using only 2D RGB data as input is not sufficient to identify imperceptible geometric surface anomalies. Hence, in this work, we focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets, i.e., ImageNet, to construct feature databases. We empirically find that directly using these pre-trained models is not optimal: it can either fail to detect subtle defects or mistake abnormal features for normal ones. This may be attributed to the domain gap between target industrial data and source data. To address this problem, we propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection. Both intra-modal adaptation and cross-modal alignment are optimized from a local-to-global perspective in LSFA to ensure the representation quality and consistency in the inference stage. Extensive experiments demonstrate that our method not only brings a significant performance boost to feature embedding based approaches, but also outperforms previous State-of-The-Art (SoTA) methods prominently on both MVTec-3D AD and Eyecandies datasets, e.g., LSFA achieves 97.1% I-AUROC on MVTec-3D, surpassing the previous SoTA by +3.4%.

Submitted 17 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2312.13714 [pdf, other]

Bootstrap Masked Visual Modeling via Hard Patches Mining

Authors: Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang

Abstract: Masked visual modeling has attracted much attention due to its promising potential in learning generalizable representations. Typical approaches urge models to predict specific contents of masked tokens, which can be intuitively considered as teaching a student (the model) to solve given problems (predicting masked contents). Under such settings, the performance is highly correlated with mask strategies (the difficulty of provided problems). We argue that it is equally important for the model to stand in the shoes of a teacher to produce challenging problems by itself. Intuitively, patches with high values of reconstruction loss can be regarded as hard samples, and masking those hard patches naturally becomes a demanding reconstruction task. To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask. Technically, we introduce an auxiliary loss predictor, which is trained with a relative objective to prevent overfitting to exact loss values. Also, to gradually guide the training procedure, we propose an easy-to-hard mask strategy. Empirically, HPM brings significant improvements under both image and video benchmarks. Interestingly, solely incorporating the extra loss prediction objective leads to better representations, verifying the efficacy of determining where is hard to reconstruct. The code is available at https://github.com/Haochen-Wang409/HPM.

Submitted 21 December, 2023; originally announced December 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2304.05919

arXiv:2312.11024 [pdf, other]

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis

Authors: Tianyao He, Huabin Liu, Yuxi Li, Xiao Ma, Cheng Zhong, Yang Zhang, Weiyao Lin

Abstract: Video Correlation Learning (VCL), which aims to analyze the relationships between videos, has been widely studied and applied in various general video tasks. However, applying VCL to instructional videos is still quite challenging due to their intrinsic procedural temporal structure. Specifically, procedural knowledge is critical for accurate correlation analyses on instructional videos. Nevertheless, current procedure-learning methods heavily rely on step-level annotations, which are costly and not scalable. To address this problem, we introduce a weakly supervised framework called Collaborative Procedure Alignment (CPA) for procedure-aware correlation learning on instructional videos. Our framework comprises two core modules: collaborative step mining and frame-to-step alignment. The collaborative step mining module enables simultaneous and consistent step segmentation for paired videos, leveraging the semantic and temporal similarity between frames. Based on the identified steps, the frame-to-step alignment module performs alignment between the frames and steps across videos. The alignment result serves as a measurement of the correlation distance between two videos. We instantiate our framework in two distinct instructional video tasks: sequence verification and action quality assessment. Extensive experiments validate the effectiveness of our approach in providing accurate and interpretable correlation analyses for instructional videos.

Submitted 18 December, 2023; originally announced December 2023.

Comments: has been accepted by AAAI 24

arXiv:2312.10997 [pdf, other]

Retrieval-Augmented Generation for Large Language Models: A Survey

Authors: Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen Wang

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval, the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces an up-to-date evaluation framework and benchmark. Finally, this article delineates the challenges currently faced and points out prospective avenues for research and development.

Submitted 27 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Ongoing Work

arXiv:2312.09595 [pdf, other]

Density Matters: Improved Core-set for Active Domain Adaptive Segmentation

Authors: Shizhan Liu, Zhengkai Jiang, Yuxi Li, Jinlong Peng, Yabiao Wang, Weiyao Lin

Abstract: Active domain adaptation has emerged as a solution to balance the expensive annotation cost and the performance of trained models in semantic segmentation. However, existing works usually ignore the correlation between selected samples and their local context in feature space, which leads to inferior usage of annotation budgets. In this work, we revisit the theoretical bound of the classical Core-set method and identify that the performance is closely related to the local sample distribution around selected samples. To estimate the density of local samples efficiently, we introduce a local proxy estimator with Dynamic Masked Convolution and develop a Density-aware Greedy algorithm to optimize the bound. Extensive experiments demonstrate the superiority of our approach. Moreover, with very few labels, our scheme achieves comparable performance to the fully supervised counterpart.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.07911 [pdf]

Projective Parallel Single-Pixel Imaging: 3D Structured Light Scanning Under Global Illumination

Authors: Yuxi Li, Hongzhi Jiang, Huijie Zhao, Xudong Li

Abstract: We present projective parallel single-pixel imaging (pPSI), a 3D photography method that provides a robust and efficient way to analyze the light transport behavior and enables separation of light effect due to global illumination, thereby achieving 3D structured light scanning under global illumination. The light transport behavior is described by the light transport coefficients (LTC), which contain complete information for a projector camera pair, and is a 4D data set. However, the capture of LTC is generally time consuming. The 4D LTC in pPSI are reduced to projection functions, thereby enabling a highly efficient data capture process. We introduce the local maximum constraint, which provides constraint for the location of candidate correspondence matching points when projections are captured. Local slice extension (LSE) method is introduced to accelerate the capture of projection functions. Optimization is conducted for pPSI under several situations. The number of projection functions required for pPSI is optimized and the influence of capture ratio in LSE on the accuracy of the correspondence matching points is investigated. Discussions and experiments include two typical kinds of global illuminations: inter-reflections and subsurface scattering. The proposed method is validated with several challenging scenarios, and outperforms the state-of-the-art methods.

Submitted 13 December, 2023; originally announced December 2023.

Comments: 21 pages, 13 figures

arXiv:2312.07800 [pdf]

Purcell enhanced emission and saturable absorption of cavity-coupled CsPbBr$_3$ quantum dots

Authors: Purbita Purkayastha, Shaun Gallagher, Yuxi Jiang, Chang-Min Lee, Gillian Shen, David Ginger, Edo Waks

Abstract: Halide perovskite semiconductors have emerged as promising materials for the development of solution-processed, scalable, high performance optoelectronic devices such as light-emitting diodes (LEDs) as well as coherent single photon emitters. Their integration into nanophotonic cavities for radiative enhancement and strong nonlinearity is underexplored. In this work, we demonstrate cavity-enhanced emission and saturable absorption using colloidal CsPbBr$_3$ perovskite quantum dots coupled to a high-Q cavity mode of a circular Bragg grating structure designed to facilitate integration of solution-processed materials. We achieve an order of magnitude increase in brightness and 8-fold increase in the spontaneous emission rate for the cavity-coupled emitters. This result indicates the possibility of achieving transform-limited photon coherence for the halide perovskites at cryogenic temperatures. We also observe saturable absorption of the emitters through intensity-dependent cavity quality factor. These results pave the way towards achieving improved photon indistinguishability and strong optical nonlinearities for cavity coupled perovskite systems.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 10 pages, 4 figures

arXiv:2312.07219 [pdf, other]

LMC Stars and Where to Find Them: Inferring Birth Radii for External Galaxies

Authors: Yuxi Lu, Tobias Buck, David Nidever, Bridget Ratcliffe, Ivan Minchev, Andrea V. Macciò, Aura Obreja

Abstract: It is well known that stars move away from their birth location over time via radial migration. This dynamical process makes computing the correct chemical evolution, e.g., metallicity gradients, of galaxies very difficult. This dynamical process makes inferring the chemical evolution of observed galaxies from their measured abundance gradients very difficult. One way to account for radial migration is to infer stellar birth radii for individual stars. Many attempts to do so have been performed over the last years, but are limited to the Milky Way as computing the birth position of stars requires precise measurements of stellar metallicity and age for individual stars that cover large Galactic radii. Fortunately, recent and future surveys will provide numerous opportunities for inferring birth radii for external galaxies such as the Large Magellanic Cloud (LMC). In this paper, we investigate the possibility of doing so using the NIHAO cosmological zoom-in simulations. We find that it is theoretically possible to infer birth radii with a ~ 25% median uncertainty for individual stars in galaxies with i) orderliness of the orbits, $\langle v_φ\rangle/σ_{v} >$ 2, ii) a dark matter halo mass greater or equal to approximately the LMC mass (~ 2 x 10$^{11} M_\odot$), and iii) after the average azimuthal velocity of the stellar disk reaches ~70% of its maximum. From our analysis, we conclude that it is possible and useful to infer birth radii for the LMC and other external galaxies that satisfy the above criteria.

Submitted 12 December, 2023; originally announced December 2023.

Comments: 8 pages, 8 figures. Missing citations welcome

arXiv:2312.03469 [pdf, other]

Estimation of line-of-sight velocities of individual galaxies using neural networks I. Modelling redshift-space distortions at large scales

Authors: Hongxiang Chen, Jie Wang, Tianxiang Mao, Juntao Ma, Yuxi Meng, Baojiu Li, Yan-Chuan Cai, Mark Neyrinck, Bridget Falck, Alexander S. Szalay

Abstract: We present a scheme based on artificial neural networks (ANN) to estimate the line-of-sight velocities of individual galaxies from an observed redshift-space galaxy distribution. We find an estimate of the peculiar velocity at a galaxy based on galaxy counts and barycenters in shells around it. By training the network with environmental characteristics, such as the total mass and mass center within each shell surrounding every galaxy in redshift space, our ANN model can accurately predict the line-of-sight velocity of each individual galaxy. When this velocity is used to eliminate the RSD effect, the two-point correlation function (TPCF) in real space can be recovered with an accuracy better than 1% at $s$ > 8 $h^{-1}\mathrm{Mpc}$, and 4% on all scales compared to ground truth. The real-space power spectrum can be recovered within 3% on $k$ < 0.5 $\mathrm{Mpc}^{-1}h$, and less than 5% for all $k$ modes. The quadrupole moment of the TPCF or power spectrum is almost zero down to $s$ = 10 $h^{-1}\mathrm{Mpc}$ or all $k$ modes, indicating an effective correction of the spatial anisotropy caused by the RSD effect. We demonstrate that on large scales, without additional training with new data, our network is adaptable to different galaxy formation models, different cosmological models, and mock galaxy samples at high redshifts and high biases, achieving less than 10% error for scales greater than 15 $h^{-1}\mathrm{Mpc}$. As it is sensitive to large-scale densities, it does not manage to remove Fingers of God in large clusters, but works remarkably well at recovering real-space galaxy positions elsewhere. Our scheme provides a novel way to predict the peculiar velocity of individual galaxies, to eliminate the RSD effect directly in future large galaxy surveys, and to reconstruct the 3-D cosmic velocity field accurately.

Submitted 8 July, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: 15 pages, 15 figures, accepted for publication in MNRAS

arXiv:2312.02614 [pdf, other]

Prompt Optimization via Adversarial In-Context Learning

Authors: Xuan Long Do, Yiran Zhao, Hannah Brown, Yuxi Xie, James Xu Zhao, Nancy F. Chen, Kenji Kawaguchi, Michael Shieh, Junxian He

Abstract: We propose a new method, Adversarial In-Context Learning (adv-ICL), to optimize prompt for in-context learning (ICL) by employing one LLM as a generator, another as a discriminator, and a third as a prompt modifier. As in traditional adversarial learning, adv-ICL is implemented as a two-player game between the generator and discriminator, where the generator tries to generate realistic enough output to fool the discriminator. In each round, given an input prefixed by task instructions and several exemplars, the generator produces an output. The discriminator is then tasked with classifying the generator input-output pair as model-generated or real data. Based on the discriminator loss, the prompt modifier proposes possible edits to the generator and discriminator prompts, and the edits that most improve the adversarial loss are selected. We show that adv-ICL results in significant improvements over state-of-the-art prompt optimization techniques for both open and closed-source models on 11 generation and classification tasks including summarization, arithmetic reasoning, machine translation, data-to-text generation, and the MMLU and big-bench hard benchmarks. In addition, because our method uses pre-trained models and updates only prompts rather than model parameters, it is computationally efficient, easy to extend to any LLM and task, and effective in low-resource settings.

Submitted 22 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: ACL 2024

arXiv:2311.18148 [pdf]

A universal optical modulator for synthetic topologically tuneable structured matter

Authors: Chao He, Binguo Chen, Zipei Song, Zimo Zhao, Yifei Ma, Honghui He, Lin Luo, Tade Marozsak, An Wang, Rui Xu, Peixiang Huang, Xuke Qiu, Bangshan Sun, Jiahe Cui, Yuxi Cai, Yun Zhang, Patrick Salter, Julian AJ Fells, Ben Dai, Shaoxiong Liu, Limei Guo, Hui Ma, Steve J Elston, Qiwen Zhan, Chengwei Qiu, et al. (3 additional authors not shown)

Abstract: Topologically structured matter, such as metasurfaces and metamaterials, have given rise to impressive photonic functionality, fuelling diverse applications from microscopy and holography to encryption and communication. Presently these solutions are limited by their largely static nature and preset functionality, hindering applications that demand dynamic photonic systems with reconfigurable topologies. Here we demonstrate a universal optical modulator that implements topologically tuneable structured matter as virtual pixels derived from cascading low functionality tuneable devices, altering the paradigm of phase and amplitude control to encompass arbitrary spatially varying retarders in a synthetic structured matter device. Our approach opens unprecedented functionality that is user-defined with high flexibility, allowing our synthetic structured matter to act as an information carrier, beam generator, analyser, and corrector, opening an exciting path to tuneable topologies of light and matter.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.13409 [pdf, other]

CompenHR: Efficient Full Compensation for High-resolution Projector

Authors: Yuxi Wang, Haibin Ling, Bingyao Huang

Abstract: Full projector compensation is a practical task of projector-camera systems. It aims to find a projector input image, named compensation image, such that when projected it cancels the geometric and photometric distortions due to the physical environment and hardware. State-of-the-art methods use deep learning to address this problem and show promising performance for low-resolution setups. However, directly applying deep learning to high-resolution setups is impractical due to the long training time and high memory cost. To address this issue, this paper proposes a practical full compensation solution. Firstly, we design an attention-based grid refinement network to improve geometric correction quality. Secondly, we integrate a novel sampling scheme into an end-to-end compensation network to alleviate computation and introduce attention blocks to preserve key features. Finally, we construct a benchmark dataset for high-resolution projector full compensation. In experiments, our method demonstrates clear advantages in both efficiency and quality.

Submitted 28 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

arXiv:2311.13046 [pdf, ps, other]

Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

Authors: Yuxi Heluo, Kexin Wang, Charles W. Robson

Abstract: In this work, we contribute the first visual open-source empirical study on human behaviour during the COVID-19 pandemic, in order to investigate how compliant a general population is to mask-wearing-related public-health policy. Object-detection-based convolutional neural networks, regression analysis and multilayer perceptrons are combined to analyse visual data of the Viennese public during 2020. We find that mask-wearing-related government regulations and public-transport announcements encouraged correct mask-wearing-behaviours during the COVID-19 pandemic. Importantly, changes in announcement and regulation contents led to heterogeneous effects on people's behaviour. Comparing the predictive power of regression analysis and neural networks, we demonstrate that the latter produces more accurate predictions of population reactions during the COVID-19 pandemic. Our use of regression modelling also allows us to unearth possible causal pathways underlying societal behaviour. Since our findings highlight the importance of appropriate communication contents, our results will facilitate more effective non-pharmaceutical interventions to be developed in future. Adding to the literature, we demonstrate that regression modelling and neural networks are not mutually exclusive but instead complement each other.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.10937 [pdf, other]

Bridging Data-Driven and Knowledge-Driven Approaches for Safety-Critical Scenario Generation in Automated Vehicle Validation

Authors: Kunkun Hao, Lu Liu, Wen Cui, Jianxing Zhang, Songyang Yan, Yuxi Pan, Zijiang Yang

Abstract: Automated driving vehicles (ADV) promise to enhance driving efficiency and safety, yet they face intricate challenges in safety-critical scenarios. As a result, validating ADV within generated safety-critical scenarios is essential for both development and performance evaluations. This paper investigates the complexities of employing two major scenario-generation solutions: data-driven and knowledge-driven methods. Data-driven methods derive scenarios from recorded datasets, efficiently generating scenarios by altering the existing behavior or trajectories of traffic participants but often falling short in considering ADV perception; knowledge-driven methods provide effective coverage through expert-designed rules, but they may lead to inefficiency in generating safety-critical scenarios within that coverage. To overcome these challenges, we introduce BridgeGen, a safety-critical scenario generation framework, designed to bridge the benefits of both methodologies. Specifically, by utilizing ontology-based techniques, BridgeGen models the five scenario layers in the operational design domain (ODD) from knowledge-driven methods, ensuring broad coverage, and incorporating data-driven strategies to efficiently generate safety-critical scenarios. An optimized scenario generation toolkit is developed within BridgeGen. This expedites the crafting of safety-critical scenarios through a combination of traditional optimization and reinforcement learning schemes. Extensive experiments conducted using the CARLA simulator demonstrate the effectiveness of BridgeGen in generating diverse safety-critical scenarios.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.10813 [pdf, other]

A Language Agent for Autonomous Driving

Authors: Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang

Abstract: Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver significantly outperforms the state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability to these methods. Code will be released.

Submitted 27 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: Project Page: https://usc-gvl.github.io/Agent-Driver/

arXiv:2311.07582 [pdf]

Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions

Authors: Xinyu Gong, Jason Holmes, Yiwei Li, Zhengliang Liu, Qi Gan, Zihao Wu, Jianli Zhang, Yusong Zou, Yuxi Teng, Tian Jiang, Hongtu Zhu, Wei Liu, Tianming Liu, Yajun Yan

Abstract: Recent advances in Large Language Models (LLMs) have presented new opportunities for integrating Artificial General Intelligence (AGI) into biological research and education. This study evaluated the capabilities of leading LLMs, including GPT-4, GPT-3.5, PaLM2, Claude2, and SenseNova, in answering conceptual biology questions. The models were tested on a 108-question multiple-choice exam covering biology topics in molecular biology, biological techniques, metabolic engineering, and synthetic biology. Among the models, GPT-4 achieved the highest average score of 90 and demonstrated the greatest consistency across trials with different prompts. The results indicated GPT-4's proficiency in logical reasoning and its potential to aid biology research through capabilities like data analysis, hypothesis generation, and knowledge integration. However, further development and validation are still required before the promise of LLMs in accelerating biological discovery can be realized.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2311.06151 [pdf]

doi 10.1038/s41467-023-42630-7

Detection of magnetospheric ion drift patterns at Mars

Authors: Chi Zhang, Hans Nilsson, Yusuke Ebihara, Masatoshi Yamauchi, Moa Persson, Zhaojin Rong, Jun Zhong, Chuanfei Dong, Yuxi Chen, Xuzhi Zhou, Yixin Sun, Yuki Harada, Jasper Halekas, Shaosui Xu, Yoshifumi Futaana, Zhen Shi, Chongjing Yuan, Xiaotong Yun, Song Fu, Jiawei Gao, Mats Holmström, Yong Wei, Stas Barabash

Abstract: Mars lacks a global magnetic field, and instead possesses small-scale crustal magnetic fields, making its magnetic environment fundamentally different from intrinsic magnetospheres like those of Earth or Saturn. Here we report the discovery of magnetospheric ion drift patterns, typical of intrinsic magnetospheres, at Mars using measurements from the Mars Atmosphere and Volatile EvolutioN mission. Specifically, we observe wedge-like dispersion structures of hydrogen ions exhibiting butterfly-shaped distributions within the Martian crustal fields, a feature previously observed only in planetary-scale intrinsic magnetospheres. These dispersed structures are the results of drift motions that fundamentally resemble those observed in intrinsic magnetospheres. Our findings indicate that the Martian magnetosphere embodies an intermediate case where both the unmagnetized and magnetized ion behaviors could be observed because of the wide range of strengths and spatial scales of the crustal magnetic fields around Mars.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 10 pages, 6 figures

arXiv:2310.20329 [pdf, other]

InstructCoder: Instruction Tuning Large Language Models for Code Editing

Authors: Kaixin Li, Qisheng Hu, Xu Zhao, Hui Chen, Yuxi Xie, Tiedong Liu, Qizhe Xie, Junxian He

Abstract: Code editing encompasses a variety of pragmatic tasks that developers deal with daily. Despite its relevance and practical usefulness, automatic code editing remains an underexplored area in the evolution of deep learning models, partly due to data scarcity. In this work, we explore the use of Large Language Models (LLMs) to edit code based on user instructions. Evaluated on a novel human-written execution-based benchmark dubbed EditEval, we found current models often struggle to fulfill the instructions. In light of this, we contribute InstructCoder, the first instruction-tuning dataset designed to adapt LLMs for general-purpose code editing, containing high-diversity code-editing tasks such as comment insertion, code optimization, and code refactoring. It consists of over 114,000 instruction-input-output triplets and covers multiple distinct code editing scenarios. The collection process starts with filtered commit data sourced from GitHub Python repositories as seeds. Subsequently, the dataset is systematically expanded through an iterative process, where both seed and generated tasks are used to prompt ChatGPT for more data. Our findings reveal that open-source LLMs fine-tuned on InstructCoder can significantly enhance the accuracy of code edits, exhibiting superior code-editing performance matching advanced proprietary LLMs. The datasets and the source code are publicly available at https://github.com/qishenghu/CodeInstruct.

Submitted 28 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.14990 [pdf, other]

In this Day and Age: An Empirical Gyrochronology Relation for Partially and Fully Convective Single Field Stars

Authors: Yuxi Lu, Ruth Angus, Daniel Foreman-Mackey, Soichiro Hattori

Abstract: Gyrochronology, the field of age-dating stars using mainly their rotation periods and masses, is ideal for inferring the ages of individual main-sequence stars. However, due to the lack of physical understanding of the complex magnetic fields in stars, gyrochronology relies heavily on empirical calibrations that require consistent and reliable stellar age measurements across a wide range of periods and masses. In this paper, we obtain a sample of consistent ages using the gyro-kinematic age-dating method, a technique to calculate the kinematic ages of stars. Using a Gaussian Process model conditioned on ages from this sample (~ 1 - 14 Gyr) and known clusters (0.67 - 3.8 Gyr), we calibrate the first empirical gyrochronology relation that is capable of inferring ages for single, main-sequence stars between 0.67 Gyr and 14 Gyr. Cross-validating and testing results suggest our model can infer cluster and asteroseismic ages with an average uncertainty of just over 1 Gyr. With this model, we obtain gyrochronology ages for ~ 100,000 stars within 1.5 kpc of the Sun with period measurements from Kepler and ZTF, and 384 unique planet host stars.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Submitted to AJ. Missing citations welcome

arXiv:2310.14299 [pdf, other]

Revealing the Chemical Structure of the Magellanic Clouds with APOGEE. III. Abundance Gradients of the Small Magellanic Cloud

Authors: Joshua T. Povick, David L. Nidever, Pol Massana, Steven R. Majewski, Yuxi Lu, Maria-Rosa L. Cioni, Doug Geisler, Szabolcs Mészáros, Christian Nitschelm, Andrés Almeida, Richard R. Lane, Penélope Longa-Peña

Abstract: We determine radial- and age-abundance gradients of the Small Magellanic Cloud (SMC) using spectra of 2,062 red giant branch (RGB) field stars observed by SDSS-IV / APOGEE-2S. With coverage out to $\sim$9 kpc in the SMC, these data taken with the high resolution ($R \sim 22,500$) APOGEE $H$-band spectrograph afford the opportunity to measure extensive radial gradients for as many as 24 abundance ratios. The SMC is found to have an overall metallicity gradient of $-$0.0546 $\pm$ 0.0043 dex/kpc. Ages are calculated for every star to explore the evolution of the different abundance gradients. As a function of age, many of the gradients show a feature 3.66--5.58 Gyr ago, which is especially prominent in the [X/H] gradients. Initially many gradients flatten until $\sim$5.58 Gyr ago, but then steepen in more recent times. We previously detected similar evolutionary patterns in the Large Magellanic Cloud (LMC) which are attributed to a recent interaction between the LMC and SMC. It is inferred that the feature in the SMC gradients was caused by the same interaction. The age-[X/Fe] trends, which track average [X/Fe] over time, are flat, demonstrating a slow enrichment history for the SMC. When comparing the SMC gradients to the LMC and Milky Way (MW), normalized to disk scale length ($R_\text{d}$), the [X/Fe] and [X/Mg] gradients are similar, but there is a dichotomy between the dwarfs and the MW for the [X/H] gradients. The median MW [X/H] gradient is around $-$0.125 dex/$R_\text{d}$ whilst the Clouds have gradients of about $-$0.075 dex/$R_\text{d}$.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 27 pages, 22 figures, and 11 tables

arXiv:2310.11022 [pdf, other]

Compatible Transformer for Irregularly Sampled Multivariate Time Series

Authors: Yuxi Wei, Juntong Peng, Tong He, Chenxin Xu, Jian Zhang, Shirui Pan, Siheng Chen

Abstract: To analyze multivariate time series, most previous methods assume regular subsampling of time series, where the interval between adjacent measurements and the number of samples remain unchanged. Practically, data collection systems could produce irregularly sampled time series due to sensor failures and interventions. However, existing methods designed for regularly sampled multivariate time series cannot directly handle irregularity owing to misalignment along both temporal and variate dimensions. To fill this gap, we propose Compatible Transformer (CoFormer), a transformer-based encoder to achieve comprehensive temporal-interaction feature learning for each individual sample in irregular multivariate time series. In CoFormer, we view each sample as a unique variate-time point and leverage intra-variate/inter-variate attentions to learn sample-wise temporal/interaction features based on intra-variate/inter-variate neighbors. With CoFormer as the core, we can analyze irregularly sampled multivariate time series for many downstream tasks, including classification and prediction. We conduct extensive experiments on 3 real-world datasets and validate that the proposed CoFormer significantly and consistently outperforms existing methods.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted at the IEEE International Conference on Data Mining (ICDM) 2023 as short paper

arXiv:2310.01415 [pdf, other]

GPT-Driver: Learning to Drive with GPT

Authors: Jiageng Mao, Yuxi Qian, Junjie Ye, Hang Zhao, Yue Wang

Abstract: We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles. Motion planning is a core challenge in autonomous driving, aiming to plan a driving trajectory that is safe and comfortable. Existing motion planners predominantly leverage heuristic methods to forecast driving trajectories, yet these approaches demonstrate insufficient generalization capabilities in the face of novel and unseen driving scenarios. In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs). The fundamental insight of our approach is the reformulation of motion planning as a language modeling problem, a perspective not previously explored. Specifically, we represent the planner inputs and outputs as language tokens, and leverage the LLM to generate driving trajectories through a language description of coordinate positions. Furthermore, we propose a novel prompting-reasoning-finetuning strategy to stimulate the numerical reasoning potential of the LLM. With this strategy, the LLM can describe highly precise trajectory coordinates and also its internal decision-making process in natural language. We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner. Code is now available at https://github.com/PointsCoder/GPT-Driver.

Submitted 5 December, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023 Foundation Models for Decision Making Workshop

arXiv:2309.16940 [pdf, other]

Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow

Authors: Sizhe Wei, Yuxi Wei, Yue Hu, Yifan Lu, Yiqi Zhong, Siheng Chen, Ya Zhang

Abstract: Collaborative perception can substantially boost each agent's perception ability by facilitating communication among multiple agents. However, temporal asynchrony among agents is inevitable in the real world due to communication delays, interruptions, and clock misalignments. This issue causes information mismatch during multi-agent fusion, seriously shaking the foundation of collaboration. To address this issue, we propose CoBEVFlow, an asynchrony-robust collaborative perception system based on bird's eye view (BEV) flow. The key intuition of CoBEVFlow is to compensate motions to align asynchronous collaboration messages sent by multiple agents. To model the motion in a scene, we propose BEV flow, which is a collection of the motion vector corresponding to each spatial location. Based on BEV flow, asynchronous perceptual features can be reassigned to appropriate positions, mitigating the impact of asynchrony. CoBEVFlow has two advantages: (i) CoBEVFlow can handle asynchronous collaboration messages sent at irregular, continuous time stamps without discretization; and (ii) with BEV flow, CoBEVFlow only transports the original perceptual features, instead of generating new perceptual features, avoiding additional noises. To validate CoBEVFlow's efficacy, we create IRregular V2V (IRV2V), the first synthetic collaborative perception dataset with various temporal asynchronies that simulate different real-world scenarios. Extensive experiments conducted on both IRV2V and the real-world dataset DAIR-V2X show that CoBEVFlow consistently outperforms other baselines and is robust in extremely asynchronous settings. The code is available at https://github.com/MediaBrain-SJTU/CoBEVFlow.

Submitted 8 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: 16 pages, 9 figures. Accepted by NeurIPS 2023

arXiv:2309.14241 [pdf, other]

Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation

Authors: Yuxi Wang, Jian Liang, Jun Xiao, Shuqi Mei, Yuran Yang, Zhaoxiang Zhang

Abstract: Contemporary domain adaptation offers a practical solution for achieving cross-domain transfer of semantic segmentation between labeled source data and unlabeled target data. These solutions have gained significant popularity; however, they require the model to be retrained when the test environment changes. This can result in unbearable costs in certain applications due to the time-consuming training process and concerns regarding data privacy. One-shot domain adaptation methods attempt to overcome these challenges by transferring the pre-trained source model to the target domain using only one target data. Despite this, the referring style transfer module still faces issues with computation cost and over-fitting problems. To address this problem, we propose a novel framework called Informative Data Mining (IDM) that enables efficient one-shot domain adaptation for semantic segmentation. Specifically, IDM provides an uncertainty-based selection criterion to identify the most informative samples, which facilitates quick adaptation and reduces redundant training. We then perform a model adaptation method using these selected samples, which includes patch-wise mixing and prototype-based information maximization to update the model. This approach effectively enhances adaptation and mitigates the overfitting problem. In general, we provide empirical evidence of the effectiveness and efficiency of IDM. Our approach outperforms existing methods and achieves a new state-of-the-art one-shot performance of 56.7%/55.4% on the GTA5/SYNTHIA to Cityscapes adaptation tasks, respectively. The code will be released at https://github.com/yxiwang/IDM.

Submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2309.12503 [pdf, other]

Revealing the Chemical Structure of the Magellanic Clouds with APOGEE. II. Abundance Gradients of the Large Magellanic Cloud

Authors: Joshua T. Povick, David L. Nidever, Steven R. Majewski, Doug Geisler, Maria-Rosa L. Cioni, Yuxi Lu, Ricardo Muñoz, Guy S. Stringfellow, Andrés Almeida, Penélope Longa-Peña, Richard R. Lane, Alexandre Roman-Lopes

Abstract: We present the abundance gradients of the Large Magellanic Cloud (LMC) for 25 elemental abundance ratios and their respective temporal evolution as well as age-[X/Fe] trends using 6130 LMC field red giant branch (RGB) stars observed by SDSS-IV / APOGEE-2S. APOGEE is a high resolution ($R \sim 22,500$) $H$-band spectroscopic survey that gathered data on the LMC with broad radial and azimuthal coverage out to $\sim$10$^\circ$. The calculated overall metallicity gradient of the LMC with no age binning is $-$0.0380 $\pm$ 0.0022 dex/kpc. We also find that many of the abundance gradients show a U-shaped trend as functions of age. This trend is marked by a flattening of the gradient but then a general steepening at more recent times. The extreme point at which all these gradients (with the U-shaped trend) begin to steepen is $\gtrsim$2 Gyr ago. In addition, some of the age-[X/Fe] trends show an increase starting a few Gyr before the extreme point in the gradient evolutions. A subset of the age-[X/Fe] trends also show maxima concurrent with the gradients' extreme points, further pinpointing a major event in the history of the LMC $\sim$2 Gyr ago. This time frame is consistent with a previously proposed interaction between the Magellanic Clouds suggesting that this is most likely the cause of the distinct trend in the gradients and age-[X/Fe] trends.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 25 pages, 19 figures, and 10 tables

arXiv:2309.09310 [pdf, other]

UGC: Unified GAN Compression for Efficient Image-to-Image Translation

Authors: Yuxi Ren, Jie Wu, Peng Zhang, Manlin Zhang, Xuefeng Xiao, Qian He, Rui Wang, Min Zheng, Xin Pan

Abstract: Recent years have witnessed the prevailing progress of Generative Adversarial Networks (GANs) in image-to-image translation. However, the success of these GAN models hinges on ponderous computational costs and labor-expensive training data. Current efficient GAN learning techniques often fall into two orthogonal aspects: i) model slimming via reduced calculation costs; ii) data/label-efficient learning with fewer training data/labels. To combine the best of both worlds, we propose a new learning paradigm, Unified GAN Compression (UGC), with a unified optimization objective to seamlessly prompt the synergy of model-efficient and label-efficient learning. UGC sets up semi-supervised-driven network architecture search and adaptive online semi-supervised distillation stages sequentially, which formulates a heterogeneous mutual learning scheme to obtain an architecture-flexible, label-efficient, and performance-excellent model.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.09292 [pdf, other]

An Auto-Parallelizer for Distributed Computing in Haskell

Authors: Yuxi Long, Shiyou Wu, Yingjie Xu

Abstract: One of the main challenges in distributed computing is building interfaces and APIs that allow programmers with limited background in distributed systems to write scalable, performant, and fault-tolerant applications on large clusters. In this demonstration, we designed and implemented a Haskell auto-parallelizer with a simple yet powerful interface by taking advantage of the default purity of Haskell functions. Finally, we benchmarked our implementation on a set of examples to illustrate the potential for future work in this direction.

Submitted 17 September, 2023; originally announced September 2023.

Comments: 2 pages excluding title page and reference page. 2 figures. This work was submitted to the 28th ACM SIGPLAN International Conference on Functional Programming, Haskell Symposium. This work was accepted for oral presentation and was presented on Sep 8, 2023

arXiv:2309.03893 [pdf, other]

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

Authors: Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma

Abstract: Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we present DiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine

arXiv:2309.03576 [pdf, other]

DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

Authors: Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tong Wang, Zhaoxiang Zhang

Abstract: As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident. To address this, we present DropPos, a novel pretext task designed to reconstruct Dropped Positions. The formulation of DropPos is simple: we first drop a large random subset of positional embeddings and then the model classifies the actual position for each non-overlapping patch among all possible positions solely based on their visual appearance. To avoid trivial solutions, we increase the difficulty of this task by keeping only a subset of patches visible. Additionally, considering there may be different patches with similar visual appearances, we propose position smoothing and attentive reconstruction strategies to relax this classification problem, since it is not necessary to reconstruct their exact positions in these cases. Empirical evaluations of DropPos show strong capabilities. DropPos outperforms supervised pre-training and achieves competitive results compared with state-of-the-art self-supervised alternatives on a wide range of downstream benchmarks. This suggests that explicitly encouraging spatial reasoning abilities, as DropPos does, indeed contributes to the improved location awareness of ViTs. The code is publicly available at https://github.com/Haochen-Wang409/DropPos.

Submitted 21 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2309.01748 [pdf]

Cavity-enhanced single photon emission from a single impurity-bound exciton

Authors: Yuxi Jiang, Robert M. Pettit, Nils von den Driesch, Alexander Pawlis, Edo Waks

Abstract: Impurity-bound excitons in ZnSe quantum wells are bright single photon emitters, a crucial element in photonics-based quantum technology. But to achieve the efficiencies required for practical applications, these emitters must be integrated into optical cavities that enhance their radiative properties and far-field emission pattern. In this work, we demonstrate cavity-enhanced emission from a single impurity-bound exciton in a ZnSe quantum well. We utilize a bullseye cavity structure optimized to feature a small mode volume and a nearly Gaussian far-field transverse mode that can efficiently couple to an optical fiber. The fabricated device displays emission that is more than an order of magnitude brighter than bulk impurity-bound exciton emitters in the ZnSe quantum well, as well as clear anti-bunching, which verifies the single photon emission from the source. Time-resolved photoluminescence spectroscopy reveals a Purcell-enhanced radiative decay process with a Purcell factor of 1.43. This work paves the way towards high efficiency spin-photon interfaces using an impurity-doped II-VI semiconductor coupled to nanophotonics.

Submitted 4 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.
