Search | arXiv e-print repository

AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction

Authors: Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu

Abstract: Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes. By imposing geometric constraints on Gaussians representing the road and sky regions, our method enables multi-view consistent simulation of challenging scenarios including lane changes. Leveraging 3D templates, we introduce a reflected Gaussian consistency constraint to supervise both the visible and unseen side of foreground objects. Moreover, to model the dynamic appearance of foreground objects, we estimate residual spherical harmonics for each foreground Gaussian. Extensive experiments on Pandaset and KITTI demonstrate that AutoSplat outperforms state-of-the-art methods in scene reconstruction and novel view synthesis across diverse driving scenarios. Visit our project page at https://autosplat.github.io/.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.00088 [pdf, other]

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native support for mpGEMM, resort to dequantizing weights for high precision computation. Such an indirect approach can lead to significant inference overhead. In this paper, we introduce T-MAC, an innovative lookup table (LUT)-based method designed for efficient low-bit LLM (i.e., weight-quantized LLM) inference on CPUs. T-MAC directly supports mpGEMM without dequantization, while simultaneously eliminating multiplications and reducing the additions required. Specifically, T-MAC transforms the traditional data-type-centric multiplication to bit-wise table lookup, and enables a unified and scalable mpGEMM solution. Our LUT-based kernels scale linearly to the weight bit-width. Evaluated on low-bit Llama and BitNet models, T-MAC demonstrates up to 4x increase in throughput and 70% reduction in energy consumption compared to llama.cpp. For BitNet-b1.58-3B, T-MAC delivers a token generation throughput of 30 tokens/s with a single core and 71 tokens/s with eight cores on M2-Ultra, and 11 tokens/s on lower-end devices like Raspberry Pi 5, which significantly exceeds the adult average reading speed.
T-MAC, with its LUT-based computing paradigm, paves the way for the practical deployment of low-bit LLMs on resource-constrained edge devices without compromising computational efficiency. The system is open-sourced at https://github.com/microsoft/T-MAC.

Submitted 25 June, 2024; originally announced July 2024.

arXiv:2406.15539 [pdf, other]

First Measurement of Deeply Virtual Compton Scattering on the Neutron with Detection of the Active Neutron

Authors: CLAS Collaboration, A. Hobart, S. Niccolai, M. Čuić, K. Kumerički, P. Achenbach, J. S. Alvarado, W. R. Armstrong, H. Atac, H. Avakian, L. Baashen, N. A. Baltzell, L. Barion, M. Bashkanov, M. Battaglieri, B. Benkel, F. Benmokhtar, A. Bianconi, A. S. Biselli, S. Boiarinov, M. Bondi, W. A. Booth, F. Bossù, K. -Th. Brinkmann, W. J. Briscoe, et al. (124 additional authors not shown)

Abstract: Measuring Deeply Virtual Compton Scattering on the neutron is one of the necessary steps to understand the structure of the nucleon in terms of Generalized Parton Distributions (GPDs). Neutron targets play a complementary role to transversely polarized proton targets in the determination of the GPD $E$. This poorly known and poorly constrained GPD is essential to obtain the contribution of the quarks' angular momentum to the spin of the nucleon. DVCS on the neutron was measured for the first time selecting the exclusive final state by detecting the neutron, using the Jefferson Lab longitudinally polarized electron beam, with energies up to 10.6 GeV, and the CLAS12 detector. The extracted beam-spin asymmetries, combined with DVCS observables measured on the proton, allow a clean quark-flavor separation of the imaginary parts of the GPDs $H$ and $E$.

Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures

Report number: JLAB-PHY-24-4089

arXiv:2406.09591 [pdf]

Ferromagnetism and Topology of the Higher Flat Band in a Fractional Chern Insulator

Authors: Heonjoon Park, Jiaqi Cai, Eric Anderson, Xiao-Wei Zhang, Xiaoyu Liu, William Holtzmann, Weijie Li, Chong Wang, Chaowei Hu, Yuzhou Zhao, Takashi Taniguchi, Kenji Watanabe, Jihui Yang, David Cobden, Jiun-Haw Chu, Nicolas Regnault, B. Andrei Bernevig, Liang Fu, Ting Cao, Di Xiao, Xiaodong Xu

Abstract: The recent observation of the fractional quantum anomalous Hall effect in moiré fractional Chern insulators (FCI) provides opportunities for investigating zero magnetic field anyons. So far, both experimental and theoretical results suggest that filling > 1/3 FCI states in the first Chern band share features with those of the lowest Landau level (LL). To create the possibility of realizing non-Abelian anyons, one route is to engineer higher flat Chern bands that mimic higher LLs. Here, we investigate the interaction, topology, and ferromagnetism of the second moiré miniband in twisted MoTe2 bilayer (tMoTe2). Around filling factor v = -3, i.e., half-filling of the second miniband, we uncover spontaneous ferromagnetism and an incipient Chern insulator state. By measuring the anomalous Hall effect as a function of twist angle, we find that the Chern numbers (C) of the top two moiré flat bands have opposite sign (C = -+1) at twist angles above 3.1° but the same sign (C = -1) around 2.6°. This observation is consistent with the recently predicted twist-angle dependent band topology, resulting from the competition between moiré ferroelectricity and piezoelectricity. As we increase the magnetic field, only the small twist-angle device (2.6°) experiences a topological phase transition with an emergent C = -2 state. This is attributed to a Zeeman field-induced band crossing between opposite valleys, with the determined C = -1 for the top two bands.
Our results lay a firm foundation for understanding the higher flat Chern bands, which is essential for the prediction or discovery of non-Abelian FCIs.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 24 pages, 4 figures

arXiv:2406.00276 [pdf]

Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning

Authors: Shengyu Tao, Mengtian Zhang, Zixi Zhao, Haoyang Li, Ruifei Ma, Yunhong Che, Xin Sun, Lin Su, Xiangyu Chen, Zihao Zhou, Heng Chang, Tingwei Cao, Xiao Xiao, Yaojun Liu, Wenjun Yu, Zhongling Xu, Yang Li, Han Hao, Xuan Zhang, Xiaosong Hu, Guangmin Zhou

Abstract: Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed machine learning approach can quantify and visualize temporally resolved losses concerning thermodynamics and kinetics using only electric signals. Our method enables non-destructive degradation pattern characterization, expediting temperature-adaptable predictions of entire lifetime trajectories, rather than end-of-life points. Verification is 25 times faster while maintaining 95.1% accuracy across temperatures. Such advances facilitate more sustainable management of defective prototypes before massive production, establishing a 19.76 billion USD scrap material recycling market by 2060 in China. By incorporating stepwise charge acceptance as a measure of the initial manufacturing variability of normally identical batteries, we can immediately identify long-term degradation variations. We attribute the predictive power to interpreting machine learning insights using material-agnostic featurization taxonomy for degradation pattern decoupling.
Our findings offer new possibilities for dynamic system analysis, such as battery prototype degradation, demonstrating that complex pattern evolutions can be accurately predicted in a non-destructive and data-driven fashion by integrating physics-informed machine learning.

Submitted 31 May, 2024; originally announced June 2024.

ACM Class: J.2; G.3

arXiv:2405.19853 [pdf]

Correlated Electronic Structure and Density-Wave Gap in Trilayer Nickelate La4Ni3O10

Authors: X. Du, Y. D. Li, Y. T. Cao, C. Y. Pei, M. X. Zhang, W. X. Zhao, K. Y. Zhai, R. Z. Xu, Z. K. Liu, Z. W. Li, J. K. Zhao, G. Li, Y. L. Chen, Y. P. Qi, H. J. Guo, L. X. Yang

Abstract: The discovery of pressurized superconductivity at 80 K in La3Ni2O7 officially brings nickelates into the family of high-temperature superconductors, which gives rise to not only new insights but also mysteries in the strongly correlated superconductivity. More recently, the sibling compound La4Ni3O10 was also shown to be superconducting below about 25 K under pressure, further boosting the popularity of nickelates in the Ruddlesden-Popper phase. In this study, combining high-resolution angle-resolved photoemission spectroscopy and ab initio calculation, we systematically investigate the electronic structures of La4Ni3O10 at ambient pressure. We reveal a high resemblance of La4Ni3O10 with La3Ni2O7 in the orbital-dependent fermiology and electronic structure, suggesting a similar electronic correlation between the two compounds. The temperature-dependent measurements imply an orbital-dependent energy gap related to the density-wave transition in La4Ni3O10. By comparing the theoretical pressure-dependent electronic structure, clues about the superconducting high-pressure phase can be deduced from the ambient measurements, providing crucial information for deciphering the unconventional superconductivity in nickelates.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19308 [pdf, other]

Visualizing the microscopic origins of topology in twisted molybdenum ditelluride

Authors: Ellis Thompson, Keng Tou Chu, Florie Mesple, Xiao-Wei Zhang, Chaowei Hu, Yuzhou Zhao, Heonjoon Park, Jiaqi Cai, Eric Anderson, Kenji Watanabe, Takashi Taniguchi, Jihui Yang, Jiun-Haw Chu, Xiaodong Xu, Ting Cao, Di Xiao, Matthew Yankowitz

Abstract: In moiré materials with flat electronic bands and suitable quantum geometry, strong correlations can give rise to novel topological states of matter. The nontrivial band topology of twisted molybdenum ditelluride (tMoTe$_2$) -- responsible for its fractional quantum anomalous Hall (FQAH) states -- is predicted to arise from a layer-pseudospin skyrmion lattice. Tracing the layer polarization of wavefunctions within the moiré unit cell can thus offer crucial insights into the band topology. Here, we use scanning tunneling microscopy and spectroscopy (STM/S) to probe the layer-pseudospin skyrmion textures of tMoTe$_2$. We do this by simultaneously visualizing the moiré lattice structure and the spatial localization of its electronic states. We find that the wavefunctions associated with the topological flat bands exhibit a spatially-dependent layer polarization within the moiré unit cell. This is in excellent agreement with our theoretical modeling, thereby revealing a direct microscopic connection between the structural properties of tMoTe$_2$ and its band topology. Our work enables new pathways for engineering FQAH states with strain, as well as future STM studies of the intertwined correlated and topological states arising in gate-tunable devices.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 7 pages, 4 figures, Extended Data, 9 figures, Supplementary Information, 8 pages, 5 figures

arXiv:2405.10318 [pdf, other]

Gauge theory of giant phonon magnetic moment in doped Dirac semimetals

Authors: Wenqin Chen, Xiao-Wei Zhang, Ying Su, Ting Cao, Di Xiao, Shi-Zeng Lin

Abstract: We present a quantum theory of phonon magnetic moment in doped Dirac semimetals. Our theory is based on an emergent gauge field approach to the electron-phonon coupling, applicable to both gapless and gapped systems. We find that the magnetic moment is directly proportional to the electrical Hall conductivity through the phonon Hall viscosity. Our theory is combined with the first-principles calculations, allowing us to quantitatively implement it to realistic materials. Magnetic moments are found to be on the order of Bohr magneton for certain phonon modes in graphene and $\text{Cd}_3 \text{As}_2$. Our results provide practical guidance for the dynamical generation of large magnetization in the topological quantum materials.

Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures, supplemental materials included

arXiv:2405.01394 [pdf, other]

Analysis of a Modular Autonomous Driving Architecture: The Top Submission to CARLA Leaderboard 2.0 Challenge

Authors: Weize Zhang, Mohammed Elmahgiubi, Kasra Rezaee, Behzad Khamidehi, Hamidreza Mirkhani, Fazel Arasteh, Chunlin Li, Muhammad Ahsan Kaleem, Eduardo R. Corral-Soto, Dhruv Sharma, Tongtong Cao

Abstract: In this paper we present the architecture of the Kyber-E2E submission to the map track of CARLA Leaderboard 2.0 Autonomous Driving (AD) challenge 2023, which achieved first place. Our solution employs a modular architecture consisting of five main components: sensing, localization, perception, tracking/prediction, and planning/control. Our solution leverages state-of-the-art language-assisted perception models to help our planner perform more reliably in highly challenging traffic scenarios. We use open-source driving datasets in conjunction with Inverse Reinforcement Learning (IRL) to enhance the performance of our motion planner. We provide insight into our design choices and trade-offs made to achieve this solution. We also explore the impact of each component in the overall performance of our solution, with the intent of providing a guideline where allocation of resources can have the greatest impact.

Submitted 21 March, 2024; originally announced May 2024.

arXiv:2404.10584 [pdf, other]

ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

Authors: Chunli Peng, Xuan Dong, Tiantian Cao, Zhengqing Li, Kun Dong, Weixin Li

Abstract: The fusion of images from dual camera systems featuring a wide-angle and a telephoto camera has become a hotspot problem recently. By integrating simultaneously captured wide-angle and telephoto images from these systems, the resulting fused image achieves a wide field of view (FOV) coupled with high-definition quality. Existing approaches are mostly deep learning methods, and predominantly rely on supervised learning, where the training dataset plays a pivotal role. However, current datasets typically adopt a data synthesis approach to generate input pairs of wide-angle and telephoto images alongside ground-truth images. Notably, the wide-angle inputs are synthesized rather than captured using real wide-angle cameras, and the ground-truth image is captured by a wide-angle camera whose quality is substantially lower than that of input telephoto images captured by telephoto cameras. To address these limitations, we introduce a novel hardware setup utilizing a beam splitter to simultaneously capture three images, i.e. input pairs and ground-truth images, from two authentic cellphones equipped with wide-angle and telephoto dual cameras. Specifically, the wide-angle and telephoto images captured by cellphone 2 serve as the input pair, while the telephoto image captured by cellphone 1, which is calibrated to match the optical path of the wide-angle image from cellphone 2, serves as the ground-truth image, maintaining quality on par with the input telephoto image.
Experiments validate that our newly introduced dataset, named ReWiTe, significantly enhances the performance of various existing methods for real-world wide-angle and telephoto dual image fusion tasks.

Submitted 29 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.06162 [pdf, other]

Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports

Authors: Tianyu Cao, Natraj Raman, Danial Dervovic, Chenhao Tan

Abstract: As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports not only are long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Command. We find that GPT-3.5 and Command fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude has the ability to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4's use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4.

Submitted 8 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05697 [pdf, other]

Higher Landau-Level Analogues and Signatures of Non-Abelian States in Twisted Bilayer MoTe$_2$

Authors: Chong Wang, Xiao-Wei Zhang, Xiaoyu Liu, Jie Wang, Ting Cao, Di Xiao

Abstract: Recent experimental discovery of fractional Chern insulators at zero magnetic field in moiré superlattices has sparked intense interest in bringing Landau level physics to flat Chern bands. In twisted MoTe$_2$ bilayers (tMoTe$_2$), recent theoretical and experimental studies have found three consecutive flat Chern bands at twist angle $\sim 2^\circ$. In this work, we investigate whether higher Landau level physics can be found in these consecutive Chern bands. At twist angles $2.00^\circ$ and $1.89^\circ$, we identify four consecutive $C = 1$ bands for the $K$ valley in tMoTe$_2$. By constructing Wannier functions directly from density functional theory (DFT) calculations, a six-orbital model is developed to describe the consecutive Chern bands, with the orbitals forming a honeycomb lattice. Exact diagonalization on top of Hartree-Fock calculations is carried out with the Wannier functions. In particular, when the second moiré miniband is half-filled, signatures of non-Abelian states are found. Our Wannier-based approach in modelling moiré superlattices is faithful to DFT wave functions and can serve as a benchmark for continuum models. The possibility of realizing non-Abelian anyons at zero magnetic field also opens up a new pathway for fault-tolerant quantum information processing.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05208 [pdf, other]

doi 10.1021/acs.jpclett.4c00722

Proximity-Induced Exchange Interaction: a New Pathway for Quantum Sensing using Spin Centers in Hexagonal Boron Nitride

Authors: Lingnan Shen, Di Xiao, Ting Cao

Abstract: Defects in hexagonal boron nitride (hBN), a two-dimensional van der Waals material, have attracted wide-ranging interest for their potential in various quantum applications. Due to hBN's 2D nature, spin centers in hBN can be engineered in close proximity to a target material, providing advantages over their 3D counterparts, such as the nitrogen-vacancy (NV) center in diamond. Here we propose a novel quantum sensing protocol driven by the exchange interaction between a spin center in hBN and the underlying magnetic substrate, induced by the magnetic proximity effect. By first-principles calculation, we demonstrate that the induced exchange interaction dominates over the dipole-dipole interaction by orders of magnitude when in proximity. The interaction remains antiferromagnetic across all stacking configurations between the spin center in hBN and the target van der Waals magnets. Additionally, we explore the scaling behavior of the exchange field as a function of the spatial separation between the spin center and the targets.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.15385 [pdf, other]

LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis

Authors: Kevin Xie, Jonathan Lorraine, Tianshi Cao, Jun Gao, James Lucas, Antonio Torralba, Sanja Fidler, Xiaohui Zeng

Abstract: Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, they cannot capture high-frequency geometry and texture details and struggle to scale to large prompt sets, so they generalize poorly. We introduce LATTE3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set. Key to our method is 1) building a scalable architecture and 2) leveraging 3D data during optimization through 3D-aware diffusion priors, shape regularization, and model initialization to achieve robustness to diverse and complex training prompts. LATTE3D amortizes both neural field and textured surface generation to produce highly detailed textured meshes in a single forward pass. LATTE3D generates 3D objects in 400ms, and can be further enhanced with fast test-time optimization.

Submitted 22 March, 2024; originally announced March 2024.

Comments: See the project website at https://research.nvidia.com/labs/toronto-ai/LATTE3D/

MSC Class: 68T45 ACM Class: I.2.6; I.2.7; I.3.6; I.3.7

arXiv:2403.05012 [pdf]

Ultrafast Dynamics of Bilayer and Trilayer Nickelate Superconductors

Authors: Y. D. Li, Y. T. Cao, L. Y. Liu, P. Peng, H. Lin, C. Y. Pei, M. X. Zhang, H. Wu, X. Du, W. X. Zhao, K. Y. Zhai, J. K. Zhao, M. -L. Lin, P. H. Tan, Y. P. Qi, G. Li, H. J. Guo, Luyi Yang, L. X. Yang

Abstract: In addition to the pressurized high-temperature superconductivity, bilayer and trilayer nickelate superconductors La$_{n+1}$Ni$_n$O$_{3n+1}$ (n = 2 and 3) exhibit many intriguing properties at ambient pressure, such as orbital-dependent electronic correlation, non-Fermi liquid behavior, and density-wave transitions. Here, using ultrafast reflectivity measurement, we observe a drastic difference between the ultrafast dynamics of the bilayer and trilayer nickelates at ambient pressure. Firstly, we observe a coherent phonon mode in La4Ni3O10 involving the collective vibration of La, Ni, and O atoms, which is absent in La3Ni2O7. Secondly, the temperature-dependent relaxation time diverges near the density-wave transition temperature of La4Ni3O10, in drastic contrast to kink-like changes in La3Ni2O7. Moreover, we estimate the electron-phonon coupling constants to be 0.05~0.07 and 0.12~0.16 for La3Ni2O7 and La4Ni3O10, respectively, suggesting a relatively minor role of electron-phonon coupling in the electronic properties of La$_{n+1}$Ni$_n$O$_{3n+1}$. Our work not only sheds light on the relevant microscopic interaction but also establishes a foundation for further studying the interplay between superconductivity and density-wave transitions in nickelate superconductors.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.04997 [pdf, other]

DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation

Authors: Jiapeng Wang, Chengyu Wang, Tingfeng Cao, Jun Huang, Lianwen Jin

Abstract: We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models (e.g., Stable Diffusion) for interactive image creation. Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt, which can be leveraged to create the target image of high quality. To achieve this, we first collect an instruction-following prompt engineering dataset named InstructPE for the supervised training of DiffChat. Next, we propose a reinforcement learning framework with the feedback of three core criteria for image creation, i.e., aesthetics, user preference, and content integrity. It involves an action-space dynamic modification technique to obtain more relevant positive samples and harder negative samples during the off-policy sampling. Content integrity is also introduced into the value estimation function for further improvement of produced images. Our method outperforms baseline models and strong competitors in both automatic and human evaluations, which demonstrates its effectiveness.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03431 [pdf, other]

Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

Authors: Bingyan Liu, Chengyu Wang, Tingfeng Cao, Kui Jia, Jun Huang

Abstract: Deep Text-to-Image Synthesis (TIS) models such as Stable Diffusion have recently gained significant popularity for creative Text-to-image generation. Yet, for domain-specific scenarios, tuning-free Text-guided Image Editing (TIE) is of greater importance for application developers, which modify objects or object properties in images by manipulating feature components in attention layers during the generation process. However, little is known about what semantic meanings these attention layers have learned and which parts of the attention maps contribute to the success of image editing. In this paper, we conduct an in-depth probing analysis and demonstrate that cross-attention maps in Stable Diffusion often contain object attribution information that can result in editing failures. In contrast, self-attention maps play a crucial role in preserving the geometric and shape details of the source image during the transformation to the target image. Our analysis offers valuable insights into understanding cross and self-attention maps in diffusion models. Moreover, based on our findings, we simplify popular image editing methods and propose a more straightforward yet more stable and efficient tuning-free procedure that only modifies self-attention maps of the specified attention layers during the denoising process. Experimental results show that our simplified method consistently surpasses the performance of popular approaches on multiple datasets.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02253 [pdf, other]

KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection

Authors: Yuexin Li, Chengyu Huang, Shumin Deng, Mei Lin Lock, Tri Cao, Nay Oo, Hoon Wei Lim, Bryan Hooi

Abstract: Phishing attacks have inflicted substantial losses on individuals and businesses alike, necessitating the development of robust and efficient automated phishing detection approaches. Reference-based phishing detectors (RBPDs), which compare the logos on a target webpage to a known set of logos, have emerged as the state-of-the-art approach. However, a major limitation of existing RBPDs is that they rely on a manually constructed brand knowledge base, making it infeasible to scale to a large number of brands, which results in false negative errors due to the insufficient brand coverage of the knowledge base. To address this issue, we propose an automated knowledge collection pipeline, using which we collect a large-scale multimodal brand knowledge base, KnowPhish, containing 20k brands with rich information about each brand. KnowPhish can be used to boost the performance of existing RBPDs in a plug-and-play manner. A second limitation of existing RBPDs is that they solely rely on the image modality, ignoring useful textual information present in the webpage HTML. To utilize this textual information, we propose a Large Language Model (LLM)-based approach to extract brand information of webpages from text. Our resulting multimodal phishing detection approach, KnowPhish Detector (KPD), can detect phishing webpages with or without logos. We evaluate KnowPhish and KPD on a manually validated dataset, and a field study under Singapore's local context, showing substantial improvements in effectiveness and efficiency compared to state-of-the-art baselines.

Submitted 15 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted by USENIX Security 2024

arXiv:2403.01417 [pdf, other]

Asyn2F: An Asynchronous Federated Learning Framework with Bidirectional Model Aggregation

Authors: Tien-Dung Cao, Nguyen T. Vuong, Thai Q. Le, Hoang V. N. Dao, Tram Truong-Huu

Abstract: In federated learning, the models can be trained synchronously or asynchronously. Many research works have focused on developing an aggregation method for the server to aggregate multiple local models into the global model with improved performance. They ignore the heterogeneity of the training workers, which causes the delay in the training of the local models, leading to the obsolete information issue. In this paper, we design and develop Asyn2F, an Asynchronous Federated learning Framework with bidirectional model aggregation. By bidirectional model aggregation, Asyn2F, on one hand, allows the server to asynchronously aggregate multiple local models and results in a new global model. On the other hand, it allows the training workers to aggregate the new version of the global model into the local model, which is being trained even in the middle of a training epoch. We develop Asyn2F considering the practical implementation requirements such as using cloud services for model storage and message queuing protocols for communications. Extensive experiments with different datasets show that the models trained by Asyn2F achieve higher performance compared to the state-of-the-art techniques. The experiments also demonstrate the effectiveness, practicality, and scalability of Asyn2F, making it ready for deployment in real scenarios.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.13822 [pdf, other]

MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification

Authors: Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen

Abstract: Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design and yet not being able to reach the optimal architecture due to the uniqueness of each dataset. We overcome these challenges by proposing a novel multi-scale search space and a framework for Neural architecture search (NAS), which addresses both the problem of frequency and time resolution, discovering the suitable scale for a specific dataset. We further show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights. Our search space reaches the state-of-the-art performance on four datasets on four different domains while introducing more than ten highly fine-tuned models for each data.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.12603 [pdf]

Interlayer ferroelectric polarization modulated anomalous Hall effects in four-layer MnBi2Te4 antiferromagnets

Authors: Ziyu Niu, Xiang-Long Yu, Dingfu Shao, Xixiang Jing, Defeng Hou, Xuhong Li, Jing Sun, Junqin Shi, Xiaoli Fan, Tengfei Cao

Abstract: Van der Waals (vdW) assembly could efficiently modulate the symmetry of two-dimensional (2D) materials that ultimately governs their physical properties. Of particular interest is the ferroelectric polarization being introduced by proper vdW assembly that enables the realization of novel electronic, magnetic and transport properties of 2D materials. Four-layer antiferromagnetic MnBi2Te4 (F-MBT) offers an excellent platform to explore ferroelectric polarization effects on magnetic order and topological transport properties of nanomaterials. Here, by applying symmetry analyses and density-functional-theory calculations, the ferroelectric interface effects on magnetic order, anomalous Hall effect (AHE) or even quantum AHE (QAHE) on the F-MBT are analyzed. Interlayer ferroelectric polarization in F-MBT efficiently violates the PT symmetry (the combination symmetry of central inversion (P) and time reverse (T)) of the F-MBT by conferring magnetoelectric couplings, and stabilizes a specific antiferromagnetic order encompassing a ferromagnetic interface in the F-MBT. We predict that engineering an interlayer polarization in the top or bottom interface of F-MBT allows converting F-MBT from a trivial insulator to a Chern insulator. The switching of ferroelectric polarization at the middle interfaces results in a direction reversal of the quantum anomalous Hall current. Additionally, the interlayer polarization of the top and bottom interfaces can be aligned in the same direction, and the switching of polarization direction also reverses the direction of anomalous Hall currents. Overall, our work highlights the occurrence of quantum-transport phenomena in 2D vdW four-layer antiferromagnets through vdW assembly. These phenomena are absent in the bulk or thin-film in bulk-like stacking forms of MnBi2Te4.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.10631 [pdf, other]

BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

Authors: Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

Abstract: The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the performance of LLMs at ultra-low precisions (sub-4-bit). Specifically, BitDistiller first incorporates a tailored asymmetric quantization and clipping technique to maximally preserve the fidelity of quantized weights, and then proposes a novel Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective, which is employed in a self-distillation manner to enable faster convergence and superior model performance. Empirical evaluations demonstrate that BitDistiller significantly surpasses existing methods in both 3-bit and 2-bit configurations on general language understanding and complex reasoning benchmarks. Notably, BitDistiller is shown to be more cost-effective, demanding fewer data and training resources. The code is available at https://github.com/DD-DuDa/BitDistiller.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.05981 [pdf, other]

Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance

Authors: Qipeng Wang, Shiqi Jiang, Zhenpeng Chen, Xu Cao, Yuanchun Li, Aoyu Li, Ying Zhang, Yun Ma, Ting Cao, Xuanzhe Liu

Abstract: Deep Learning (DL) is increasingly being integrated into Web applications through a method known as "in-browser inference", where the DL processes occur directly within Web browsers. However, the actual performance of this method and its effect on user experience quality (QoE) is not well-understood. This gap in knowledge necessitates new forms of QoE measurement, going beyond traditional metrics such as page load time. To address this, we conducted the first extensive performance evaluation of in-browser inference. We introduced new metrics for this purpose: responsiveness, smoothness, and inference accuracy. Our thorough study included 9 widely-used DL models and tested them across 50 popular PC Web browsers. The findings show a significant latency issue with in-browser inference: it's on average 16.9 times slower on CPU and 4.9 times slower on GPU than native inference methods. Several factors contribute to this latency, including underused hardware instruction sets, inherent delays in the runtime environment, resource competition within the browser, and inefficiencies in software libraries and GPU abstractions. Moreover, in-browser inference demands a lot of memory, sometimes up to 334.6 times more than the size of the DL models themselves. This excessive memory usage is partly due to suboptimal memory management. Additionally, we noticed that in-browser inference increases the time it takes for graphical user interface (GUI) components to load in web browsers by a significant 67.2%, which severely impacts the overall QoE for users of web applications that depend on this technology.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2401.15754 [pdf]

doi 10.5170/CERN-2009-006.471

High-Speed Serial Optical Link Test Bench Using FPGA with Embedded Transceivers

Authors: Annie C. Xiang, Tingting Cao, Datao Gong, Suen Hou, Chonghan Liu, Tiankuan Liu, Da-Shung Su, Ping-Kun Teng, Jingbo Ye

Abstract: We develop a custom Bit Error Rate test bench based on Altera's Stratix II GX transceiver signal integrity development kit, demonstrate it on a point-to-point serial optical link with data rates up to 5 Gbps, and compare it with a commercial stand-alone tester. The 8B/10B protocol is implemented and its effects studied. A variable optical attenuator is inserted in the fibre loop to induce transmission degradation and to measure receiver sensitivity. We report comparable receiver sensitivity results using the FPGA-based tester and the commercial tester. The FPGA results also show that there are more one-to-zero bit flips than zero-to-one bit flips at lower error rates. In 8B/10B coded transmission, there are more word errors than bit flips, and the total error rate is less than two times that of non-coded transmission. The total error rate measured complies with simulation results, according to the protocol setup.

Submitted 28 January, 2024; originally announced January 2024.

Comments: 5 pages, 8 figures, Proceedings of the Topical Workshop on Electronics for Particle Physics 2009

arXiv:2401.12216 [pdf, other]

Mitigating Covariate Shift in Misspecified Regression with Applications to Reinforcement Learning

Authors: Philip Amortila, Tongyi Cao, Akshay Krishnamurthy

Abstract: A pervasive phenomenon in machine learning applications is distribution shift, where training and deployment conditions for a machine learning model differ. As distribution shift typically results in a degradation in performance, much attention has been devoted to algorithmic interventions that mitigate these detrimental effects. In this paper, we study the effect of distribution shift in the presence of model misspecification, specifically focusing on $L_{\infty}$-misspecified regression and adversarial covariate shift, where the regression target remains fixed while the covariate distribution changes arbitrarily. We show that empirical risk minimization, or standard least squares regression, can result in undesirable misspecification amplification where the error due to misspecification is amplified by the density ratio between the training and testing distributions. As our main result, we develop a new algorithm -- inspired by robust optimization techniques -- that avoids this undesirable behavior, resulting in no misspecification amplification while still obtaining optimal statistical rates. As applications, we use this regression procedure to obtain new guarantees in offline and online reinforcement learning with misspecification and establish new separations between previously studied structural conditions and notions of coverage.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.03416 [pdf, other]

Active Galactic Nuclei in a Mid-Infrared Selected Galaxy Sample at z>0.13: [Ne V]3426 Line Emission as a Benchmark

Authors: Zi-Jian Li, Y. Sophia Dai, Jia-Sheng Huang, Stijn Wuyts, Tian-Wen Cao

Abstract: We present a 24 um-selected spectroscopic sample at z > 0.13 (median z = 0.41) in the Lockman Hole field, consisting of 4035 spectra. Our aim is to identify AGNs and determine their fraction in this mid-infrared selected sample. In this work, we use the [Ne V]3426 emission line to spectroscopically identify AGNs. Combined with broad-line Type I AGNs selected in our previous study, our sample consists of 887 (22%) spectroscopically confirmed AGNs. We perform a stacking analysis on the remaining spectra, and find that in various MIR-wedge-selected AGN candidates, the stacked spectra still show significant [Ne V]3426 emission. In contrast, no clear [Ne V]3426 signal is detected in non-AGN candidates falling outside the wedges. Assuming a range of AGN mid-IR SED slope of 0.3 < alpha < 0.7, and an average star-forming relation derived from 65 star-forming templates, we develop a robust method to separate the AGN and star-forming contributions to the mid-IR SEDs using the rest-frame L12/L1.6 vs L4.5/L1.6 diagram. We separate the objects into bins of L12, and find that AGN fraction increases with increasing L12. We also find that the stacked [Ne V]3426 strength scales with L12. The pure AGN luminosity at 12 um exhibits a positive correlation with the star formation rates, indicating possible co-evolution and common gas supply between the AGN and their host galaxies. Varying population properties across the redshift range explored contribute to the observed correlation.

Submitted 7 January, 2024; originally announced January 2024.

Comments: 16 pages, 14 figures. Accepted for publication in ApJ

arXiv:2312.16199 [pdf, other]

Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns

Authors: Xin Liu, Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao, Zhengyang Wang, Bing Yin, Yangqiu Song

Abstract: The goal of session-based recommendation in E-commerce is to predict the next item that an anonymous user will purchase based on the browsing and purchase history. However, constructing global or local transition graphs to supplement session data can lead to noisy correlations and user intent vanishing. In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that characterizes user intents by building attribute transition graphs and matching attribute patterns. Specifically, the frequent and compact attribute patterns are served as memory to augment session representations, followed by a gate and a transformer block to fuse the whole session information. Through extensive experiments on two public benchmarks and 100 million industrial data in three domains, we demonstrate that FAPAT consistently outperforms state-of-the-art methods by an average of 4.5% across various evaluation metrics (Hits, NDCG, MRR). Besides evaluating the next-item prediction, we estimate the models' capabilities to capture user intents via predicting items' attributes and period-item recommendations.

Submitted 22 December, 2023; originally announced December 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2312.11184 [pdf, other]

View Transition based Dual Camera Image Fusion

Authors: Tiantian Cao, Xuan Dong, Chunli Peng, Zhengqing Li, Xinyu Guo, Weixin Li

Abstract: The dual camera system of wide-angle ($\bf{W}$) and telephoto ($\bf{T}$) cameras has been widely adopted by popular phones. In the overlap region, fusing the $\bf{W}$ and $\bf{T}$ images can generate a higher quality image. Related works perform pixel-level motion alignment or high-dimensional feature alignment of the $\bf{T}$ image to the view of the $\bf{W}$ image and then perform image/feature fusion, but the enhancement in occlusion area is ill-posed and can hardly utilize data from $\bf{T}$ images. Our insight is to minimize the occlusion area and thus maximize the use of pixels from $\bf{T}$ images. Instead of insisting on placing the output in the $\bf{W}$ view, we propose a view transition method to transform both $\bf{W}$ and $\bf{T}$ images into a mixed view and then blend them into the output. The transformation ratio is kept small and not apparent to users, and the center area of the output, which has accumulated a sufficient amount of transformation, can directly use the contents from the T view to minimize occlusions. Experimental results show that, in comparison with the SOTA methods, occlusion area is largely reduced by our method and thus more pixels of the $\bf{T}$ image can be used for improving the quality of the output image.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.09445 [pdf, other]

IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis

Authors: Tue Minh Cao, Nhat Hong Tran, Le Phi Nguyen, Hieu Huy Pham, Hung Thanh Nguyen

Abstract: Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques that are aimed at tackling the formidable challenges of severe imbalance dataset PTB-XL and gradient corruption. By this means, we manage to set a new height for deep learning model in a supervised learning manner across the majority of tasks. Our model consistently surpasses InceptionTime by substantial margins compared to other state-of-the-arts in this domain, noticeably 0.013 AUROC score improvement in the "all" task, while also mitigating the inherent dataset fluctuations during training.

Submitted 16 November, 2023; originally announced December 2023.

arXiv:2312.07141 [pdf, other]

Multilingual large language models leak human stereotypes across language boundaries

Authors: Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daume III

Abstract: Multilingual large language models have been increasingly popular for their proficiency in processing and generating text across various languages. Previous research has shown that the presence of stereotypes and biases in monolingual large language models can be attributed to the nature of their training data, which is collected from humans and reflects societal biases. Multilingual language models undergo the same training procedure as monolingual ones, albeit with training data sourced from various languages. This raises the question: do stereotypes present in one social context leak across languages within the model? In our work, we first define the term "stereotype leakage" and propose a framework for its measurement. With this framework, we investigate how stereotypical associations leak across four languages: English, Russian, Chinese, and Hindi. To quantify the stereotype leakage, we employ an approach from social psychology, measuring stereotypes via group-trait associations. We evaluate human stereotypes and stereotypical associations manifested in multilingual large language models such as mBERT, mT5, and GPT-3.5. Our findings show a noticeable leakage of positive, negative, and non-polar associations across all languages. Notably, Hindi within multilingual models appears to be the most susceptible to influence from other languages, while Chinese is the least. Additionally, GPT-3.5 exhibits a better alignment with human scores than other models. WARNING: This paper contains model outputs which could be offensive in nature.

Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.12776 [pdf, other]

doi 10.1038/s41467-024-48511-x

Polarization-driven band topology evolution in twisted MoTe$_2$ and WSe$_2$

Authors: Xiao-Wei Zhang, Chong Wang, Xiaoyu Liu, Yueyao Fan, Ting Cao, Di Xiao

Abstract: Motivated by recent experimental observations of opposite Chern numbers in $R$-type twisted MoTe$_2$ and WSe$_2$ homobilayers, we perform large-scale density-functional-theory (DFT) calculations with machine learning force fields to investigate moiré band topology from large to small twist angles in both materials. We find that the Chern numbers of the moiré frontier bands change sign as a function of twist angle, and this change is driven by the competition between moiré ferroelectricity and piezoelectricity. Our large-scale calculations, enabled by machine learning methods, reveal crucial insights into interactions across different scales in twisted bilayer systems. The interplay between atomic-level relaxation effects and moiré-scale electrostatic potential variation opens new avenues for the design of intertwined topological and correlated states, including the possibility of mimicking higher Landau-level physics in the absence of a magnetic field.

Submitted 27 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: 8 pages, 5 figures

Journal ref: Nature Communications 15, 4223 (2024)

arXiv:2311.09708 [pdf, other]

A Self-enhancement Multitask Framework for Unsupervised Aspect Category Detection

Authors: Thi-Nhung Nguyen, Hoang Ngo, Kiem-Hieu Nguyen, Tuan-Dung Cao

Abstract: Our work addresses the problem of unsupervised Aspect Category Detection using a small set of seed words. Recent works have focused on learning embedding spaces for seed words and sentences to establish similarities between sentences and aspects. However, aspect representations are limited by the quality of initial seed words, and model performances are compromised by noise. To mitigate this limitation, we propose a simple framework that automatically enhances the quality of initial seed words and selects high-quality sentences for training instead of using the entire dataset. Our main concepts are to add a number of seed words to the initial set and to treat the task of noise resolution as a task of augmenting data for a low-resource task. In addition, we jointly train Aspect Category Detection with Aspect Term Extraction and Aspect Term Polarity to further enhance performance. This approach facilitates shared representation learning, allowing Aspect Category Detection to benefit from the additional guidance offered by other tasks. Extensive experiments demonstrate that our framework surpasses strong baselines on standard datasets.

Submitted 16 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023

arXiv:2311.08100 [pdf, other]

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

Authors: Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, Qifeng Chen

Abstract: We present a new interaction mechanism of prediction and planning for end-to-end autonomous driving, called PPAD (Iterative Interaction of Prediction and Planning Autonomous Driving), which considers the timestep-wise interaction to better integrate prediction and planning. An ego vehicle performs motion planning at each timestep based on the trajectory prediction of surrounding agents (e.g., vehicles and pedestrians) and its local road conditions. Unlike existing end-to-end autonomous driving frameworks, PPAD models the interactions among ego, agents, and the dynamic environment in an autoregressive manner by interleaving the Prediction and Planning processes at every timestep, instead of a single sequential process of prediction followed by planning. Specifically, we design ego-to-agent, ego-to-map, and ego-to-BEV interaction mechanisms with hierarchical dynamic key objects attention to better model the interactions. The experiments on the nuScenes benchmark show that our approach outperforms state-of-the-art methods.

Submitted 27 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.07879 [pdf, other]

Toxicity Detection is NOT all you Need: Measuring the Gaps to Supporting Volunteer Content Moderators

Authors: Yang Trista Cao, Lovely-Frances Domingo, Sarah Ann Gilbert, Michelle Mazurek, Katie Shilton, Hal Daumé III

Abstract: Extensive efforts in automated approaches for content moderation have been focused on developing models to identify toxic, offensive, and hateful content with the aim of lightening the load for moderators. Yet, it remains uncertain whether improvements on those tasks have truly addressed moderators' needs in accomplishing their work. In this paper, we surface gaps between past research efforts that have aimed to provide automation for aspects of content moderation and the needs of volunteer content moderators, regarding identifying violations of various moderation rules. To do so, we conduct a model review on Hugging Face to reveal the availability of models to cover various moderation rules and guidelines from three exemplar forums. We further put state-of-the-art LLMs to the test, evaluating how well these models perform in flagging violations of platform rules from one particular forum. Finally, we conduct a user survey study with volunteer moderators to gain insight into their perspectives on useful moderation models. Overall, we observe a non-trivial gap, as missing developed models and LLMs exhibit moderate to low performance on a significant portion of the rules. Moderators' reports provide guides for future work on developing moderation assistant models.

Submitted 16 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.06758 [pdf, other]

Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension

Authors: Tingfeng Cao, Chengyu Wang, Chuanqi Tan, Jun Huang, **hui Zhu

Abstract: In cross-lingual language understanding, machine translation is often utilized to enhance the transferability of models across languages, either by translating the training data from the source language to the target, or from the target to the source to aid inference. However, in cross-lingual machine reading comprehension (MRC), it is difficult to perform a deep level of assistance to enhance cross-lingual transfer because of the variation of answer span positions in different languages. In this paper, we propose X-STA, a new approach for cross-lingual MRC. Specifically, we leverage an attentive teacher to subtly transfer the answer spans of the source language to the answer output space of the target. A Gradient-Disentangled Knowledge Sharing technique is proposed as an improved cross-attention block. In addition, we force the model to learn semantic alignments from multiple granularities and calibrate the model outputs with teacher guidance to enhance cross-lingual transferability. Experiments on three multi-lingual MRC datasets show the effectiveness of our method, outperforming state-of-the-art approaches.

Submitted 12 November, 2023; originally announced November 2023.

Comments: emnlp 2023

arXiv:2311.06752 [pdf, other]

BeautifulPrompt: Towards Automatic Prompt Engineering for Text-to-Image Synthesis

Authors: Tingfeng Cao, Chengyu Wang, Bingyan Liu, Ziheng Wu, **hui Zhu, Jun Huang

Abstract: Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt engineering by humans in order to produce satisfactory results for real-world applications. We propose BeautifulPrompt, a deep generative model to produce high-quality prompts from very simple raw descriptions, which enables diffusion-based models to generate more beautiful images. In our work, we first fine-tuned the BeautifulPrompt model over low-quality and high-quality collecting prompt pairs. Then, to ensure that our generated prompts can generate more beautiful images, we further propose a Reinforcement Learning with Visual AI Feedback technique to fine-tune our model to maximize the reward values of the generated prompts, where the reward values are calculated based on the PickScore and the Aesthetic Scores. Our results demonstrate that learning from visual AI feedback promises the potential to improve the quality of generated prompts and images significantly. We further showcase the integration of BeautifulPrompt to a cloud-native AI platform to provide better text-to-image generation service in the cloud.

Submitted 12 November, 2023; originally announced November 2023.

Comments: emnlp 2023

arXiv:2311.01792 [pdf, other]

AFPQ: Asymmetric Floating Point Quantization for LLMs

Authors: Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

Abstract: Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth. Low-bit weight quantization can save memory and accelerate inference. Although floating-point (FP) formats show good performance in LLM quantization, they tend to perform poorly with small group sizes or sub-4 bits. We find the reason is that the absence of asymmetry in previous FP quantization makes it unsuitable for handling asymmetric value distribution of LLM weight tensors. In this work, we propose asymmetric FP quantization (AFPQ), which sets separate scales for positive and negative values. Our method leads to large accuracy improvements and can be easily plugged into other quantization methods, including GPTQ and AWQ, for better performance. Besides, no additional storage is needed compared with asymmetric integer (INT) quantization. The code is available at https://github.com/zhangsichengsjtu/AFPQ.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.13772 [pdf, other]

TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models

Authors: Tianshi Cao, Karsten Kreis, Sanja Fidler, Nicholas Sharp, Kangxue Yin

Abstract: We present TexFusion (Texture Diffusion), a new method to synthesize textures for given 3D geometries, using large-scale text-guided image diffusion models. In contrast to recent works that leverage 2D text-to-image diffusion models to distill 3D objects using a slow and fragile optimization process, TexFusion introduces a new 3D-consistent generation technique specifically designed for texture synthesis that employs regular diffusion model sampling on different 2D rendered views. Specifically, we leverage latent diffusion models, apply the diffusion model's denoiser on a set of 2D renders of the 3D object, and aggregate the different denoising predictions on a shared latent texture map. Final output RGB textures are produced by optimizing an intermediate neural color field on the decodings of 2D renders of the latent texture. We thoroughly validate TexFusion and show that we can efficiently generate diverse, high quality and globally coherent textures. We achieve state-of-the-art text-guided texture synthesis performance using only image diffusion models, while avoiding the pitfalls of previous distillation-based methods. The text-conditioning offers detailed control and we also do not rely on any ground truth 3D textures for training. This makes our method versatile and applicable to a broad range of geometry and texture types. We hope that TexFusion will advance AI-based texturing of 3D assets for applications in virtual reality, game design, simulation, and more.

Submitted 20 October, 2023; originally announced October 2023.

Comments: Videos and more results on https://research.nvidia.com/labs/toronto-ai/texfusion/

ACM Class: I.3.3

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023) 4169-4181

arXiv:2310.12551 [pdf, other]

Iterative PnP and its application in 3D-2D vascular image registration for robot navigation

Authors: **gwei Song, Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari

Abstract: This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and computational efficiency requirements in intervention robot applications. We categorize centerline-based vascular 3D-2D image registration problems as an iterative Perspective-n-Point (PnP) problem and propose to use the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the "big-to-small" problem in typical robotic scenarios. Finally, iteratively reweighted least squares is applied to solve the RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm processes registration at over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains competitive registration accuracy, similar to other works. Results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.

Submitted 11 January, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Submitted to ICRA 2024. Errors in Eq. 4 and Eq. 6 have been corrected. Updates include some minor improvements in Section II.

arXiv:2310.00057 [pdf, other]

A multi-fidelity deep operator network (DeepONet) for fusing simulation and monitoring data: Application to real-time settlement prediction during tunnel construction

Authors: Chen Xu, Ba Trung Cao, Yong Yuan, Günther Meschke

Abstract: Ground settlement prediction during the process of mechanized tunneling is of paramount importance and remains a challenging research topic. Typically, two paradigms exist: a physics-driven approach utilizing process-oriented computational simulation models for the tunnel-soil interaction and the settlement prediction, and a data-driven approach employing machine learning techniques to establish mappings between influencing factors and the ground settlement. To integrate the advantages of both approaches and to assimilate the data from different sources, we propose a multi-fidelity deep operator network (DeepONet) framework, leveraging the recently developed operator learning methods. The presented framework comprises two components: a low-fidelity subnet that captures the fundamental ground settlement patterns obtained from finite element simulations, and a high-fidelity subnet that learns the nonlinear correlation between numerical models and real engineering monitoring data. A pre-processing strategy for causality is adopted to consider the spatio-temporal characteristics of the settlement during tunnel excavation. Transfer learning is utilized to reduce the training cost for the low-fidelity subnet. The results show that the proposed method can effectively capture the physical information provided by the numerical simulations and accurately fit measured data as well. Remarkably, even with very limited noisy monitoring data, the proposed model can achieve rapid, accurate, and robust predictions of the full-field ground settlement in real-time during mechanized tunnel excavation.

Submitted 12 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

arXiv:2309.16110 [pdf, other]

Learning Effective NeRFs and SDFs Representations with 3D Generative Adversarial Networks for 3D Object Generation: Technical Report for ICCV 2023 OmniObject3D Challenge

Authors: Zheyuan Yang, Yibo Liu, Guile Wu, Tongtong Cao, Yuan Ren, Yang Liu, Bingbing Liu

Abstract: In this technical report, we present a solution for 3D object generation in the ICCV 2023 OmniObject3D Challenge. In recent years, 3D object generation has made great progress and achieved promising results, but it remains a challenging task due to the difficulty of generating complex, textured and high-fidelity results. To resolve this problem, we study learning effective NeRFs and SDFs representations with 3D Generative Adversarial Networks (GANs) for 3D object generation. Specifically, inspired by recent works, we use the efficient geometry-aware 3D GANs as the backbone, incorporating label embedding and color mapping, which enables training the model on different taxonomies simultaneously. Then, through a decoder, we aggregate the resulting features to generate Neural Radiance Fields (NeRFs) based representations for rendering high-fidelity synthetic images. Meanwhile, we optimize Signed Distance Functions (SDFs) to effectively represent objects with 3D meshes. Besides, we observe that this model can be effectively trained with only a few images of each object from a variety of classes, instead of using a great number of images per object or training one model per class. With this pipeline, we can optimize an effective model for 3D object generation. This solution is one of the final top-3-place solutions in the ICCV 2023 OmniObject3D Challenge.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.08978 [pdf, other]

Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations

Authors: Fucheng Jia, Shiqi Jiang, Ting Cao, Wei Cui, Tianrui Xia, Xu Cao, Yuanchun Li, Deyu Zhang, Ju Ren, Yunxin Liu, Lili Qiu, Mao Yang

Abstract: Web applications are increasingly becoming the primary platform for AI service delivery, making in-browser deep learning (DL) inference more prominent. However, current in-browser inference systems fail to effectively utilize advanced web programming techniques and customize kernels for various client devices, leading to suboptimal performance. To address the issues, this paper presents the first in-browser inference system, nn-JIT.web, which enables just-in-time (JIT) auto-generation of optimized kernels for both CPUs and GPUs during inference. The system achieves this by using two novel web programming techniques that can significantly reduce kernel generation time, compared to other tensor compilers such as TVM, while maintaining or even improving performance. The first technique, Tensor-Web Compiling Co-Design, lowers compiling costs by unifying tensor and web compiling and eliminating redundant and ineffective compiling passes. The second technique, Web-Specific Lite Kernel Optimization Space Design, reduces kernel tuning costs by focusing on web programming requirements and efficient hardware resource utilization, limiting the optimization space to only dozens. nn-JIT.web is evaluated for modern transformer models on a range of client devices, including the mainstream CPUs and GPUs from ARM, Intel, AMD and Nvidia. Results show that nn-JIT.web can achieve up to 8.2x faster within 30 seconds compared to the baselines across various models.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.04865 [pdf, other]

doi 10.1103/PhysRevB.108.L121404

Observation of flat and weakly dispersing bands in a van der Waals semiconductor Nb3Br8 with breathing kagome lattice

Authors: Sabin Regmi, Anup Pradhan Sakhya, Tharindu Fernando, Yuzhou Zhao, Dylan Jeff, Milo Sprague, Favian Gonzalez, Iftakhar Bin Elius, Mazharul Islam Mondal, Nathan Valadez, Damani Jarrett, Alexis Agosto, Jihui Yang, Jiun-Haw Chu, Saiful I. Khondaker, Xiaodong Xu, Ting Cao, Madhab Neupane

Abstract: Niobium halides, Nb3X8 (X = Cl,Br,I), which are predicted two-dimensional magnets, have recently gotten attention due to their breathing kagome geometry. Here, we have studied the electronic structure of Nb3Br8 by using angle-resolved photoemission spectroscopy (ARPES) and first-principles calculations. ARPES results depict the presence of multiple flat and weakly dispersing bands. These bands are well explained by the theoretical calculations, which show they have Nb d character indicating their origination from the Nb atoms forming the breathing kagome plane. This van der Waals material can be easily thinned down via mechanical exfoliation to the ultrathin limit and such ultrathin samples are stable as depicted from the time-dependent Raman spectroscopy measurements at room temperature. These results demonstrate that Nb3Br8 is an excellent material not only for studying breathing kagome induced flat band physics and its connection with magnetism, but also for heterostructure fabrication for application purposes.

Submitted 9 September, 2023; originally announced September 2023.

Comments: 24 pages, 12 figures, Supplemental Material included

Journal ref: Phys. Rev. B 108, L121404 (2023)

arXiv:2308.16634 [pdf, other]

doi 10.1103/PhysRevC.108.024610

Effect of initial-state geometric configurations on the nuclear liquid-gas phase transition

Authors: Y. T. Cao, X. G. Deng, Y. G. Ma

Abstract: Within the framework of an extended quantum molecular dynamics model, we simulated $^{40}$Ca + $^{16}$O collisions at beam energies ranging from 60 to 150 MeV/nucleon for $^{16}$O with different $α$-cluster configurations. Results imply that different $α$-cluster configurations lead to different yields of deuteron, triton, $^3$He and $^4$He, but not for proton and neutron. We discuss the effect of geometric fluctuations which are presented by double ratios of light nuclei, namely $\mathcal{O}_\text{p-d-t}$ and $\mathcal{O}_\text{p-d-He}$. It is found that magnitude hierarchy of geometric fluctuations is chain, kite, square and tetrahedron structure of $^{16}$O. $\mathcal{O}_\text{p-d-t}$ has maximum value around 80-100 MeV/nucleon which could be related to liquid-gas phase transition, that is consistent with results from the charge distribution of the heaviest fragments in the collisions.

Submitted 31 August, 2023; originally announced August 2023.

Comments: 10 pages, 8 figures

Journal ref: Physical Review C 108, 024610 (2023)

arXiv:2308.16451 [pdf, other]

Optical flow-based vascular respiratory motion compensation

Authors: Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari, **gwei Song

Abstract: This paper develops a new vascular respiratory motion compensation algorithm, Motion-Related Compensation (MRC), to conduct vascular respiratory motion compensation by extrapolating the correlation between invisible vascular and visible non-vascular. Robot-assisted vascular intervention can significantly reduce the radiation exposure of surgeons. In robot-assisted image-guided intervention, blood vessels are constantly moving/deforming due to respiration, and they are invisible in the X-ray images unless contrast agents are injected. The vascular respiratory motion compensation technique predicts 2D vascular roadmaps in live X-ray images. When blood vessels are visible after contrast agent injection, vascular respiratory motion compensation is conducted based on the sparse Lucas-Kanade feature tracker. An MRC model is trained to learn the correlation between vascular and non-vascular motions. During the intervention, the invisible blood vessels are predicted with visible tissues and the trained MRC model. Moreover, a Gaussian-based outlier filter is adopted for refinement. Experiments on in-vivo data sets show that the proposed method can yield vascular respiratory motion compensation in 0.032 sec, with an average error of 1.086 mm. Our real-time and accurate vascular respiratory motion compensation approach contributes to modern vascular intervention and surgical robots.

Submitted 31 August, 2023; originally announced August 2023.

Comments: This manuscript has been accepted by IEEE Robotics and Automation Letters

arXiv:2308.13323 [pdf, other]

SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation

Authors: Xuechao Chen, Shuangjie Xu, Xiaoyi Zou, Tongyi Cao, Dit-Yan Yeung, Lu Fang

Abstract: LiDAR-based semantic perception tasks are critical yet challenging for autonomous driving. Due to the motion of objects and static/dynamic occlusion, temporal information plays an essential role in reinforcing perception by enhancing and completing single-frame knowledge. Previous approaches either directly stack historical frames to the current frame or build a 4D spatio-temporal neighborhood using KNN, which duplicates computation and hinders realtime performance. Based on our observation that stacking all the historical points would damage performance due to a large amount of redundant and misleading information, we propose the Sparse Voxel-Adjacent Query Network (SVQNet) for 4D LiDAR semantic segmentation. To take full advantage of the historical frames high-efficiently, we shunt the historical points into two groups with reference to the current points. One is the Voxel-Adjacent Neighborhood carrying local enhancing knowledge. The other is the Historical Context completing the global knowledge. Then we propose new modules to select and extract the instructive features from the two groups. Our SVQNet achieves state-of-the-art performance in LiDAR semantic segmentation of the SemanticKITTI benchmark and the nuScenes dataset.

Submitted 25 August, 2023; originally announced August 2023.

Comments: Received by ICCV2023

arXiv:2308.12066 [pdf, other]

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Authors: Ranggi Hwang, Jianyu Wei, Shijie Cao, Changho Hwang, Xiaohu Tang, Ting Cao, Mao Yang

Abstract: Large language models (LLMs) based on transformers have made significant strides in recent years, the success of which is driven by scaling up their model size. Despite their high algorithmic performance, the computational and memory requirements of LLMs present unprecedented challenges. To tackle the high compute requirements of LLMs, the Mixture-of-Experts (MoE) architecture was introduced which is able to scale its model size without proportionally scaling up its computational requirements. Unfortunately, MoE's high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE's memory-hungry expert parameters to CPU memory fall short because the latency to migrate activated experts from CPU to GPU incurs high performance overhead. Our proposed Pre-gated MoE system effectively tackles the compute and memory challenges of conventional MoE architectures using our algorithm-system co-design. Pre-gated MoE employs our novel pre-gating function which alleviates the dynamic nature of sparse expert activation, allowing our proposed system to address the large memory footprint of MoEs while also achieving high performance. We demonstrate that Pre-gated MoE is able to improve performance, reduce GPU memory consumption, while also maintaining the same level of model quality. These features allow our Pre-gated MoE system to cost-effectively deploy large-scale LLMs using just a single GPU with high performance.

Submitted 27 April, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.08140 [pdf, other]

GPA-3D: Geometry-aware Prototype Alignment for Unsupervised Domain Adaptive 3D Object Detection from Point Clouds

Authors: Ziyu Li, **gming Guo, Tongtong Cao, Liu Bingbing, Wankou Yang

Abstract: LiDAR-based 3D detection has made great progress in recent years. However, the performance of 3D detectors is considerably limited when deployed in unseen environments, owing to the severe domain gap problem. Existing domain adaptive 3D detection methods do not adequately consider the problem of the distributional discrepancy in feature space, thereby hindering generalization of detectors across domains. In this work, we propose a novel unsupervised domain adaptive 3D detection framework, namely Geometry-aware Prototype Alignment (GPA-3D), which explicitly leverages the intrinsic geometric relationship from point cloud objects to reduce the feature discrepancy, thus facilitating cross-domain transferring. Specifically, GPA-3D assigns a series of tailored and learnable prototypes to point cloud objects with distinct geometric structures. Each prototype aligns BEV (bird's-eye-view) features derived from corresponding point cloud objects on source and target domains, reducing the distributional discrepancy and achieving better adaptation. The evaluation results obtained on various benchmarks, including Waymo, nuScenes and KITTI, demonstrate the superiority of our GPA-3D over the state-of-the-art approaches for different adaptation scenarios. The MindSpore version code will be publicly available at https://github.com/Liz66666/GPA3D.

Submitted 16 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023

arXiv:2308.07488 [pdf, other]

doi 10.1103/PhysRevLett.132.146401

Gate-tunable antiferromagnetic Chern insulator in twisted bilayer transition metal dichalcogenides

Authors: Xiaoyu Liu, Chong Wang, Xiao-Wei Zhang, Ting Cao, Di Xiao

Abstract: A series of recent experimental works on twisted MoTe$_2$ homobilayers have unveiled an abundance of exotic states in this system. Valley-polarized quantum anomalous Hall states have been identified at hole doping of $ν= -1$, and the fractional quantum anomalous Hall effect is observed at $ν= -2/3$ and $ν= -3/5$. In this work, we investigate the electronic properties of AA-stacked twisted bilayer MoTe$_2$ at $ν=-2$ by $k$-space Hartree-Fock calculations. We find that the phase diagram is qualitatively similar to the phase diagram of a Kane-Mele-Hubbard model with staggered onsite potential. A noteworthy phase within the diagram is the antiferromagnetic Chern insulator, stabilized by the external electric field. We attribute the existence of this Chern insulator to an antiferromagnetic instability at a topological phase transition between the quantum spin Hall phase and a band insulator phase. We highlight that the antiferromagnetic Chern insulator phase is most evident at a twist angle of approximately $4^\circ$. Our research proposes the potential of realizing a Chern insulator beyond $ν=-1$, and contributes fresh perspectives on the interplay between band topology and electron-electron correlations in moiré superlattices.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.02657 [pdf]

doi 10.1038/s41586-023-06536-0

Observation of Fractionally Quantized Anomalous Hall Effect

Authors: Heonjoon Park, Jiaqi Cai, Eric Anderson, Yinong Zhang, Jiayi Zhu, Xiaoyu Liu, Chong Wang, William Holtzmann, Chaowei Hu, Zhaoyu Liu, Takashi Taniguchi, Kenji Watanabe, Jiun-haw Chu, Ting Cao, Liang Fu, Wang Yao, Cui-Zu Chang, David Cobden, Di Xiao, Xiaodong Xu

Abstract: The integer quantum anomalous Hall (QAH) effect is a lattice analog of the quantum Hall effect at zero magnetic field. This striking transport phenomenon occurs in electronic systems with topologically nontrivial bands and spontaneous time-reversal symmetry breaking. Discovery of its putative fractional counterpart in the presence of strong electron correlations, i.e., the fractional quantum anomalous Hall (FQAH) effect, would open a new chapter in condensed matter physics. Here, we report the direct observation of both integer and fractional QAH effects in electrical measurements on twisted bilayer MoTe$_2$. At zero magnetic field, near filling factor $ν= -1$ (one hole per moiré unit cell) we see an extended integer QAH plateau in the Hall resistance $R_\text{xy}$ that is quantized to $h/e^2 \pm 0.1 \%$ while the longitudinal resistance $R_\text{xx}$ vanishes. Remarkably, at $ν=-2/3$ and $-3/5$ we see plateau features in $R_\text{xy}$ at $3h/2e^2 \pm 1\%$ and $5h/3e^2 \pm 3\%$, respectively, while $R_\text{xx}$ remains small. All these features shift linearly in an applied magnetic field with slopes matching the corresponding Chern numbers $-1$, $-2/3$, and $-3/5$, precisely as expected for integer and fractional QAH states. In addition, at zero magnetic field, $R_\text{xy}$ is approximately $2h/e^2$ near half filling ($ν= -1/2$) and varies linearly as $ν$ is tuned. This behavior resembles that of the composite Fermi liquid in the half-filled lowest Landau level of a two-dimensional electron gas at high magnetic field. Direct observation of the FQAH and associated effects paves the way for researching charge fractionalization and anyonic statistics at zero magnetic field.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 15 pages, 4 figures for main text. 8 extended data figures

Journal ref: Nature (2023)

Showing 1–50 of 318 results for author: Cao, T