-
ChatPCG: Large Language Model-Driven Reward Design for Procedural Content Generation
Authors:
In-Chang Baek,
Tae-Hwa Park,
**-Ha Noh,
Cheong-Mok Bae,
Kyung-Joong Kim
Abstract:
Driven by the rapid growth of machine learning, recent advances in game artificial intelligence (AI) have significantly impacted productivity across various gaming genres. Reward design plays a pivotal role in training game AI models, wherein researchers implement concepts of specific reward functions. However, despite the presence of AI, the reward design process predominantly remains in the domain of human experts, as it is heavily reliant on their creativity and engineering skills. Therefore, this paper proposes ChatPCG, a large language model (LLM)-driven reward design framework. It leverages human-level insights, coupled with game expertise, to automatically generate rewards tailored to specific game features. Moreover, ChatPCG is integrated with deep reinforcement learning, demonstrating its potential for multiplayer game content generation tasks. The results suggest that the proposed LLM exhibits the capability to comprehend game mechanics and content generation tasks, enabling tailored content generation for a specified game. This study not only highlights the potential for improving accessibility in content generation but also aims to streamline the game AI development process.
Submitted 7 June, 2024;
originally announced June 2024.
-
Near-Room-Temperature Field-Controllable Exchange Bias in 2D van der Waals Ferromagnet Fe3GaTe2
Authors:
Jifeng Shao,
Xiaolong Yin,
Chunhao Bao,
Sirong Lu,
Xiaoming Ma,
Shu Guo,
Le Wang,
Xi Zhang,
Zhiyue Li,
Longxiang Li,
Yue Zhao,
Tingyong Chen
Abstract:
Exchange bias (EB) is a cornerstone of modern magnetic memory and sensing technologies. Its extension to the realm of two-dimensional (2D) van der Waals (vdW) magnets holds promise for revolutionary advancements in miniaturized and efficient atomic spintronic devices. However, the blocking temperature of EB in 2D vdW magnets is currently ~130 K, well below room temperature. This study reports a robust EB phenomenon in Fe3GaTe2 thin-layer devices, which raises the blocking temperature to a near-room-temperature record of 280 K. Both the bias direction and magnitude can be isothermally tuned by adjusting the field sweep range, in striking contrast to the conventional EB in ferromagnetic/antiferromagnetic (FM/AFM) bilayers. We propose an exchange spring model in which crystal defects with higher coercivity act as the pivotal pinning source for the observed EB phenomenon, deviating from the conventional FM/AFM interface mechanism. Cumulative growth of minor loops and multiple magnetization reversal paths are observed in field cycles below the saturation field, consistent with the hard FM defect behavior of our exchange spring model. These findings provide insights into the complex magnetic order in 2D ferromagnets and open new avenues for developing practical ultrathin vdW spintronic devices with EB-like properties at room temperature.
Submitted 4 June, 2024;
originally announced June 2024.
-
Ultrafast Carrier Relaxation Dynamics in a Nodal-Line Semimetal PtSn$_4$
Authors:
Tianyun Lin,
Yongkang Ju,
Haoyuan Zhong,
Xiangyu Zeng,
Xue Dong,
Changhua Bao,
Hongyun Zhang,
Tian-Long Xia,
Peizhe Tang,
Shuyun Zhou
Abstract:
Topological Dirac nodal-line semimetals host a topologically nontrivial electronic structure with nodal-line crossings around the Fermi level, which could affect the photocarrier dynamics and lead to novel relaxation mechanisms. Herein, by using time- and angle-resolved photoemission spectroscopy, we reveal the previously inaccessible linear dispersions of the bulk conduction bands above the Fermi level in the Dirac nodal-line semimetal PtSn$_4$, as well as the momentum and temporal evolution of the gapless nodal lines. Surprisingly ultrafast relaxation dynamics, within a few hundred femtoseconds, are revealed for photoexcited carriers in the nodal line. Theoretical calculations suggest that such ultrafast carrier relaxation can be attributed to multichannel scattering among the complex metallic bands of PtSn$_4$ via electron-phonon coupling. In addition, a unique dynamic relaxation mechanism contributed by the highly anisotropic Dirac nodal-line electronic structure is also identified. Our work provides a comprehensive understanding of the ultrafast carrier dynamics in a Dirac nodal-line semimetal.
Submitted 20 May, 2024;
originally announced May 2024.
-
Keep It Private: Unsupervised Privatization of Online Text
Authors:
Calvin Bao,
Marine Carpuat
Abstract:
Authorship obfuscation techniques hold the promise of helping people protect their privacy in online communications by automatically rewriting text to hide the identity of the original author. However, obfuscation has been evaluated in narrow settings in the NLP literature and has primarily been addressed with superficial edit operations that can lead to unnatural outputs. In this work, we introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy. We evaluate it extensively on a large-scale test set of English Reddit posts by 68k authors, composed of short- to medium-length texts. We study how performance changes across evaluative conditions, including authorial profile length and authorship detection strategy. Our method maintains high text quality according to both automated metrics and human evaluation, and successfully evades several automated authorship attacks.
Submitted 16 May, 2024;
originally announced May 2024.
-
Evaluation of Thermal Performance of a Wick-free Vapor Chamber in Power Electronics Cooling
Authors:
Arani Mukhopadhyay,
Anish Pal,
Congbo Bao,
Mohamad Jafari Gukeh,
Sudip K. Mazumder,
Constantine M. Megaridis
Abstract:
Efficient thermal management in high-power electronics cooling can be achieved using phase-change heat transfer devices, such as vapor chambers. Traditional vapor chambers use wicks to transport condensate for efficient thermal exchange and to prevent "dry-out" of the evaporator. However, wicks in vapor chambers present significant design challenges arising from large pressure drops across the wicking material, which slow down condensate transport and increase the chances of dry-out. Thicker wicks add to the overall thermal resistance while hindering the development of thinner devices by constraining the minimum thickness of the vapor chamber. Wickless vapor chambers eliminate metal wicks entirely by incorporating complementary wettability-patterned flat plates on both the evaporator and condenser sides. Such surface modifications enhance fluid transport on the evaporator side while allowing the chambers to be made virtually as thin as desired, thereby permitting the design of thermally efficient thin electronic cooling devices. While wick-free vapor chambers have been studied and efficient design strategies have been suggested, we delve into real-life applications of wick-free vapor chambers in forced-air cooling of high-power electronics. An experimental setup is developed wherein two Si-based MOSFETs in TO-247-3 packages, which have high conduction resistance, are connected in parallel and switched at 100 kHz to emulate high-frequency power electronics operation. A rectangular copper wick-free vapor chamber spreads heat laterally over a surface 13 times larger than the heating area. This chamber is cooled externally by a fan that circulates air at room temperature. The present experimental setup extends our previous work on wick-free vapor chambers while demonstrating the effectiveness of low-cost air cooling in vapor-chamber-enhanced high-power electronics applications.
Submitted 29 April, 2024;
originally announced April 2024.
-
Closeby Habitable Exoplanet Survey (CHES). I. Astrometric Noise and Planetary Detection Efficiency due to Stellar Spots and Faculae
Authors:
Chunhui Bao,
Jianghui Ji,
Dongjie Tan,
Guo Chen,
Xiumin Huang,
Su Wang,
Yao Dong
Abstract:
The Closeby Habitable Exoplanet Survey (CHES) is dedicated to the astrometric exploration for habitable-zone Earth-like planets orbiting solar-type stars in close proximity, achieving unprecedented micro-arcsecond precision. Given the elevated precision, thorough consideration of photocenter jitters induced by stellar activity becomes imperative. This study endeavors to model the stellar activity of solar-type stars, compute astrometric noise, and delineate the detection limits of habitable planets within the astrometric domain. Simulations were conducted for identified primary targets of CHES, involving the generation of simulated observed data for astrometry and photometry, accounting for the impact of stellar activity. Estimation of activity levels in our samples was achieved through chromospheric activity indices, revealing that over 90% of stars exhibited photocenter jitters below 1 $μ\mathrm{as}$. Notably, certain proximate stars, such as $α$ Cen A and B, displayed more discernible noise arising from stellar activity. Subsequent tests were performed to evaluate detection performance, unveiling that stellar activity tends to have a less pronounced impact on planetary detectability for the majority of stars. Approximately 95% of targets demonstrated a detection efficiency exceeding 80%. However, for several cold stars, e.g., HD 32450 and HD 21531, with the habitable zones close to the stars, a reduction in detection efficiency was observed. These findings offer invaluable insights into the intricate interplay between stellar activity and astrometric precision, significantly advancing our understanding in the search for habitable planets.
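For scale, the standard astrometric signature of a planet, $\alpha = (M_p/M_\star)(a/d)$ with $a$ in AU and $d$ in pc giving arcseconds, shows why micro-arcsecond photocenter jitter matters. The short Python sketch below is an illustrative calculation, not part of the CHES pipeline; the function name and the approximate Earth-to-Sun mass ratio ($\approx 3.0\times10^{-6}$) are supplied here. For an Earth twin at 1 AU around a solar-mass star 10 pc away, the signature is about 0.3 $μ\mathrm{as}$, the same order of magnitude as the sub-1 $μ\mathrm{as}$ activity jitter reported above.

```python
# Illustrative sketch (not the CHES code): stellar reflex-motion amplitude,
# alpha = (M_p / M_star) * (a / d), where a [AU] / d [pc] yields arcseconds.
EARTH_TO_SUN_MASS = 3.003e-6  # approximate M_Earth / M_Sun


def astrometric_signature_uas(planet_mass_earth: float,
                              star_mass_sun: float,
                              semi_major_axis_au: float,
                              distance_pc: float) -> float:
    """Return the astrometric signature in micro-arcseconds."""
    mass_ratio = planet_mass_earth * EARTH_TO_SUN_MASS / star_mass_sun
    alpha_arcsec = mass_ratio * semi_major_axis_au / distance_pc
    return alpha_arcsec * 1e6  # arcsec -> micro-arcsec


if __name__ == "__main__":
    # Earth twin at 1 AU around a Sun-like star at 10 pc.
    print(f"{astrometric_signature_uas(1.0, 1.0, 1.0, 10.0):.2f} uas")
```

The signature scales inversely with distance, which is consistent with the abstract's focus on nearby solar-type stars.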
Submitted 17 April, 2024;
originally announced April 2024.
-
Threat Behavior Textual Search by Attention Graph Isomorphism
Authors:
Chanwoo Bae,
Guanhong Tao,
Zhuo Zhang,
Xiangyu Zhang
Abstract:
Cyber attacks cause over \$1 trillion in losses every year. An important task for cyber security analysts is attack forensics, which entails understanding malware behaviors and attack origins. However, existing automated or manual malware analysis can only disclose a subset of behaviors due to inherent difficulties (e.g., malware cloaking and obfuscation). As such, analysts often resort to text search techniques to identify existing malware reports based on the symptoms they observe, exploiting the fact that malware samples share a lot of similarity, especially those from the same origin. In this paper, we propose a novel malware behavior search technique based on graph isomorphism at the attention layers of Transformer models. We also compose a large dataset collected from various agencies to facilitate such research. Our technique outperforms state-of-the-art methods, such as those based on sentence embeddings and keywords, by 6-14%. In a case study of 10 real-world malware samples, our technique correctly attributes 8 of them to their ground-truth origins, whereas Google search succeeds in only 3 cases.
Submitted 18 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Observation of dichotomic field-tunable electronic structure in twisted monolayer-bilayer graphene
Authors:
Hongyun Zhang,
Qian Li,
Youngju Park,
Yu** Jia,
Wanying Chen,
Jiaheng Li,
Qinxin Liu,
Changhua Bao,
Nicolas Leconte,
Shaohua Zhou,
Yuan Wang,
Kenji Watanabe,
Takashi Taniguchi,
Jose Avila,
Pavel Dudin,
Pu Yu,
Hongming Weng,
Wenhui Duan,
Quansheng Wu,
Jeil Jung,
Shuyun Zhou
Abstract:
Twisted bilayer graphene (tBLG) provides a fascinating platform for engineering flat bands and inducing correlated phenomena. By designing the stacking architecture of graphene layers, twisted multilayer graphene can exhibit different symmetries with rich tunability. For example, in twisted monolayer-bilayer graphene (tMBG), which breaks C2z symmetry, transport measurements reveal an asymmetric phase diagram under an out-of-plane electric field, exhibiting a correlated insulating state for one field direction and a ferromagnetic state for the other. Revealing how the electronic structure evolves with the electric field is critical for a better understanding of such asymmetric field-tunable properties. Here we report the experimental observation of a field-tunable dichotomic electronic structure in tMBG by nanospot angle-resolved photoemission spectroscopy (NanoARPES) with operando gating. Interestingly, selective enhancement of the relative spectral weight contributions from monolayer and bilayer graphene is observed when switching the polarity of the bias voltage. Combining experimental results with theoretical calculations, we attribute the origin of this field-tunable electronic structure, which resembles either tBLG or twisted double-bilayer graphene (tDBG), to the selectively enhanced contribution from different stacked graphene layers with a strong electron-hole asymmetry. Our work provides electronic structure insights for understanding the rich field-tunable physics of tMBG.
Submitted 8 April, 2024;
originally announced April 2024.
-
Hidden charge density wave induced shadow bands and ultrafast dynamics of CuTe investigated using time-resolved ARPES
Authors:
Haoyuan Zhong,
Changhua Bao,
Tianyun Lin,
Fei Wang,
Xuanxi Cai,
Pu Yu,
Shuyun Zhou
Abstract:
Revealing the fine electronic structure is critical for understanding the underlying physics of low-dimensional materials. Angle-resolved photoemission spectroscopy (ARPES) is a powerful experimental technique for mapping out the experimental electronic structure. By reducing the photon energy (e.g., to 6 eV) using laser sources, a greatly improved momentum resolution can be achieved, thereby providing opportunities for "zooming in" on the fine electronic structure and even revealing previously unresolvable bands near the Brillouin zone center. Here, using the quasi-one-dimensional material CuTe as an example, we demonstrate the unique capability of laser-based ARPES in revealing the fine electronic structure of "hidden" charge density wave (CDW) induced shadow bands near the Brillouin zone center, which were previously unresolvable using synchrotron sources. The observation of the shadow bands reveals the CDW phase from the perspective of band folding, and the unpredicted CDW band hybridization strongly modifies the electronic structure and Fermi surface, which suggests that such hybridization must be taken into account when studying the CDW transition. Moreover, the ultrafast non-equilibrium carrier dynamics are captured by time-resolved ARPES, revealing the relaxation dynamics through electron-phonon scattering. Our work demonstrates the advantages of laser-based ARPES in zooming in on the fine electronic structure, as well as in capturing the ultrafast dynamics of low-dimensional materials.
Submitted 8 April, 2024;
originally announced April 2024.
-
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Authors:
Chong Bao,
Yinda Zhang,
Yuan Li,
Xiyu Zhang,
Bangbang Yang,
Hujun Bao,
Marc Pollefeys,
Guofeng Zhang,
Zhaopeng Cui
Abstract:
Recently, we have witnessed the explosive growth of various volumetric representations in modeling animatable head avatars. However, due to the diversity of frameworks, there is no practical method to support high-level applications like 3D head avatar editing across different representations. In this paper, we propose a generic avatar editing approach that can be universally applied to various 3DMM-driven volumetric head avatars. To achieve this goal, we design a novel expression-aware modification generative model, which lifts 2D edits from a single image to a consistent 3D modification field. To ensure the effectiveness of the generative modification process, we develop several techniques, including an expression-dependent modification distillation scheme to draw knowledge from the large-scale head avatar model and 2D facial texture editing tools, implicit latent space guidance to enhance model convergence, and a segmentation-based loss reweighting strategy for fine-grained texture inversion. Extensive experiments demonstrate that our method delivers high-quality and consistent results across multiple expressions and viewpoints. Project page: https://zju3dv.github.io/geneavatar/
Submitted 2 April, 2024;
originally announced April 2024.
-
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
Authors:
Dihan Zheng,
Yihang Zou,
Xiaowen Zhang,
Chenglong Bao
Abstract:
The data bottleneck has emerged as a fundamental challenge in learning-based image restoration methods. Researchers have attempted to generate synthetic training data using paired or unpaired samples to address this challenge. This study proposes SeNM-VAE, a semi-supervised noise modeling method that leverages both paired and unpaired datasets to generate realistic degraded data. Our approach is based on modeling the conditional distribution of degraded and clean images with a specially designed graphical model. Under the variational inference framework, we develop an objective function for handling both paired and unpaired data. We employ our method to generate paired training samples for real-world image denoising and super-resolution tasks. Our approach produces higher-quality synthetic degraded images than other unpaired and paired noise modeling methods. Furthermore, our approach demonstrates remarkable performance in downstream image restoration tasks, even with limited paired data. With more paired data, our method achieves the best performance on the SIDD dataset.
Submitted 26 March, 2024;
originally announced March 2024.
-
Convection-Diffusion Equation: A Theoretically Certified Framework for Neural Networks
Authors:
Tangjun Wang,
Chenglong Bao,
Zuoqiang Shi
Abstract:
In this paper, we study partial differential equation models of neural networks. A neural network can be viewed as a map from a simple base model to a complicated function. Based on solid analysis, we show that this map can be formulated by a convection-diffusion equation. This theoretically certified framework provides a mathematical foundation for, and a deeper understanding of, neural networks. Moreover, based on the convection-diffusion equation model, we design a novel network structure that incorporates a diffusion mechanism into the network architecture. Extensive experiments on both benchmark datasets and real-world applications validate the performance of the proposed model.
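For reference, a convection-diffusion equation for a scalar field $u(x,t)$ can be written in the textbook form below. Reading the base model as the initial condition $f$ and the network as the solution at a terminal time $T$ is an assumption made here for illustration; the velocity field $v$ and diffusion coefficient $\sigma$ are generic symbols, and the paper's exact form and sign conventions may differ.

$$\frac{\partial u}{\partial t}(x,t) + v(x,t)\cdot\nabla u(x,t) = \sigma^{2}\,\Delta u(x,t), \qquad u(x,0) = f(x), \quad t \in [0,T].$$

Setting $\sigma = 0$ leaves a pure transport (convection) equation, so the $\sigma^{2}\Delta u$ term is one plausible reading of the "diffusion mechanism" the abstract says is added to the architecture.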
Submitted 23 March, 2024;
originally announced March 2024.
-
Evolution of flat band and role of lattice relaxations in twisted bilayer graphene
Authors:
Qian Li,
Hongyun Zhang,
Yijie Wang,
Wanying Chen,
Changhua Bao,
Qinxin Liu,
Tianyun Lin,
Shuai Zhang,
Haoxiong Zhang,
Kenji Watanabe,
Takashi Taniguchi,
Jose Avila,
Pavel Dudin,
Qunyang Li,
Pu Yu,
Wenhui Duan,
Zhida Song,
Shuyun Zhou
Abstract:
Magic-angle twisted bilayer graphene (MATBG) exhibits correlated phenomena such as superconductivity and Mott insulating states related to the weakly dispersing flat band near the Fermi energy. Beyond the moiré period, this flat band is also expected to be sensitive to lattice relaxations. Thus, clarifying the evolution of the electronic structure with twist angle is critical for understanding the physics of MATBG. Here, we combine nanospot angle-resolved photoemission spectroscopy and atomic force microscopy to resolve the fine electronic structure of the flat band and remote bands, and their evolution with twist angle from 1.07$^\circ$ to 2.60$^\circ$. Near the magic angle, the dispersion is characterized by a flat band near the Fermi energy with a strongly reduced bandwidth. Moreover, near 1.07$^\circ$, we observe a spectral weight transfer between remote bands at higher binding energy and extract the modulated interlayer spacing near the magic angle. Our work provides direct spectroscopic information on flat band physics and highlights the role of lattice relaxations.
Submitted 20 March, 2024;
originally announced March 2024.
-
Sequential Modeling of Complex Marine Navigation: Case Study on a Passenger Vessel (Student Abstract)
Authors:
Yimeng Fan,
Pedram Agand,
Mo Chen,
Edward J. Park,
Allison Kennedy,
Chanwoo Bae
Abstract:
The maritime industry's continuous commitment to sustainability has led to a dedicated exploration of methods to reduce vessel fuel consumption. This paper undertakes this challenge through a machine learning approach, leveraging a real-world dataset spanning two years of operation of a ferry on the west coast of Canada. Our focus centers on the creation of a time series forecasting model given the dynamic and static states, actions, and disturbances. This model is designed to predict dynamic states based on the actions provided, subsequently serving as an evaluative tool to assess the proficiency of the ferry's operation under the captain's guidance. Additionally, it lays the foundation for future optimization algorithms, providing valuable feedback on decision-making processes. To facilitate future studies, our code is available at https://github.com/pagand/model_optimze_vessel/tree/AAAI
Submitted 20 March, 2024;
originally announced March 2024.
-
Globalized distributionally robust optimization with multi core sets
Authors:
Yueyao Li,
Chenglong Bao,
Wenxun Xing
Abstract:
It is essential to capture the true probability distribution of uncertain data in distributionally robust optimization (DRO). In numerous application scenarios, the uncertain data exhibit multimodality, in the sense that the probability density function of the uncertain data has two or more modes (local maxima). In this paper, we propose a globalized distributionally robust optimization framework with multiple core sets (MGDRO) to handle multimodal data. This framework captures the multimodal structure via a penalty function composed of the minimum distances from the random vector to all core sets. Under some assumptions, the MGDRO model can be reformulated as tractable semi-definite programs for both moment-based and metric-based ambiguity sets. We apply the MGDRO models to a multi-product newsvendor problem with multimodal demands. The numerical results show that the MGDRO models greatly outperform traditional DRO models and other multimodal models.
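The abstract describes the MGDRO penalty only as being built from the minimum distances between the random vector and the core sets. The Python sketch below makes one hypothetical reading concrete: each core set is assumed to be an axis-aligned box, the point-to-set distance is Euclidean, and the penalty is the smallest of these distances scaled by a weight. The box shape, `multi_core_penalty`, and `weight` are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import List, Sequence, Tuple

# A core set modeled (hypothetically) as an axis-aligned box: (lower corner, upper corner).
Box = Tuple[Sequence[float], Sequence[float]]


def distance_to_box(xi: Sequence[float], box: Box) -> float:
    """Euclidean distance from point xi to the box; zero if xi lies inside it."""
    lower, upper = box
    # Per-coordinate gap outside [lower_j, upper_j]; at most one of the two terms is positive.
    gaps = [max(lo - x, 0.0, x - hi) for x, lo, hi in zip(xi, lower, upper)]
    return math.sqrt(sum(g * g for g in gaps))


def multi_core_penalty(xi: Sequence[float], core_sets: List[Box], weight: float = 1.0) -> float:
    """Hypothetical penalty: weight times the distance to the nearest core set."""
    return weight * min(distance_to_box(xi, box) for box in core_sets)


if __name__ == "__main__":
    # Two demand modes, each covered by its own core set.
    cores = [((0.0, 0.0), (1.0, 1.0)), ((4.0, 4.0), (5.0, 5.0))]
    print(multi_core_penalty((2.0, 0.5), cores))  # 1.0: one unit outside the first box
    print(multi_core_penalty((4.5, 4.5), cores))  # 0.0: inside the second box
```

With one box per mode, a realization near any mode incurs little or no penalty, while a single core set would have to stretch across the low-density gap between modes to achieve the same effect.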
Submitted 12 March, 2024;
originally announced March 2024.
-
A Framework for Cost-Effective and Self-Adaptive LLM Shaking and Recovery Mechanism
Authors:
Zhiyu Chen,
Yu Li,
Suochao Zhang,
**gbo Zhou,
Jiwen Zhou,
Chenfu Bao,
Dianhai Yu
Abstract:
As Large Language Models (LLMs) achieve great success in real-world applications, an increasing number of users are seeking to develop and deploy their customized LLMs through cloud services. Nonetheless, in some specific domains, there are still concerns regarding cost and the trade-off between privacy and accuracy. In this study, we introduce a cost-effective and self-adaptive LLM shaking tuning and recovery mechanism, named CypherTalk. With carefully designed horizontal and vertical shaking operators, we achieve accuracy comparable to SOTA privacy-preserving LLM schemes based on cryptography or differential privacy. Experiments also show that with the CypherTalk framework, users can achieve reliable accuracy when using optimized shaking operator settings. To the best of our knowledge, this is the first work that considers cost and the trade-off between model utility and privacy in LLM scenarios.
Submitted 11 March, 2024;
originally announced March 2024.
-
Recommending Missed Citations Identified by Reviewers: A New Task, Dataset and Baselines
Authors:
Kehan Long,
Shasha Li,
Pancheng Wang,
Chenlong Bao,
**tao Tang,
Ting Wang
Abstract:
Citing comprehensively and appropriately has become a challenging task with the explosive growth of scientific publications. Current citation recommendation systems aim to recommend a list of scientific papers for a given text context or a draft paper. However, none of the existing work focuses on the citations already included in full papers, which are imperfect and still have much room for improvement. In peer review, it is common for reviewers to identify submissions as missing vital citations. This may negatively affect the credibility and validity of the research presented. To help improve the citations of full papers, we first define a novel task of Recommending Missed Citations Identified by Reviewers (RMC) and construct a corresponding expert-labeled dataset called CitationR. We conduct an extensive evaluation of several state-of-the-art methods on CitationR. Furthermore, we propose a new framework, RMCNet, with an Attentive Reference Encoder module that mines the relevance between papers, already-made citations, and missed citations. Empirical results show that RMC is challenging, with the proposed architecture outperforming previous methods on all metrics. We release our dataset and benchmark models to motivate future research on this challenging new task.
Submitted 4 March, 2024;
originally announced March 2024.
-
Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain
Authors:
Xue Yang,
Changchun Bao,
**g Zhou,
Xianhong Chen
Abstract:
In target speaker extraction, many studies rely on a speaker embedding that is obtained from an enrollment of the target speaker and used as guidance. However, using the speaker embedding alone may not fully exploit the contextual information contained in the enrollment. In this paper, we directly exploit this contextual information in the time-frequency (T-F) domain. Specifically, the T-F representations of the enrollment and the mixed signal interact through an attention mechanism to compute weighting matrices. These weighting matrices reflect the similarity among different frames of the T-F representations and are further used to obtain consistent T-F representations of the enrollment. These consistent representations serve as guidance, allowing better exploitation of the contextual information. Furthermore, the proposed method achieves state-of-the-art performance on the benchmark dataset and proves effective in complex scenarios.
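A minimal sketch of the frame-level weighting idea, for illustration only: the abstract does not give the exact formulation, so this assumes plain scaled dot-product cross-attention in which mixture T-F frames act as queries and enrollment frames act as keys and values. The function name `attend_enrollment` and the list-of-frames layout are hypothetical, not the authors' code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def attend_enrollment(mix_frames, enr_frames):
    """Hypothetical cross-attention between mixture and enrollment frames.

    mix_frames: list of T_mix feature vectors (each of length D), used as queries
    enr_frames: list of T_enr feature vectors (each of length D), used as keys/values
    Returns (weights, aligned):
      weights is T_mix x T_enr, each row a similarity distribution over enrollment frames
      aligned is T_mix x D, the enrollment re-weighted for each mixture frame
    """
    d = len(mix_frames[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in mix_frames:
        # Scaled dot-product similarity between one mixture frame and every enrollment frame
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in enr_frames]
        weights.append(softmax(scores))
    aligned = [
        [sum(w[j] * enr_frames[j][c] for j in range(len(enr_frames))) for c in range(d)]
        for w in weights
    ]
    return weights, aligned
```

Each row of `weights` sums to one and describes how strongly a mixture frame attends to each enrollment frame; `aligned` is the corresponding re-weighted enrollment representation.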
Submitted 26 February, 2024;
originally announced February 2024.
-
Exploring the Power of Pure Attention Mechanisms in Blind Room Parameter Estimation
Authors:
Chunxi Wang,
Maoshen Jia,
Meiran Li,
Changchun Bao,
Wenyu **
Abstract:
Dynamic parameterization of acoustic environments has drawn widespread attention in the field of audio processing. Precise representation of local room acoustic characteristics is crucial when designing audio filters for various audio rendering applications. Key parameters in this context include reverberation time (RT60) and geometric room volume. In recent years, neural networks have been extensively applied to blind room parameter estimation. However, it remains an open question whether pure attention mechanisms can achieve superior performance in this task. To address this question, this study performs blind room parameter estimation from monaural noisy speech signals. Various model architectures are investigated, including a proposed attention-based model. This model is a convolution-free Audio Spectrogram Transformer that uses patch splitting, attention mechanisms, and cross-modality transfer learning from a pretrained Vision Transformer. Experimental results suggest that the proposed model, relying purely on attention mechanisms without convolution, achieves significantly improved performance across various room parameter estimation tasks, especially with the help of dedicated pretraining and data augmentation schemes. Additionally, the model shows better adaptability and robustness than existing methods when handling variable-length audio inputs.
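Patch splitting can be illustrated with a short sketch, assuming the spectrogram is stored as a list of frequency rows and that patches are taken on a regular grid with configurable strides. The patch sizes, the drop-incomplete-edges policy and the name `split_into_patches` are illustrative assumptions, not details from the paper. Because the number of patches grows with the time dimension, variable-length inputs simply yield longer patch sequences.

```python
def split_into_patches(spec, patch_f, patch_t, stride_f=None, stride_t=None):
    """Split a 2D spectrogram into flattened patches (hypothetical helper).

    spec: list of frequency rows, each a list of values over time frames
    patch_f, patch_t: patch height (frequency bins) and width (time frames)
    stride_f, stride_t: hop sizes; default to the patch size (non-overlapping)
    Patches are returned in row-major order; incomplete edge patches are dropped.
    """
    if stride_f is None:
        stride_f = patch_f
    if stride_t is None:
        stride_t = patch_t
    n_f, n_t = len(spec), len(spec[0])
    patches = []
    for f0 in range(0, n_f - patch_f + 1, stride_f):
        for t0 in range(0, n_t - patch_t + 1, stride_t):
            # Flatten the patch row by row, as a token for the Transformer
            patch = [spec[f][t] for f in range(f0, f0 + patch_f) for t in range(t0, t0 + patch_t)]
            patches.append(patch)
    return patches
```

With a 4 x 6 spectrogram and 2 x 3 patches, this yields four patch tokens; doubling the number of time frames doubles the token count.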
Submitted 25 April, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Insights and caveats from mining local and global temporal motifs in cryptocurrency transaction networks
Authors:
Naomi A. Arnold,
Peijie Zhong,
Cheick Tidiane Ba,
Ben Steer,
Raul Mondragon,
Felix Cuadrado,
Renaud Lambiotte,
Richard G. Clegg
Abstract:
Distributed ledger technologies have opened up a wealth of fine-grained transaction data from cryptocurrencies like Bitcoin and Ethereum. This enables research into problems such as anomaly detection, anti-money laundering, pattern mining and activity clustering (for which data from traditional currencies is rarely available). The formalism of temporal networks offers a natural way of representing this data and gives access to a wealth of metrics and models. However, the large scale of the data presents a challenge for standard graph analysis techniques. We use temporal motifs to analyse two Bitcoin datasets and one NFT dataset, using sequences of three transactions and up to three users. We show that the commonly used technique of simply counting temporal motifs over all users and all time can give misleading conclusions. Here we also study the motifs contributed by each user and discover that the motif distribution is heavy-tailed and that the key players have diverse motif signatures. We study the motifs that occur in different time periods and find events and anomalous activity that cannot be seen just by a count on the whole dataset. Studying motif completion time reveals dynamics driven by human behaviour as well as algorithmic behaviour.
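To make the notion concrete, here is a brute-force sketch of counting 3-edge temporal motifs on at most three nodes, assuming the usual delta-temporal-motif definition (strictly increasing timestamps, whole motif within a time window delta). It also tallies, per node, the motifs each user participates in. It is O(n^3) in the number of edges and only meant for illustration; it is not the authors' pipeline, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_temporal_motifs(edges, delta):
    """Brute-force count of 3-edge, up-to-3-node delta-temporal motifs.

    edges: list of (src, dst, t) with src != dst
    A motif instance is three edges with strictly increasing timestamps,
    t3 - t1 <= delta, that together touch at most three distinct nodes.
    Returns (global_counts, per_node_counts); motif keys relabel nodes
    0, 1, 2 in order of first appearance.
    """
    edges = sorted(edges, key=lambda e: e[2])
    global_counts = Counter()
    per_node = {}
    # combinations() preserves the sorted order, so e1, e2, e3 are time-ordered
    for e1, e2, e3 in combinations(edges, 3):
        if not (e1[2] < e2[2] < e3[2]) or e3[2] - e1[2] > delta:
            continue
        labels = {}
        pattern = []
        for src, dst, _ in (e1, e2, e3):
            for node in (src, dst):
                labels.setdefault(node, len(labels))
            pattern.append((labels[src], labels[dst]))
        if len(labels) > 3:
            continue
        key = tuple(pattern)
        global_counts[key] += 1
        # Credit the instance to every participating node (user)
        for node in labels:
            per_node.setdefault(node, Counter())[key] += 1
    return global_counts, per_node
```

The pattern `((0, 1), (1, 2), (2, 0))` is a time-ordered cycle a -> b -> c -> a; comparing `per_node` tallies across users is one way to see how a global count can hide who contributes which motifs.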
Submitted 14 February, 2024;
originally announced February 2024.
-
Rhythmic soliton interactions for integrated dual-microcomb spectroscopy
Authors:
Zihao Wang,
Yifei Wang,
Baoqi Shi,
Chen Shen,
Wei Sun,
Yulei Ding,
Changxi Yang,
Junqiu Liu,
Chengying Bao
Abstract:
Rotation symmetry of microresonators supports the generation of phase-locked counter-propagating (CP) solitons that can potentially miniaturize dual-comb systems. Realizing these dual-comb-compatible solitons in photonic integrated circuits remains a challenge. Here, we synthesized such CP solitons in an integrated silicon nitride microresonator and observed forced soliton oscillation due to rhythmic, time-varying soliton interactions. The interactions passively result in mutual coherence lasting seconds. Temporal motion in the soliton streams is discerned by measuring quadratic-scaling frequency noise peaks and inverse-quadratic-scaling microcomb sidebands. By generating a CP soliton trimer with two synchronized solitons in one of the orbiting directions, we overcome the inability to measure two unsynchronized CP soliton dimer pulses by optical cross-correlation, and show that CP solitons undergo complex motion trajectories. We further demonstrate that precise dual-comb spectroscopy with an acquisition time as short as 0.6 $μ$s is feasible using these solitons, although the temporal motion limits the dynamic range. Besides revealing soliton interactions with different group velocities, our work propels the realization of photonic integrated dual-comb spectrometers with high passive coherence.
Submitted 13 February, 2024;
originally announced February 2024.
-
Limitations of Agents Simulated by Predictive Models
Authors:
Raymond Douglas,
Jacek Karwowski,
Chan Bae,
Andis Draguns,
Victoria Krakovna
Abstract:
There is increasing focus on adapting predictive models into agent-like systems, most notably AI assistants based on language models. We outline two structural reasons why these models can fail when turned into agents. First, we discuss auto-suggestive delusions. Prior work has shown theoretically that models fail to imitate agents that generated the training data if the agents relied on hidden observations: the hidden observations act as confounding variables, and the models treat actions they generate as evidence for nonexistent observations. Second, we introduce and formally study a related, novel limitation: predictor-policy incoherence. When a model generates a sequence of actions, the model's implicit prediction of the policy that generated those actions can serve as a confounding variable. The result is that models choose actions as if they expect future actions to be suboptimal, causing them to be overly conservative. We show that both failures are fixed by including a feedback loop from the environment, that is, re-training the models on their own actions. We give simple demonstrations of both limitations using Decision Transformers and confirm that empirical results agree with our conceptual and formal analysis. Our treatment provides a unifying view of these failure modes and informs the question of why fine-tuning offline-learned policies with online learning makes them more effective.
Submitted 8 February, 2024;
originally announced February 2024.
-
Accelerated Gradient Methods with Gradient Restart: Global Linear Convergence
Authors:
Chenglong Bao,
Liang Chen,
Jiahong Li,
Zuowei Shen
Abstract:
Gradient restarting has been shown to improve the numerical performance of accelerated gradient methods. This paper provides a mathematical analysis to understand these advantages. First, we establish global linear convergence guarantees for the gradient-restarted accelerated proximal gradient method when solving strongly convex composite optimization problems. Second, through analysis of the corresponding ordinary differential equation model, we prove that the continuous trajectory of the gradient-restarted Nesterov's accelerated gradient method exhibits global linear convergence for quadratic strongly convex objectives, whereas the non-restarted version provably lacks this property, as shown in [Su, Boyd, and Candès, J. Mach. Learn. Res., 2016, 17(153), 1-43].
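The restart mechanism itself is easy to sketch. This assumes the commonly used gradient restart test of O'Donoghue and Candès (reset the momentum whenever grad f(y_k) · (x_{k+1} - x_k) > 0) applied to smooth Nesterov iterations with momentum k/(k+3); the proximal step of the composite setting is omitted, and the function name is hypothetical. It illustrates the scheme, not the paper's analysis.

```python
def nesterov_gradient_restart(grad, x0, step, iters):
    """Nesterov's accelerated gradient with a gradient-based restart test.

    grad: callable returning the gradient (list of floats) at a point
    step: fixed step size, e.g. 1/L for an L-smooth objective
    Returns the final iterate and the number of restarts performed.
    """
    x_prev = list(x0)
    x = list(x0)
    k = 0  # iterations since the last restart; k = 0 means no momentum
    restarts = 0
    for _ in range(iters):
        beta = k / (k + 3)
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        x_new = [yi - step * gi for yi, gi in zip(y, g)]
        # Restart when the step x_new - x makes an acute angle with the gradient at y
        if sum(gi * (ni - xi) for gi, ni, xi in zip(g, x_new, x)) > 0:
            k = 0
            restarts += 1
        else:
            k += 1
        x_prev, x = x, x_new
    return x, restarts
```

For example, on the strongly convex quadratic f(x) = 0.5 x_1^2 - x_1 + 50 x_2^2 - x_2 with step 1/100, the iterates approach the minimizer (1, 0.01) and the momentum is reset whenever the trajectory starts to overshoot.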
Submitted 15 January, 2024;
originally announced January 2024.
-
Inter-domain Resource Collaboration in Satellite Networks: An Intelligent Scheduling Approach Towards Hybrid Missions
Authors:
Chenxi Bao,
Di Zhou,
Min Sheng,
Yan Shi,
Jiandong Li
Abstract:
As next-generation satellite networks, which comprise various service function domains such as communication, observation, and navigation, move towards large scale, single-domain resources can hardly provide satisfactory and timely service guarantees for the rapidly increasing mission demands of each domain. Breaking down the barriers between independent domain resources and enabling cross-domain transmission of missions, so that inter-domain resources collaborate efficiently, is a promising solution. However, the hybrid scheduling of different missions and the continuous increase in the number of service domains amplify the differences and dynamics of mission demands, making efficient cross-domain mission scheduling (CMS) challenging. To this end, this paper first accurately characterizes the inter-satellite communication resource state in real time by exploiting a sparse resource representation scheme, and systematically characterizes the differentiation of mission demands by constructing a mission priority model. Based on the resource and mission information, we construct reward-associated top- and bottom-layer mission scheduling models by exploiting the correlation between intra- and inter-domain mission scheduling, and formulate a Markov decision process-based hierarchical CMS problem. Further, to achieve higher adaptability and autonomy of CMS and efficiently mitigate the impact of network scale, we develop a hierarchical intelligent CMS algorithm that dynamically adjusts and efficiently matches the CMS policy to different mission demands. Simulation results demonstrate that the proposed algorithm achieves significant performance gains compared with independent domains and existing CMS algorithms, and still guarantees high service performance under different network scales.
Submitted 15 December, 2023;
originally announced December 2023.
-
Reconstruction of dynamical systems from data without time labels
Authors:
Zhijun Zeng,
Pipi Hu,
Chenglong Bao,
Yi Zhu,
Zuoqiang Shi
Abstract:
In this paper, we study how to reconstruct dynamical systems from data without time labels. Data without time labels appear in many applications, such as molecular dynamics and single-cell RNA sequencing. Reconstruction of dynamical systems from time-sequence data has been studied extensively. However, these methods do not apply if time labels are unknown. Without time labels, sequence data becomes distribution data. Based on this observation, we propose to treat the data as samples from a probability distribution and to reconstruct the underlying dynamical system by minimizing a distribution loss, more specifically the sliced Wasserstein distance. Extensive experimental results demonstrate the effectiveness of the proposed method.
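The distribution loss can be illustrated with a Monte Carlo estimate of the sliced Wasserstein distance between two point clouds: project both onto random unit directions and, in each one-dimensional projection, match sorted samples, which is the optimal coupling for equal-size empirical measures. This sketch assumes equal sample sizes and only computes the loss; fitting the dynamical system itself is not shown, and the function name and defaults are illustrative.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    xs, ys: equal-size lists of d-dimensional points (lists of floats)
    n_projections: number of random directions to average over
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Normalized Gaussian vector: a direction uniform on the unit sphere
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        px = sorted(sum(a * b for a, b in zip(x, theta)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, theta)) for y in ys)
        # In 1D, matching sorted samples gives the optimal transport plan
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(xs)
    return (total / n_projections) ** (1.0 / p)
```

For a point cloud and a copy translated by a vector v in d dimensions, the squared estimate approaches |v|^2 / d as the number of projections grows, so a shift of 3 in two dimensions gives roughly 3 / sqrt(2).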
Submitted 8 April, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Roadmap on Perovskite Light-Emitting Diodes
Authors:
Ziming Chen,
Robert L. Z. Hoye,
Hin-Lap Yip,
Nadesh Fiuza-Maneiro,
Iago López-Fernández,
Clara Otero-Martínez,
Lakshminarayana Polavarapu,
Navendu Mondal,
Alessandro Mirabelli,
Miguel Anaya,
Samuel D. Stranks,
Hui Liu,
Guangyi Shi,
Zhengguo Xiao,
Nakyung Kim,
Yunna Kim,
Byungha Shin,
**quan Shi,
Mengxia Liu,
Qianpeng Zhang,
Zhiyong Fan,
James C. Loy,
Lianfeng Zhao,
Barry P. Rand,
Habibul Arfin
, et al. (18 additional authors not shown)
Abstract:
In recent years, the field of metal-halide perovskite emitters has rapidly emerged as a new community in solid-state lighting. Their exceptional optoelectronic properties have contributed to the rapid rise in external quantum efficiencies (EQEs) in perovskite light-emitting diodes (PeLEDs) from <1% (in 2014) to approaching 30% (in 2023) across a wide range of wavelengths. However, several challenges still hinder their commercialization, including the relatively low EQEs of blue/white devices, limited EQEs in large-area devices, poor device stability, as well as the toxicity of the easily accessible lead components and the solvents used in the synthesis and processing of PeLEDs. This roadmap addresses the current and future challenges in PeLEDs across fundamental and applied research areas, by sharing the community's perspectives. This work will provide the field with practical guidelines to advance PeLED development and facilitate more rapid commercialization.
Submitted 19 November, 2023;
originally announced November 2023.
-
Improved Dense Nested Attention Network Based on Transformer for Infrared Small Target Detection
Authors:
Chun Bao,
Jie Cao,
Yaqian Ning,
Tianhua Zhao,
Zhijun Li,
Zechen Wang,
Li Zhang,
Qun Hao
Abstract:
Infrared small target detection based on deep learning offers unique advantages in separating small targets from complex and dynamic backgrounds. However, the features of infrared small targets gradually weaken as the depth of a convolutional neural network (CNN) increases. To address this issue, we propose a novel method for detecting infrared small targets, called the improved dense nested attention network (IDNANet), which is based on the transformer architecture. We preserve the dense nested structure of the dense nested attention network (DNANet) and introduce the Swin Transformer in the feature extraction stage to enhance the continuity of features. Furthermore, we integrate the ACmix attention structure into the dense nested structure to enhance the features of intermediate layers. Additionally, we design a weighted dice binary cross-entropy (WD-BCE) loss function to mitigate the negative impact of foreground-background imbalance in the samples. Moreover, we develop a dataset specifically for infrared small targets, called BIT-SIRST. The dataset comprises a significant number of real-world targets and manually annotated labels, as well as synthetic data and corresponding labels. We have evaluated the effectiveness of our method through experiments on public datasets. Compared with other state-of-the-art methods, our approach performs better in terms of probability of detection ($P_d$), false-alarm rate ($F_a$), and mean intersection over union ($mIoU$). The $mIoU$ reaches 90.89% on the NUDT-SIRST dataset and 79.72% on the SIRST dataset. The BIT-SIRST dataset and code are openly available at https://github.com/EdwardBao1006/bit_sirst.
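The abstract names the WD-BCE loss without giving its formula. One plausible reading, stated purely as an assumption, is a convex combination of a soft Dice loss and pixel-wise binary cross-entropy; the weight `w`, the `eps` clamp and the name `wd_bce_loss` are hypothetical and may differ from the authors' definition.

```python
import math


def wd_bce_loss(probs, targets, w=0.5, eps=1e-7):
    """Hypothetical weighted Dice + BCE loss over flattened pixel probabilities.

    probs:   predicted foreground probabilities in [0, 1]
    targets: binary ground-truth labels (0 or 1)
    w:       assumed mixing weight between the Dice and BCE terms
    """
    n = len(probs)
    # Pixel-wise binary cross-entropy, averaged over all pixels (dominated by background)
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    # Soft Dice coefficient, driven by overlap with the (small) foreground
    inter = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * inter + eps) / (sum(probs) + sum(targets) + eps)
    return w * (1 - dice) + (1 - w) * bce
```

On a 100-pixel mask with two foreground pixels, predicting all background keeps the averaged BCE moderate, but the Dice term is close to its maximum; that sensitivity to a tiny foreground is the usual motivation for adding a Dice term under class imbalance.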
Submitted 17 January, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Coherence memory and amnesia in a mode-locked laser
Authors:
Bo Cao,
Zhongshu Liu,
Chenxin Gao,
Changxi Yang,
Chengying Bao
Abstract:
Self-organization of temporal modes in mode-locked lasers usually starts from quantum noise. In this process, incoherent spontaneous emission is steered into coherent ultrashort pulses by dissipation and nonlinearity. In this work, we investigated self-organization dynamics in a mode-locked Mamyshev oscillator starting from coherent pulse seeds rather than quantum noise. We observed that the coherence of the seed can be remembered or forgotten depending on the initial population inversion. In the coherence amnesia regime, excessive nonlinearity can destroy the seed coherence, causing the oscillator to undergo a chaotic transition lasting hundreds of round trips before regaining coherence. Conversely, in the coherence memory regime the oscillator converges in only a few round trips. A heterodyne technique was developed to record the fast-varying optical phase and characterize these two regimes. Dissipative soliton molecules were synthesized from external pulse-pair seeds via the coherence memory pathway. In this case, for closely spaced seed pulse pairs, we observed a plateau in the generated pulse spacing that is independent of the seed pulse spacing, i.e., amnesia of the seed spacing. Moreover, we show that pulse seeds can be used for laser reconfiguration and pulse pattern control. Our work paves the way to controlling transient pulse dynamics and steady pulse forms on demand in mode-locked lasers.
Submitted 14 November, 2023;
originally announced November 2023.
-
Fuel Consumption Prediction for a Passenger Ferry using Machine Learning and In-service Data: A Comparative Study
Authors:
Pedram Agand,
Allison Kennedy,
Trevor Harris,
Chanwoo Bae,
Mo Chen,
Edward J Park
Abstract:
As the importance of eco-friendly transportation increases, an efficient approach to marine vessel operation is essential. Methods for status monitoring that consider weather conditions, and for forecasting using in-service data from ships, require accurate and complete models for predicting the energy efficiency of a ship. The models need to process all the operational data effectively in real time. This paper presents models that predict fuel consumption using in-service data collected from a passenger ship. Statistical and domain-knowledge methods were used to select the proper input variables for the models. These methods mitigate over-fitting, missing data, and multicollinearity while providing practical applicability. The prediction models investigated include multiple linear regression (MLR), a decision tree approach (DT), an artificial neural network (ANN), and ensemble methods. The best predictive performance came from a model developed using XGBoost, a boosting ensemble approach. Our code is available on GitHub at https://github.com/pagand/model_optimze_vessel/tree/OE for future research.
Submitted 23 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
PyMsOfa: A Python Package for the Standards of Fundamental Astronomy (SOFA) Service
Authors:
Jianghui Ji,
Dongjie Tan,
Chunhui Bao,
Xiumin Huang,
Shoucun Hu,
Yao Dong,
Su Wang
Abstract:
The Standards of Fundamental Astronomy (SOFA) is a service provided by the International Astronomical Union (IAU) that offers algorithms and software for astronomical calculations; it was released in two versions, written in FORTRAN 77 and ANSI C. In this work, we implement the Python package PyMsOfa for the SOFA service in three ways: (1) a Python wrapper package based on a foreign function library for Python (ctypes), (2) a Python wrapper package using the foreign function interface for Python calling C code (cffi), and (3) a package written directly in pure Python from the SOFA subroutines. PyMsOfa fully implements 247 functions of the original SOFA routines. In addition, PyMsOfa has been extensively tested, and its results are exactly consistent with the test examples given by the original SOFA. This Python package is suitable not only for the astrometric detection of habitable planets in the Closeby Habitable Exoplanet Survey (CHES) mission (Ji et al. 2022), but also for frontier topics such as black holes and dark matter that involve astrometric calculations, as well as for other fields. The source code is available at https://github.com/CHES2023/PyMsOfa.
Submitted 17 October, 2023; v1 submitted 12 October, 2023;
originally announced October 2023.
-
GenSim: Generating Robotic Simulation Tasks via Large Language Models
Authors:
Lirui Wang,
Yiyang Ling,
Zhecheng Yuan,
Mohit Shridhar,
Chen Bao,
Yuzhe Qin,
Bailin Wang,
Huazhe Xu,
Xiaolong Wang
Abstract:
Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data. However, existing methods for data generation have generally focused on scene-level diversity (e.g., object instances and poses) rather than task-level diversity, due to the human effort required to come up with and verify novel tasks. This has made it challenging for policies trained on simulation data to demonstrate significant task-level generalization. In this paper, we propose to automatically generate rich simulation environments and expert demonstrations by exploiting the grounding and coding abilities of large language models (LLMs). Our approach, dubbed GenSim, has two modes: goal-directed generation, wherein a target task is given to the LLM and the LLM proposes a task curriculum to solve the target task, and exploratory generation, wherein the LLM bootstraps from previous tasks and iteratively proposes novel tasks that would be helpful in solving more complex tasks. We use GPT4 to expand the existing benchmark tenfold to over 100 tasks, on which we conduct supervised finetuning and evaluate several LLMs, including finetuned GPTs and Code Llama, on code generation for robotic simulation tasks. Furthermore, we observe that LLM-generated simulation programs can enhance task-level generalization significantly when used for multitask policy training. We further find that with minimal sim-to-real adaptation, the multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world and outperform baselines by 25%. See the project website (https://liruiw.github.io/gensim) for code, demos, and videos.
Submitted 21 January, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Addressing preferred orientation in single-particle cryo-EM through AI-generated auxiliary particles
Authors:
Hui Zhang,
Dihan Zheng,
Qiurong Wu,
Nieng Yan,
Zuoqiang Shi,
Mingxu Hu,
Chenglong Bao
Abstract:
The single-particle cryo-EM field faces the persistent challenge of preferred orientation, for which general computational solutions are lacking. We introduce cryoPROS, an AI-based approach designed to address this issue. By generating auxiliary particles with a conditional deep generative model, cryoPROS addresses the intrinsic bias in orientation estimation for the observed particles. We successfully applied cryoPROS to the cryo-EM single-particle analysis of the hemagglutinin trimer, demonstrating its ability to restore the near-atomic-resolution structure from non-tilted data. Moreover, an enhanced version named cryoPROS-MP significantly improves the resolution of the membrane protein NaX using non-tilted data that contain the effects of micelles. Compared with classical approaches, cryoPROS requires no special experimental or image acquisition techniques, providing a purely computational yet effective solution to the preferred orientation problem. Finally, we conduct extensive experiments that establish the low risk of model bias and the high robustness of cryoPROS.
Submitted 26 September, 2023;
originally announced September 2023.
-
Attention Is All You Need For Blind Room Volume Estimation
Authors:
Chunxi Wang,
Maoshen Jia,
Meiran Li,
Changchun Bao,
Wenyu **
Abstract:
In recent years, dynamic parameterization of acoustic environments has attracted increasing attention in the field of audio processing. One of the key parameters that characterize local room acoustics independently of the orientation and directivity of sources and receivers is the geometric room volume. Convolutional neural networks (CNNs) have been widely adopted as the main models for blind room acoustic parameter estimation, which aims to learn a direct mapping from audio spectrograms to corresponding labels. Following the recent trend of self-attention mechanisms, this paper introduces a purely attention-based model to blindly estimate room volumes from single-channel noisy speech signals. We demonstrate the feasibility of eliminating the reliance on CNNs for this task; the proposed Transformer architecture takes Gammatone magnitude spectral coefficients and phase spectrograms as inputs. To enhance model performance given the task-specific dataset, cross-modality transfer learning is also applied. Experimental results demonstrate that the proposed model outperforms traditional CNN models across a wide range of real-world acoustic spaces, especially with the help of the dedicated pretraining and data augmentation schemes.
Submitted 27 December, 2023; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Floquet engineering of black phosphorus upon below-gap pumping
Authors:
Shaohua Zhou,
Changhua Bao,
Benshu Fan,
Fei Wang,
Haoyuan Zhong,
Hongyun Zhang,
Peizhe Tang,
Wenhui Duan,
Shuyun Zhou
Abstract:
A time-periodic light field can dress the electronic states and lead to light-induced emergent properties in quantum materials. Although below-gap pumping is regarded as favorable for Floquet engineering, direct experimental evidence of momentum-resolved band renormalization has so far been missing. Here, we report experimental evidence of light-induced band renormalization in black phosphorus by pumping at a photon energy of 160 meV, which is far below the band gap, and we reveal the distinction between below-gap pumping and near-resonance pumping. Our work demonstrates light-induced band engineering upon below-gap pumping and provides insights for extending Floquet engineering to more quantum materials.
Submitted 20 September, 2023;
originally announced September 2023.
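For orientation, the 160 meV pump photon energy quoted above can be converted to a vacuum wavelength and frequency with E = hc/λ = hf. This is a unit-conversion sketch only; the constants are rounded CODATA values.

```python
H_EV_S = 4.135667696e-15   # Planck constant in eV*s
HC_EV_NM = 1239.84198      # h*c in eV*nm

def photon_wavelength_um(energy_mev):
    """Vacuum wavelength in micrometres for a photon energy given in meV."""
    return HC_EV_NM / (energy_mev / 1000.0) / 1000.0

def photon_frequency_thz(energy_mev):
    """Frequency in THz for a photon energy given in meV."""
    return (energy_mev / 1000.0) / H_EV_S / 1e12

wavelength = photon_wavelength_um(160.0)   # about 7.75 micrometres (mid-infrared)
frequency = photon_frequency_thz(160.0)    # about 38.7 THz
```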
-
Riemannian Anderson Mixing Methods for Minimizing $C^2$-Functions on Riemannian Manifolds
Authors:
Zanyu Li,
Chenglong Bao
Abstract:
The Anderson Mixing (AM) method is a popular approach for accelerating fixed-point iterations by leveraging historical information from previous steps. In this paper, we introduce the Riemannian Anderson Mixing (RAM) method, an extension of AM to Riemannian manifolds, and analyze its local linear convergence under reasonable assumptions. Unlike other extrapolation-based algorithms on Riemannian manifolds, RAM does not require computing the inverse retraction or the inverse exponential mapping, and it has a lower per-iteration cost. Furthermore, we propose a variant of RAM called Regularized RAM (RRAM), for which we establish global convergence and which exhibits local convergence properties similar to those of RAM. Our proof relies on careful error estimates based on the local geometry of Riemannian manifolds. Finally, we present experimental results on various manifold optimization problems that demonstrate the superior performance of the proposed methods over existing Riemannian gradient descent and LBFGS approaches.
Submitted 12 September, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
A Note on Heights of Cyclotomic Polynomials
Authors:
Gennady Bachman,
Christopher Bao,
Shenlone Wu
Abstract:
We show that for any positive integer $h$, either $h$ or $h+1$ is a height of some cyclotomic polynomial $Φ_n$, where $n$ is a product of three distinct primes.
Submitted 6 September, 2023;
originally announced September 2023.
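The height of $Φ_n$ in the abstract above is the largest absolute value of its coefficients. A small sketch for experimenting with it computes $Φ_n$ from the identity $x^n - 1 = \prod_{d \mid n} Φ_d(x)$ by exact polynomial division. This is a brute-force illustration for small $n$, not the method of the paper.

```python
from functools import lru_cache

def poly_mul(a, b):
    """Multiply integer polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] += ai * bj
    return out

def poly_divexact(num, den):
    """Exact division by a monic polynomial; raises if a remainder is left."""
    num = list(num)
    q = [0] * (len(num) - len(den) + 1)
    for i in range(len(q) - 1, -1, -1):
        coef = num[i + len(den) - 1]          # current leading coefficient
        q[i] = coef
        if coef:
            for j, dj in enumerate(den):
                num[i + j] -= coef * dj
    if any(num):
        raise ValueError("division is not exact")
    return q

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n (lowest degree first)."""
    num = [-1] + [0] * (n - 1) + [1]          # x^n - 1
    den = [1]
    for d in range(1, n):
        if n % d == 0:
            den = poly_mul(den, list(cyclotomic(d)))
    return tuple(poly_divexact(num, den))

def height(n):
    """Largest absolute value of a coefficient of Phi_n."""
    return max(abs(c) for c in cyclotomic(n))
```

For example, `height(105)` evaluates $Φ_{3 \cdot 5 \cdot 7}$, the classical first case with a coefficient of absolute value 2.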
-
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Authors:
Haoyu Cao,
Changcun Bao,
Chaohu Liu,
Huang Chen,
Kun Yin,
Hao Liu,
Yinsong Liu,
Deqiang Jiang,
Xing Sun
Abstract:
We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, with applications in document analysis, retrieval, and office automation.
Unlike state-of-the-art approaches that rely on computationally expensive multi-stage technical schemes,
SeRum converts document image understanding and recognition tasks into a local decoding process of the visual tokens of interest, using a content-aware token merge module.
This mechanism enables the model to focus on the regions of interest generated by the query decoder, improving the model's effectiveness and accelerating decoding in the generative scheme.
We also design several pre-training tasks to enhance the model's understanding and local awareness.
Experimental results demonstrate that SeRum achieves state-of-the-art performance on document understanding tasks and competitive results on text spotting tasks.
SeRum represents a substantial advancement towards enabling efficient and effective end-to-end document understanding.
Submitted 3 September, 2023;
originally announced September 2023.
-
The Global R-linear Convergence of Nesterov's Accelerated Gradient Method with Unknown Strongly Convex Parameter
Authors:
Chenglong Bao,
Liang Chen,
Jiahong Li
Abstract:
The Nesterov accelerated gradient (NAG) method is an important extrapolation-based numerical algorithm that accelerates the convergence of the gradient descent method in convex optimization. When the objective function is $μ$-strongly convex, selecting extrapolation coefficients that depend on $μ$ yields global R-linear convergence. When $μ$ is unknown, a commonly adopted approach is to set the extrapolation coefficients as in the original NAG method. This choice achieves the optimal iteration complexity among first-order methods for general convex problems. However, it has remained unknown whether the NAG method with an unknown strong convexity parameter exhibits global R-linear convergence for strongly convex problems. In this work, we answer this question positively by establishing the Q-linear convergence of certain constructed Lyapunov sequences. Furthermore, we extend our result to the global R-linear convergence of the accelerated proximal gradient method, which is employed for solving strongly convex composite optimization problems. Interestingly, these results contradict the findings for the continuous counterpart of the NAG method in [Su, Boyd, and Candès, J. Mach. Learn. Res., 2016, 17(153), 1-43], where the convergence rate of the suggested ordinary differential equation cannot exceed $O(1/{\tt poly}(k))$ for strongly convex functions.
Submitted 24 October, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
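To make "extrapolation coefficients that do not depend on $μ$" concrete, here is a minimal sketch of NAG with the classical sequence $t_{k+1} = (1 + \sqrt{1 + 4t_k^2})/2$ and $β_k = (t_k - 1)/t_{k+1}$ (the choice also used in FISTA). It is an assumption that this matches the exact variant analysed in the paper. The test only checks the standard $O(1/k^2)$ bound $f(x_k) - f^* \le 2L\|x_0 - x^*\|^2/(k+1)^2$, not the R-linear rate claimed in the abstract.

```python
import math

def nag_unknown_mu(grad, x0, lipschitz, iters):
    """Nesterov's accelerated gradient with mu-free extrapolation coefficients."""
    x_prev = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(iters):
        g = grad(y)
        x = [yi - gi / lipschitz for yi, gi in zip(y, g)]           # gradient step from y
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next                                   # never uses mu
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]  # extrapolation
        x_prev, t = x, t_next
    return x_prev

# Example: f(x) = 0.5 * (x1^2 + 10 x2^2), so mu = 1 and L = 10; mu is never passed in.
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad_f(x):
    return [x[0], 10.0 * x[1]]

x_final = nag_unknown_mu(grad_f, [1.0, 1.0], lipschitz=10.0, iters=500)
```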
-
Ionic liquid gating induced self-intercalation of transition metal chalcogenides
Authors:
Fei Wang,
Yang Zhang,
Zhijie Wang,
Haoxiong Zhang,
Xi Wu,
Changhua Bao,
Jia Li,
Pu Yu,
Shuyun Zhou
Abstract:
Ionic liquids provide versatile pathways for controlling the structures and properties of quantum materials. Previous studies have reported electrostatic gating of nanometre-thick flakes leading to emergent superconductivity, insertion or extraction of protons and oxygen ions in perovskite oxide films enabling control of different phases and material properties, and intercalation of large organic cations into layered crystals giving access to tailored superconductivity. Here, we report an ionic-liquid gating method to form three-dimensional transition metal monochalcogenides (TMMCs) by driving the metals dissolved from layered transition metal dichalcogenides (TMDCs) into the van der Waals gap. We demonstrate the successful self-intercalation of PdTe$_2$ and NiTe$_2$, turning them into high-quality PdTe and NiTe single crystals, respectively. Moreover, the monochalcogenides exhibit properties distinct from those of the dichalcogenides; for instance, the self-intercalation of PdTe$_2$ leads to the emergence of superconductivity in PdTe. Our work provides a synthesis pathway for TMMCs by means of self-intercalation driven by ionic liquid gating.
Submitted 25 August, 2023;
originally announced August 2023.
-
On a conjecture on pattern-avoiding machines
Authors:
Christopher Bao,
Giulio Cerbai,
Yunseo Choi,
Katelyn Gan,
Owen Zhang
Abstract:
Let $s$ be West's stack-sorting map, and let $s_{T}$ be the generalized stack-sorting map in which, instead of being required to be increasing, the stack avoids subpermutations that are order-isomorphic to any permutation in the set $T$. In 2020, Cerbai, Claesson, and Ferrari introduced the $σ$-machine $s \circ s_σ$ as a generalization of West's $2$-stack-sorting map $s \circ s$. As a further generalization, in 2021, Baril, Cerbai, Khalil, and Vajnovszki introduced the $(σ, τ)$-machine $s \circ s_{σ, τ}$ and enumerated $|\Sort_{n}(σ,τ)|$ (the number of permutations in $S_n$ that are mapped to the identity by the $(σ, τ)$-machine) for six pairs of length-$3$ permutations $(σ, τ)$. In this work, we settle a conjecture of Baril, Cerbai, Khalil, and Vajnovszki on the only remaining pair of length-$3$ patterns, $(σ, τ) = (132, 321)$, for which $|\Sort_{n}(σ, τ)|$ appears in the OEIS. In addition, we enumerate $|\Sort_n(123, 321)|$, which does not appear in the OEIS but has a simple closed form.
Submitted 12 September, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
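The maps in the abstract above can be explored by brute force for small $n$. The sketch assumes the usual convention for pattern-avoiding stacks: the next entry is pushed unless that would create an occurrence of a pattern in $T$ in the stack read from top to bottom, in which case the top is popped first. West's map $s$ is then $s_{\{21\}}$. The counts it reproduces for $s$ (Catalan numbers) and $s \circ s$ (West-2-stack-sortable permutations) are classical. Names such as `pattern_stack_sort` are illustrative.

```python
from itertools import combinations, permutations

def order_pattern(seq):
    """Standardize a sequence of distinct values to the permutation with the same relative order."""
    ranks = sorted(seq)
    return tuple(ranks.index(v) + 1 for v in seq)

def contains(seq, pattern):
    """True if seq has a subsequence order-isomorphic to pattern."""
    return any(order_pattern(sub) == tuple(pattern) for sub in combinations(seq, len(pattern)))

def pattern_stack_sort(perm, patterns):
    """Generalized stack-sorting map s_T: the stack, read top to bottom, avoids every pattern in T."""
    stack, out = [], []
    for x in perm:
        # pop while pushing x would create an occurrence of some pattern in T
        while stack and any(contains([x] + stack[::-1], p) for p in patterns):
            out.append(stack.pop())
        stack.append(x)
    while stack:
        out.append(stack.pop())
    return tuple(out)

def west(perm):
    """West's stack-sorting map s, i.e. a 21-avoiding stack."""
    return pattern_stack_sort(perm, [(2, 1)])

def machine(perm, patterns):
    """The machine s o s_T; with T = {sigma, tau} this is the (sigma, tau)-machine."""
    return west(pattern_stack_sort(perm, patterns))

def count_sortable(n, patterns):
    """|Sort_n(T)|: permutations of length n that the machine maps to the identity."""
    ident = tuple(range(1, n + 1))
    return sum(machine(p, patterns) == ident for p in permutations(ident))
```

For instance, `count_sortable(n, [(1, 3, 2), (3, 2, 1)])` gives the first terms of $|\Sort_{n}(132, 321)|$ for small $n$; the exponential cost limits this to small $n$.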
-
Least-Squares Design of Chromatic Dispersion Compensation FIR Filters Realized with Overlap-Save Processing
Authors:
Oscar Gustafsson,
Cheolyong Bae,
Hakan Johansson
Abstract:
A design method is proposed for chromatic dispersion compensation filters realized using overlap-save processing in the frequency domain. By also exploiting the values that are normally zero-padded, the method obtains better results than optimal time-domain design without changing the complexity of the overlap-save processing.
Submitted 20 June, 2023;
originally announced August 2023.
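For readers unfamiliar with overlap-save, a minimal sketch of the conventional scheme follows. The FIR taps are zero-padded to the block length `nfft`, each block's spectrum is multiplied by the filter spectrum, and the first `len(h) - 1` circularly wrapped outputs are discarded. The abstract's proposal is to design the otherwise-zero padded values as well, which this sketch does not attempt. A naive DFT stands in for an FFT to keep it dependency-free, and real-valued taps are assumed.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT (stand-in for an FFT)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out

def overlap_save(x, h, nfft=16):
    """Filter x with FIR taps h block-wise in the frequency domain."""
    m = len(h)
    if nfft < m:
        raise ValueError("nfft must be at least len(h)")
    step = nfft - (m - 1)                        # new output samples per block
    H = dft(list(h) + [0.0] * (nfft - m))        # taps zero-padded to nfft
    padded = [0.0] * (m - 1) + list(x) + [0.0] * nfft
    y = []
    for start in range(0, len(x), step):
        block = padded[start:start + nfft]
        circ = dft([a * b for a, b in zip(dft(block), H)], inverse=True)
        y.extend(v.real for v in circ[m - 1:])   # drop the m-1 wrapped-around samples
    return y[:len(x)]

def direct_fir(x, h):
    """Reference: causal linear convolution truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
```

Comparing `overlap_save(x, h, 8)` with `direct_fir(x, h)` on any short real signal is a quick way to confirm that the discarded samples are exactly the aliased ones.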
-
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing
Authors:
Junyi Zeng,
Chong Bao,
Rui Chen,
Zilong Dong,
Guofeng Zhang,
Hujun Bao,
Zhaopeng Cui
Abstract:
Recently, Neural Radiance Fields (NeRF) has exhibited significant success in novel view synthesis, surface reconstruction, and related tasks. However, since no physical reflection is considered in its rendering pipeline, NeRF mistakes the reflection in a mirror for a separate virtual scene, leading to inaccurate reconstruction of the mirror and multi-view inconsistent reflections in it. In this paper, we present a novel neural rendering framework, named Mirror-NeRF, which is able to learn accurate geometry and reflection of the mirror and supports various scene manipulation applications with mirrors, such as adding new objects or mirrors into the scene and synthesizing the reflections of these new objects in mirrors, controlling mirror roughness, etc. To achieve this goal, we propose a unified radiance field by introducing the reflection probability and tracing rays following the light transport model of Whitted Ray Tracing, and we also develop several techniques to facilitate the learning process. Experiments and comparisons on both synthetic and real datasets demonstrate the superiority of our method. The code and supplementary material are available on the project webpage: https://zju3dv.github.io/Mirror-NeRF/.
Submitted 6 August, 2023;
originally announced August 2023.
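Whitted-style ray tracing, mentioned above, spawns a secondary ray at a mirror in the specular direction $r = d - 2(d \cdot n)n$. The sketch shows that reflection and a hypothetical linear blend of surface and reflected colour weighted by a reflection probability. The blend is an illustrative assumption; the abstract does not state how Mirror-NeRF combines the two.

```python
def reflect(d, n):
    """Reflect direction d about the unit surface normal n: r = d - 2 (d . n) n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))

def blend(surface_rgb, reflected_rgb, p_reflect):
    """Hypothetical mix of surface colour and traced reflection colour by reflection probability."""
    return tuple((1.0 - p_reflect) * s + p_reflect * r for s, r in zip(surface_rgb, reflected_rgb))

# A ray travelling down-right hits a floor mirror with normal +y and bounces up-right.
bounced = reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
colour = blend((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.25)
```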
-
An axiomatized PDE model of deep neural networks
Authors:
Tangjun Wang,
Wenqi Tao,
Chenglong Bao,
Zuoqiang Shi
Abstract:
Inspired by the relation between deep neural networks (DNNs) and partial differential equations (PDEs), we study the general form of the PDE models of deep neural networks. To achieve this goal, we formulate a DNN as an evolution operator from a simple base model. Based on several reasonable assumptions, we prove that the evolution operator is actually determined by a convection-diffusion equation. This convection-diffusion equation model gives a mathematical explanation for several effective networks. Moreover, we show that the convection-diffusion model improves robustness and reduces the Rademacher complexity. Based on the convection-diffusion equation, we design a new training method for ResNets. Experiments validate the performance of the proposed method.
Submitted 22 March, 2024; v1 submitted 23 July, 2023;
originally announced July 2023.
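As a reference point for the abstract above, the two relations below are the well-known reading of a residual block as a forward-Euler step, and a generic convection-diffusion equation for an evolving model $u$ started from a base model $f_0$. The symbols $F$, $v$, $\sigma$, and $f_0$ are illustrative; the abstract does not give the paper's exact equation or sign conventions.

```latex
\[
x_{k+1} = x_k + \Delta t\, F(x_k;\theta_k)
\quad\text{(forward-Euler step of } \dot{x} = F(x;\theta(t))\text{)}
\]
\[
\frac{\partial u}{\partial t}(x,t) = v(x)\cdot\nabla u(x,t) + \sigma\,\Delta u(x,t),
\qquad u(x,0) = f_0(x)
\]
```

With $\sigma = 0$ and a time-independent $v$, this reduces to a transport equation whose solution is $u(x,t) = f_0(\varphi_t(x))$, where $\varphi_t$ is the flow of $v$. That is, the base model is composed with an ODE flow, as in a continuous-depth ResNet. The diffusion term $\sigma\,\Delta u$ adds smoothing on top of that flow.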
-
Convergence Analysis for Restarted Anderson Mixing and Beyond
Authors:
Fuchao Wei,
Chenglong Bao,
Yang Liu,
Guangwen Yang
Abstract:
Anderson mixing (AM) is a classical method that can accelerate fixed-point iterations by exploiting historical information. Despite the successful application of AM in scientific computing, its theoretical properties are still not fully understood. In this paper, we study the restarted versions of the Type-I and Type-II AM methods, i.e., restarted AM. Using a multi-step analysis, we give a unified convergence analysis for the two types of restarted AM and show that restarted Type-II AM can locally improve the convergence rate of the fixed-point iteration. Furthermore, we propose an adaptive mixing strategy based on estimating the spectrum of the Jacobian matrix. When the Jacobian matrix is symmetric, we develop short-term recurrence forms of restarted AM to reduce the memory cost. Finally, experimental results on various problems validate our theoretical findings.
Submitted 5 July, 2023;
originally announced July 2023.
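A minimal sketch of restarted Type-II Anderson mixing for $x = g(x)$ follows, as described in the abstract above. It assumes mixing parameter 1 and a history that is simply cleared every `m` steps. It solves the small least-squares problem $\min_γ \|f_k - ΔF\,γ\|$ through lightly regularized normal equations, and updates $x_{k+1} = g(x_k) - \sum_i γ_i (Δx_i + Δf_i)$. The adaptive mixing and short-term recurrences from the paper are not included.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_linear(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol

def restarted_anderson(g, x0, m=3, tol=1e-10, max_iter=200):
    """Type-II AM for x = g(x); the stored history is discarded every m steps."""
    x = list(x0)
    gx = g(x)
    f = [a - b for a, b in zip(gx, x)]          # residual f_k = g(x_k) - x_k
    dX, dF = [], []                             # columns x_{i+1} - x_i and f_{i+1} - f_i
    for k in range(max_iter):
        if math.sqrt(dot(f, f)) < tol:
            return x, k
        if len(dF) == m:                        # restart
            dX, dF = [], []
        gamma = []
        if dF:
            p = len(dF)
            A = [[dot(dF[i], dF[j]) for j in range(p)] for i in range(p)]
            rhs = [dot(dF[i], f) for i in range(p)]
            trace = sum(A[i][i] for i in range(p))
            if trace > 0:
                for i in range(p):
                    A[i][i] += 1e-12 * trace    # tiny Tikhonov term for stability
                gamma = solve_linear(A, rhs)
        x_new = list(gx)                        # x_{k+1} = g(x_k) - sum_i gamma_i (dx_i + df_i)
        for gi, dx, df in zip(gamma, dX, dF):
            for t in range(len(x_new)):
                x_new[t] -= gi * (dx[t] + df[t])
        gx_new = g(x_new)
        f_new = [a - b for a, b in zip(gx_new, x_new)]
        dX.append([a - b for a, b in zip(x_new, x)])
        dF.append([a - b for a, b in zip(f_new, f)])
        x, gx, f = x_new, gx_new, f_new
    return x, max_iter

# Example: affine contraction g(x) = G x + b with fixed point (20/7, 15/7).
G = [[0.5, 0.2], [0.1, 0.4]]
b = [1.0, 1.0]

def g_affine(x):
    return [sum(G[i][j] * x[j] for j in range(2)) + b[i] for i in range(2)]

x_star, iters = restarted_anderson(g_affine, [0.0, 0.0])
```

Plain fixed-point iteration on this example contracts at rate 0.6 per step, so it needs several dozen steps to reach the same tolerance.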
-
RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games
Authors:
Hyeon-Chang Jeon,
In-Chang Baek,
Cheong-mok Bae,
Taehwa Park,
Wonsang You,
Taegwan Ha,
Hoyun Jung,
**ha Noh,
Seungwon Oh,
Kyung-Joong Kim
Abstract:
The balance of game content significantly affects the gaming experience. Unbalanced game content diminishes engagement or increases frustration through repeated failure. Although game designers aim to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and have not considered the importance of the generalization ability of playtesting agents when content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPGs. Additionally, we design two benchmarks for the boss raid scenario that can aid the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to guide AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline.
Submitted 4 July, 2023;
originally announced July 2023.
-
Raphtory: The temporal graph engine for Rust and Python
Authors:
Ben Steer,
Naomi Arnold,
Cheick Tidiane Ba,
Renaud Lambiotte,
Haaroon Yousaf,
Lucas Jeub,
Fabian Murariu,
Shivam Kapoor,
Pedro Rico,
Rachel Chan,
Louis Chan,
James Alford,
Richard G. Clegg,
Felix Cuadrado,
Matthew Russell Barnes,
Peijie Zhong,
John N. Pougué Biyong,
Alhamza Alnaimi
Abstract:
Raphtory is a platform for building and analysing temporal networks. The library includes methods for creating networks from a variety of data sources, algorithms to explore their structure and evolution, and an extensible GraphQL server for deploying applications built on top. Raphtory's core engine is built in Rust for efficiency, with Python interfaces for ease of use. Raphtory is developed by network scientists with backgrounds in Physics, Applied Mathematics, Engineering and Computer Science, for use across academia and industry.
Submitted 3 January, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Lightweight wood panel defect detection method incorporating attention mechanism and feature fusion network
Authors:
Yongxin Cao,
Fanghua Liu,
Lai Jiang,
Cheng Bao,
You Miao,
Yang Chen
Abstract:
In recent years, deep learning has made significant progress in wood panel defect detection. However, challenges remain, such as low detection accuracy, slow detection speed, and difficulties in deploying detection models on embedded devices for wood panel surface inspection. To overcome these issues, we propose a lightweight wood panel defect detection method called YOLOv5-LW, which incorporates attention mechanisms and a feature fusion network. First, to enhance the detection of acceptable defects, we introduce the Multi-scale Bi-directional Feature Pyramid Network (MBiFPN) as the feature fusion network. MBiFPN reduces feature loss, enriches local and detailed features, and improves the model's ability to detect acceptable defects. Second, to achieve a lightweight design, we reconstruct the ShuffleNetv2 network as the backbone, which reduces the number of parameters and the computational requirements while maintaining performance. We also introduce the Stem Block and Spatial Pyramid Pooling Fast (SPPF) modules to compensate for any accuracy loss caused by the lightweight design, preserving the model's detection capability while remaining computationally efficient. Third, we enhance the backbone network by incorporating Efficient Channel Attention (ECA), which improves the network's focus on key information relevant to defect detection; by attending to essential features, the model identifies and localizes defects more accurately. We validate the proposed method on a self-developed wood panel defect dataset. The experimental results demonstrate the effectiveness of the improved YOLOv5-LW method: compared to the original model, our approach achieves a 92.8% accuracy rate, reduces the number of parameters by 27.78%, reduces computational volume by 41.25%, and improves detection inference speed by 10.16%.
Submitted 21 June, 2023;
originally announced June 2023.
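Efficient Channel Attention, used in the backbone above, computes one descriptor per channel by global average pooling, mixes neighbouring channels with a small 1D convolution, and gates each channel with a sigmoid. The sketch below is a framework-free illustration on a nested C x H x W list. The adaptive kernel-size rule mirrors the one commonly used with ECA (nearest odd value to $(\log_2 C + b)/γ$ with $γ = 2$, $b = 1$). It is not the YOLOv5-LW code, and in practice the kernel weights are learned.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: (log2(C) + b) / gamma, truncated and bumped to an odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca(feature_map, weights):
    """Efficient Channel Attention on a C x H x W nested list with a shared odd-length kernel."""
    c = len(feature_map)
    k = len(weights)
    pad = k // 2
    # 1) global average pooling: one descriptor per channel
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1D convolution across neighbouring channels, zero-padded at the ends
    padded = [0.0] * pad + desc + [0.0] * pad
    logits = [sum(weights[j] * padded[i + j] for j in range(k)) for i in range(c)]
    # 3) sigmoid gate per channel, 4) rescale the channel
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_map)]
```

Because only `k` weights are shared across all channels, the parameter cost is negligible compared with a fully connected squeeze-and-excitation block.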
-
The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite
Authors:
Z. X. Ling,
X. J. Sun,
C. Zhang,
S. L. Sun,
G. **,
S. N. Zhang,
X. F. Zhang,
J. B. Chang,
F. S. Chen,
Y. F. Chen,
Z. W. Cheng,
W. Fu,
Y. X. Han,
H. Li,
J. F. Li,
Y. Li,
Z. D. Li,
P. R. Liu,
Y. H. Lv,
X. H. Ma,
Y. J. Tang,
C. B. Wang,
R. J. Xie,
Y. L. Xue,
A. L. Yan
, et al. (101 additional authors not shown)
Abstract:
The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), the X-ray imager achieves a wide field of view (FoV) of 346 square degrees (18.6° × 18.6°). An optical assembly composed of 36 MPO chips focuses incident X-ray photons, and four large-format complementary metal-oxide semiconductor (CMOS) sensors, each 6 cm × 6 cm, serve as the focal plane detectors. The instrument has an angular resolution of 4-8 arcmin (FWHM) for the central focal spot of the point spread function, and an effective area of 2-3 cm$^2$ at 1 keV in essentially all directions within the field of view. The detection passband is 0.5-4 keV in the soft X-ray band, and the sensitivity is 2-3 × 10$^{-11}$ erg s$^{-1}$ cm$^{-2}$ (about 1 mini-Crab) for a 1,000-second observation. The total weight of LEIA is 56 kg and its power consumption is 85 W. The satellite, with a design lifetime of 2 years, operates in a 500 km Sun-synchronous orbit with an orbital period of 95 minutes. LEIA is paving the way for future missions by verifying in flight the technologies of both the novel focusing imaging optics and the CMOS sensors for X-ray observation, and by optimizing the working setups of the instrumental parameters. In addition, LEIA is able to carry out scientific observations to find new transients and to monitor known sources in the soft X-ray band, albeit with limited useful observing time available.
Submitted 24 May, 2023;
originally announced May 2023.
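Two of the figures quoted above can be cross-checked with simple arithmetic. 18.6° × 18.6° gives the 346 square-degree field of view, and Kepler's third law for a circular orbit 500 km above the Earth gives a period close to the stated 95 minutes. The sketch assumes a circular orbit above the mean Earth radius, ignoring eccentricity and oblateness.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418   # Earth's standard gravitational parameter
R_EARTH_KM = 6371.0             # mean Earth radius (assumption: circular orbit above it)

def circular_orbit_period_min(altitude_km):
    """Period T = 2 pi sqrt(a^3 / mu) of a circular orbit, in minutes."""
    a = R_EARTH_KM + altitude_km
    return 2.0 * math.pi * math.sqrt(a ** 3 / MU_EARTH_KM3_S2) / 60.0

fov_sq_deg = 18.6 * 18.6                       # about 346 square degrees
period_min = circular_orbit_period_min(500.0)  # about 94.5 minutes
```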
-
DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects
Authors:
Chen Bao,
Helin Xu,
Yuzhe Qin,
Xiaolong Wang
Abstract:
To enable general-purpose robots, we will require robots to operate everyday articulated objects as humans do. Current robot manipulation relies heavily on parallel grippers, which restrict the robot to a limited set of objects. In contrast, operating with a multi-finger robot hand better approximates human behavior and enables the robot to operate diverse articulated objects. To this end, we propose a new benchmark called DexArt, which involves Dexterous manipulation with Articulated objects in a physical simulator. In our benchmark, we define multiple complex manipulation tasks, and the robot hand must manipulate diverse articulated objects within each task. Our main focus is to evaluate the generalizability of the learned policy on unseen articulated objects. This is very challenging given the high degrees of freedom of both hands and objects. We use Reinforcement Learning with 3D representation learning to achieve generalization. Through extensive studies, we provide new insights into how 3D representation learning affects decision making in RL with 3D point cloud inputs. More details can be found at https://www.chenbao.tech/dexart/.
Submitted 9 May, 2023;
originally announced May 2023.
-
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Authors:
Yongxin Zhu,
Zhen Liu,
Yukang Liang,
Xin Li,
Hao Liu,
Changcun Bao,
Linli Xu
Abstract:
In this paper, we propose a novel multi-modal framework for Scene Text Visual Question Answering (STVQA), which requires models to read scene text in images for question answering. In contrast to plain text or visual objects, which can exist independently, scene text naturally links the text and visual modalities by conveying linguistic semantics while simultaneously being a visual object in an image. Unlike conventional STVQA models, which treat the linguistic and visual semantics of scene text as two separate features, we propose a "Locate Then Generate" (LTG) paradigm that explicitly unifies these two semantics, with the spatial bounding box as the bridge between them. Specifically, LTG first locates the region in an image that may contain the answer words with an answer location module (ALM) consisting of a region proposal network and a language refinement network, which can be transformed into each other through a one-to-one mapping via the scene text bounding box. Next, given the answer words selected by ALM, LTG generates a readable answer sequence with an answer generation module (AGM) based on a pre-trained language model. Thanks to the explicit alignment of the visual and linguistic semantics, even without any scene-text-based pre-training tasks, LTG improves absolute accuracy by +6.06% and +6.92% on the TextVQA and ST-VQA datasets, respectively, compared with a non-pre-training baseline. We further demonstrate that LTG effectively unifies visual and text modalities through the spatial bounding box connection, which is underappreciated in previous methods.
Submitted 4 April, 2023;
originally announced April 2023.