Search | arXiv e-print repository

Simplifying Kinematic Parameter Estimation in sEMG Prosthetic Hands: A Two-Point Approach

Authors: Gang Liu, Zhenxiang Wang, Ziyang He, Shanshan Guo, Rui Zhang, Dezhong Yao

Abstract: Regression-based sEMG prosthetic hands are widely used for their ability to provide continuous kinematic parameters. However, establishing these models traditionally requires complex kinematic sensor systems to collect corresponding kinematic data in synchronization with EMG, which is cumbersome and user-unfriendly. This paper presents a simplified approach utilizing only two data points to depict kinematic parameters. Finger flexion is recorded as 1, extension as -1, and a near-linear model is employed to interpolate intermediate values, offering a viable alternative for kinematic data. We validated the approach with twenty participants through offline analysis and online experiments. The offline analysis confirmed the model's capability to fill in intermediate points, and the online experiments demonstrated that participants could control gestures and adjust force accurately. This study significantly reduces the complexity of collecting dynamic parameters in EMG-based regression prosthetics, thus enhancing usability for prosthetic hands.
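The two-point labelling idea can be illustrated with a minimal sketch. It assumes a single scalar sEMG feature and an ordinary least-squares line; the feature values, the function names, and the choice of a plain linear fit are all hypothetical stand-ins, since the abstract only says that a "near-linear model" interpolates between the two labels.

```python
# Hypothetical illustration of two-point labelling: only the extreme poses
# are labelled (-1 for extension, +1 for flexion) and a linear fit supplies
# the intermediate values. The numbers below are made up, not from the paper.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for 1-D inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def predict(a, b, x):
    """Evaluate the line and clamp the result to the labelled range [-1, 1]."""
    return max(-1.0, min(1.0, a * x + b))

# Assumed scalar sEMG amplitude features recorded at the two labelled poses.
extension_features = [0.10, 0.12, 0.08]  # labelled -1
flexion_features = [0.90, 0.88, 0.92]    # labelled +1

xs = extension_features + flexion_features
ys = [-1.0] * len(extension_features) + [1.0] * len(flexion_features)
a, b = fit_line(xs, ys)

# A feature halfway between the two clusters maps to a mid-range value.
midpoint = predict(a, b, 0.5)
```

A real system would use multi-channel features and the authors' own model; the sketch only shows why two labelled endpoints can be enough to define a continuous output.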

Submitted 1 May, 2024; originally announced July 2024.

Comments: 13 pages

arXiv:2406.19939 [pdf, other]

Data-driven methods for flow and transport in porous media: a review

Authors: Guang Yang, Ran Xu, Yusong Tian, Songyuan Guo, Jingyi Wu, Xu Chu

Abstract: This review examined the current advancements in data-driven methods for analyzing flow and transport in porous media, which has various applications in energy, chemical engineering, environmental science, and beyond. Although there has been progress in recent years, the challenges of current experimental and high-fidelity numerical simulations, such as high computational costs and difficulties in accurately representing complex, heterogeneous structures, can still potentially be addressed by state-of-the-art data-driven methods. We analyzed the synergistic potential of these methods, addressed their limitations, and suggested how they can be effectively integrated to improve both the fidelity and efficiency of current research. A discussion on future research directions in this field was conducted, emphasizing the need for collaborative efforts that combine domain expertise in physics with advanced computational and data-driven methodologies.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19486 [pdf, other]

LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models

Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

Abstract: In prompt tuning, a prefix or suffix text is added to the prompt, and the embeddings (soft prompts) or token indices (hard prompts) of the prefix/suffix are optimized to gain more control over language models for specific tasks. This approach eliminates the need for hand-crafted prompt engineering or explicit model fine-tuning. Prompt tuning is significantly more parameter-efficient than model fine-tuning, as it involves optimizing partial inputs of language models to produce desired outputs. In this work, we aim to further reduce the amount of trainable parameters required for a language model to perform well on specific tasks. We propose Low-rank Prompt Tuning (LoPT), a low-rank model for prompts that achieves efficient prompt optimization. The proposed method demonstrates similar outcomes to full parameter prompt tuning while reducing the number of trainable parameters by a factor of 5. It also provides promising results compared to the state-of-the-art methods that would require 10 to 20 times more parameters.
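To make the parameter saving concrete, the sketch below counts trainable parameters for a full soft prompt versus a rank-r factorization of the same prompt matrix. The factorization form (an n x r matrix times an r x d matrix), the function names, and the sizes are assumptions chosen for illustration; the abstract does not state the ranks or prompt lengths the authors use.

```python
# Hypothetical parameter count for a low-rank soft prompt.
# Assumption: the n_tokens x embed_dim prompt matrix is written as the
# product of an (n_tokens x rank) factor and a (rank x embed_dim) factor.

def full_prompt_params(n_tokens, embed_dim):
    """Trainable parameters when every prompt embedding entry is free."""
    return n_tokens * embed_dim

def low_rank_prompt_params(n_tokens, embed_dim, rank):
    """Trainable parameters when only the two low-rank factors are trained."""
    return rank * (n_tokens + embed_dim)

# Illustrative sizes only, not taken from the paper.
n_tokens, embed_dim, rank = 100, 768, 17

full = full_prompt_params(n_tokens, embed_dim)
low = low_rank_prompt_params(n_tokens, embed_dim, rank)
ratio = full / low

print(f"full: {full}, low-rank: {low}, reduction: {ratio:.1f}x")
```

With these made-up sizes the reduction happens to land near the factor of 5 quoted in the abstract, but that match comes from the chosen numbers rather than from the paper's configuration.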

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.19246 [pdf, other]

An Interpretable and Efficient Sleep Staging Algorithm: DetectsleepNet

Authors: Shengwei Guo

Abstract: Sleep quality directly impacts human health and quality of life, so accurate sleep staging is essential for assessing sleep quality. However, most traditional methods are inefficient and time-consuming due to segmenting different sleep cycles by manual labeling. In contrast, automated sleep staging technology not only directly assesses sleep quality but also helps sleep specialists analyze sleep status, significantly improving efficiency and reducing the cost of sleep monitoring, especially for continuous sleep monitoring. Most of the existing models, however, are deficient in computational efficiency, lightweight design, and model interpretability. In this paper, we propose a neural network architecture based on the prior knowledge of sleep experts. Specifically, we 1) propose an end-to-end model named DetectsleepNet that uses single-channel EEG signals without additional data processing, which has achieved an impressive 80.9% accuracy on the SHHS dataset and an outstanding 88.0% accuracy on the Physio2018 dataset; 2) construct an efficient lightweight sleep staging model named DetectsleepNet-tiny based on DetectsleepNet, which has just 6% of the parameter numbers of existing models, but its accuracy exceeds 99% of state-of-the-art models; and 3) introduce a specific inference header to assess the attention given to a specific EEG segment in each sleep frame, enhancing the transparency in the decisions of models. Our model comprises fewer parameters compared to existing ones and further explores the interpretability of the model to facilitate its application in healthcare. The code is available at https://github.com/komdec/DetectSleepNet.git.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 25 pages, 11 figures

arXiv:2406.17142 [pdf, other]

Continuous drive heterodyne microwave sensing with spin qubits in hexagonal boron nitride

Authors: Charlie J. Patrickson, Valentin Haemmerli, Shi Guo, Andrew J. Ramsay, Isaac J. Luxmoore

Abstract: Quantum sensors that use solid state spin defects have emerged as effective probes of weak alternating magnetic signals. By recording the phase of a signal relative to an external clock, these devices can resolve signal frequencies to a precision orders of magnitude longer than the spin state lifetime. However, these quantum heterodyne protocols suffer from sub-optimal sensitivity, as they are currently limited to pulsed spin control techniques, which are susceptible to cumulative pulse-area errors, or single continuous drives which offer no protection of the spin coherence. Here, we present a control scheme based on a continuous microwave drive that extends spin coherence towards the effective $T_2 \approx \frac{1}{2}T_1$ limit and can resolve the frequency, amplitude and phase of GHz magnetic fields. The scheme is demonstrated using an ensemble of boron vacancies in hexagonal boron nitride, and achieves an amplitude sensitivity of $η\approx 3-5 \:\mathrm{μT \sqrt{Hz}}$ and phase sensitivity of $η_φ \approx 0.076 \:\mathrm{rads \sqrt{Hz}}$. By repeatedly referencing the phase of a resonant signal against the coherent continuous microwave drive in a quantum heterodyne demonstration, we measure a GHz signal with a resolution $<$1 Hz over a 10 s measurement. Achieving this level of performance in a two-dimensional material platform could have broad applications, from probing nanoscale condensed matter systems to integration into heterostructures for quantum networking.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16878 [pdf, ps, other]

Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels

Authors: Yanhu Wang, Shuaishuai Guo, Anming Dong, Hui Zhao

Abstract: Semantic communications offer promising prospects for enhancing data transmission efficiency. However, existing schemes have predominantly concentrated on point-to-point transmissions. In this paper, we aim to investigate the validity of this claim in interference scenarios compared to baseline approaches. Specifically, our focus is on general multiple-input multiple-output (MIMO) interference channels, where we propose an interference-robust semantic communication (IRSC) scheme. This scheme involves the development of transceivers based on neural networks (NNs), which integrate channel state information (CSI) either solely at the receiver or at both transmitter and receiver ends. Moreover, we establish a composite loss function for training IRSC transceivers, along with a dynamic mechanism for updating the weights of various components in the loss function to enhance system fairness among users. Experimental results demonstrate that the proposed IRSC scheme effectively learns to mitigate interference and outperforms baseline approaches, particularly in low signal-to-noise ratio (SNR) regimes.

Submitted 10 April, 2024; originally announced June 2024.

arXiv:2406.14675 [pdf, other]

This Looks Better than That: Better Interpretable Models with ProtoPNeXt

Authors: Frank Willard, Luke Moffett, Emmanuel Mokel, Jon Donnelly, Stark Guo, Julia Yang, Giyoung Kim, Alina Jade Barnett, Cynthia Rudin

Abstract: Prototypical-part models are a popular interpretable alternative to black-box deep learning models for computer vision. However, they are difficult to train, with high sensitivity to hyperparameter tuning, inhibiting their application to new datasets and our understanding of which methods truly improve their performance. To facilitate the careful study of prototypical-part networks (ProtoPNets), we create a new framework for integrating components of prototypical-part models -- ProtoPNeXt. Using ProtoPNeXt, we show that applying Bayesian hyperparameter tuning and an angular prototype similarity metric to the original ProtoPNet is sufficient to produce new state-of-the-art accuracy for prototypical-part models on CUB-200 across multiple backbones. We further deploy this framework to jointly optimize for accuracy and prototype interpretability as measured by metrics included in ProtoPNeXt. Using the same resources, this produces models with substantially superior semantics and changes in accuracy between +1.3% and -1.5%. The code and trained models will be made publicly available upon publication.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14577 [pdf, ps, other]

Non-abelian extensions of Lie triple systems and Wells exact sequences

Authors: Qinxiu Sun, Shuangjian Guo

Abstract: In this paper, we investigate non-abelian extensions and inducibility of pairs of automorphisms of Lie triple systems. First, we introduce non-abelian cohomology groups and classify the non-abelian extensions in terms of non-abelian cohomology groups. Next, we characterize the non-abelian extensions using Maurer-Cartan elements. Furthermore, we explore the inducibility of pairs of automorphisms and derive the analogous Wells exact sequences in the setting of Lie triple systems. Finally, we state the previous results in the context of abelian extensions of Lie triple systems.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 23 pages. arXiv admin note: substantial text overlap with arXiv:2401.15333, arXiv:2404.02752

MSC Class: 17A30; 17B62; 17B38

arXiv:2406.14540 [pdf, other]

IRASim: Learning Interactive Real-Robot Action Simulators

Authors: Fangqi Zhu, Hongtao Wu, Song Guo, Yuxiao Liu, Chilam Cheang, Tao Kong

Abstract: Scalable robot learning in the real world is limited by the cost and safety issues of real robots. In addition, rolling out robot trajectories in the real world can be time-consuming and labor-intensive. In this paper, we propose to learn an interactive real-robot action simulator as an alternative. We introduce a novel method, IRASim, which leverages the power of generative models to generate extremely realistic videos of a robot arm that executes a given action trajectory, starting from an initial given frame. To validate the effectiveness of our method, we create a new benchmark, IRASim Benchmark, based on three real-robot datasets and perform extensive experiments on the benchmark. Results show that IRASim outperforms all the baseline methods and is preferred in human evaluations. We hope that IRASim can serve as an effective and scalable approach to enhance robot learning in the real world. To promote research for generative real-robot action simulators, we open-source code, benchmark, and checkpoints at https://gen-irasim.github.io.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Opensource, project website: https://gen-irasim.github.io

arXiv:2406.14399 [pdf, other]

WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark

Authors: Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai

Abstract: Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from significant limitations, such as small sizes, limited temporal coverage, and a lack of comprehensive variables. These shortcomings prevent them from effectively reflecting the benchmarks of current forecasting methods and fail to support the real needs of operational weather forecasting. To address these challenges, we present the WEATHER-5K dataset. This dataset comprises a comprehensive collection of data from 5,672 weather stations worldwide, spanning a 10-year period with one-hour intervals. It includes multiple crucial weather elements, providing a more reliable and interpretable resource for forecasting. Furthermore, our WEATHER-5K dataset can serve as a benchmark for comprehensively evaluating existing well-known forecasting models, extending beyond GSWF methods to support future time-series research challenges and opportunities. The dataset and benchmark implementation are publicly available at: https://github.com/taohan10200/WEATHER-5K.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 26 pages, 13 figures

arXiv:2406.14302 [pdf, ps, other]

Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

Authors: Patrik Reizinger, Siyuan Guo, Ferenc Huszár, Bernhard Schölkopf, Wieland Brendel

Abstract: Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed) data. We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning under the lens of exchangeability. IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non-i.i.d. data. We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results. We hope this work will pave the way for further research in causal representation learning.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14067 [pdf]

A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Authors: Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding, Shiyang Liu, Fangyi Yang, Chi Jiang, Yang Chen

Abstract: In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz. The IF LFM signal is converted to the optical domain via an intensity modulator and then filtered by a fiber Bragg grating (FBG) to generate only two 2nd-order optical LFM sidebands. In radar detection, the two optical LFM sidebands beat with each other to generate a frequency-and-bandwidth-quadrupled LFM signal, which is used for ranging, radial velocity measurement, and imaging. By changing the center frequency of the IF LFM signal, the radar function can be operated within 8 to 40 GHz. In spectrum sensing, one 2nd-order optical LFM sideband is selected by another FBG, which then works in conjunction with the stimulated Brillouin scattering gain spectrum to map the frequency of the signal under test to time with an instantaneous measurement bandwidth of 2 GHz. By using a frequency shift module to adjust the pump frequency, the frequency measurement range can be adjusted from 0 to 40 GHz. The prototype is comprehensively studied and tested. It achieves a range resolution of 3.75 cm, a range error of less than $\pm$ 2 cm, and a radial velocity error within $\pm$ 1 cm/s, delivers clear imaging of multiple small targets, and maintains a frequency measurement error of less than $\pm$ 7 MHz and a frequency resolution of better than 20 MHz.
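The radar figures in this abstract can be cross-checked with a short calculation. The sketch below assumes the standard LFM range-resolution formula c / (2B) and takes the factor of four from the "frequency-and-bandwidth-quadrupled" wording; the constant and function names are illustrative and not part of the paper.

```python
# Back-of-envelope check of the radar numbers quoted in the abstract.
# Assumption: range resolution follows the textbook LFM relation c / (2B).

C = 299_792_458.0  # speed of light in m/s

def range_resolution_m(bandwidth_hz):
    """Range resolution of an LFM radar with the given sweep bandwidth."""
    return C / (2.0 * bandwidth_hz)

MULTIPLIER = 4              # frequency and bandwidth are quadrupled
IF_BANDWIDTH_HZ = 1e9       # instantaneous IF bandwidth from the abstract
IF_CENTER_MIN_HZ = 2.5e9    # lowest IF center frequency from the abstract
IF_CENTER_MAX_HZ = 9.5e9    # highest IF center frequency from the abstract

rf_bandwidth_hz = MULTIPLIER * IF_BANDWIDTH_HZ
resolution_cm = range_resolution_m(rf_bandwidth_hz) * 100.0

# Band edges: quadrupled center frequency minus or plus half the RF bandwidth.
band_low_hz = MULTIPLIER * IF_CENTER_MIN_HZ - rf_bandwidth_hz / 2.0
band_high_hz = MULTIPLIER * IF_CENTER_MAX_HZ + rf_bandwidth_hz / 2.0

print(f"range resolution: {resolution_cm:.2f} cm")
print(f"radar band: {band_low_hz / 1e9:.0f} to {band_high_hz / 1e9:.0f} GHz")
```

Under these assumptions a 4 GHz sweep gives a resolution of about 3.75 cm, and tuning the IF center from 2.5 to 9.5 GHz spans 8 to 40 GHz, both consistent with the values stated above.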

Submitted 20 June, 2024; originally announced June 2024.

Comments: 18 pages, 12 figures, 1 table

arXiv:2406.14056 [pdf, other]

VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improved performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interfaces (GUIs) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in hallucinations and incorrect responses in GUI comprehension. To address these issues, we introduce VGA, a fine-tuned model designed for comprehensive GUI understanding. Our model aims to enhance the interpretation of visual data of GUIs and reduce hallucinations. We first construct a Vision Question Answering (VQA) dataset of 63.8k high-quality examples with our proposed Referent Method, which ensures the model's responses depend heavily on visual content within the image. We then design a two-stage fine-tuning method called Foundation and Advanced Comprehension (FAC) to enhance both the model's ability to extract information from image content and alignment with human intent. Experiments show that our approach enhances the model's ability to extract information from images and achieves state-of-the-art results in GUI understanding tasks. Our dataset and fine-tuning script will be released soon.

Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 18 pages

MSC Class: 68-04

ACM Class: I.2.7; I.2.10

arXiv:2406.13999 [pdf, other]

Individually Addressed Entangling Gates in a Two-Dimensional Ion Crystal

Authors: Y. -H. Hou, Y. -J. Yi, Y. -K. Wu, Y. -Y. Chen, L. Zhang, Y. Wang, Y. -L. Xu, C. Zhang, Q. -X. Mei, H. -X. Yang, J. -Y. Ma, S. -A. Guo, J. Ye, B. -X. Qi, Z. -C. Zhou, P. -Y. Hou, L. -M. Duan

Abstract: Two-dimensional (2D) ion crystals have become a promising way to scale up qubit numbers for ion trap quantum information processing. However, to realize universal quantum computing in this system, individually addressed high-fidelity two-qubit entangling gates still remain challenging due to the inevitable micromotion of ions in a 2D crystal as well as the technical difficulty in 2D addressing. Here we demonstrate two-qubit entangling gates between any ion pairs in a 2D crystal of four ions. We use symmetrically placed crossed acousto-optic deflectors (AODs) to drive Raman transitions and achieve an addressing crosstalk error below 0.1%. We design and demonstrate a gate sequence by alternatingly addressing two target ions, making it compatible with any single-ion addressing techniques without crosstalk from multiple addressing beams. We further examine the gate performance versus the micromotion amplitude of the ions and show that its effect can be compensated by a recalibration of the laser intensity without degrading the gate fidelity. Our work paves the way for ion trap quantum computing with hundreds to thousands of qubits on a 2D ion crystal.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13986 [pdf, other]

doi 10.3847/1538-4357/ad5a10

Novae: An Important Source of Lithium in the Galaxy

Authors: Jun Gao, Chunhua Zhu, Guoliang Lü, Jinlong Yu, Lin Li, Helei Liu, Sufen Guo

Abstract: The source of the Galactic Lithium (Li) has long been a puzzle. With the discovery of Li in novae, extensive research has been conducted. However, there still exists a significant disparity between the observed abundance of lithium in novae and the existing theoretical predictions. Using the Modules for Experiments in Stellar Astrophysics (MESA), we simulate the evolution of novae with element diffusion and appropriately increase the amount of $^3$He in the mixtures. Element diffusion enhances the transport efficiency between the nuclear reaction zone and the convective region on the surface of the white dwarf during nova eruptions, which results in more $^7$Be being transported to the white dwarf surface and ultimately ejected. Compared to the previous predictions, the abundance of $^7$Be in novae simulated in our model significantly increases, and the result is able to explain almost all observed novae. Using the method of population synthesis, we calculate the Li yield in the Galaxy. We find that the Galactic occurrence rate of novae is about 130 yr$^{-1}$, and about 110M Li produced by nova eruptions is ejected into the interstellar medium (ISM). About 73% of Li in the Galactic ISM originates from novae, and approximately 15%-20% of the entire Galaxy. This means that novae are an important source of Li in the Galaxy.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures. Accepted for publication in Astrophysical Journal

arXiv:2406.13970 [pdf]

doi 10.1002/lpor.202300027

Pixel-scale NIR-VIS Spectral Routers Based on 2D Mie-type Metagratings

Authors: Yifan Shao, Shuhan Guo, Rui Chen, Yongdi Dang, Yi Zhou, Yubo Wang, Junjie Zhan, Jiaqi Yu, Bing-Feng Ju, Yungui Ma

Abstract: The out-of-band energy loss caused by in-built color filters significantly degrades the signal-to-noise ratio and the dynamic range of conventional image sensors, which has restricted the attempt to develop ultrahigh-density imaging devices by merely shrinking the pixel size. This issue will be more serious for security cameras which need to collect visible (VIS) light and near-infrared (NIR) photons as well. The existing solutions mostly explore complex photonic nanostructures, which are often too complicated for production. In this work, we demonstrate a pixel-scale spectral router utilizing two-dimensional (2D) Si3N4 Mie scattering metagratings that can spatially divide NIR (850 nm) and VIS (400-700 nm) light to different pixels at high efficiencies. It has a minimum feature size larger than 360 nm, highly promising for massive production. Compared with the traditional filter design, our router can gain about 42% and 30% signal enhancement for NIR and VIS band, respectively. We show that it also has good polarization insensitivity and incident angle tolerance. The NIR-VIS simultaneous imaging is inspected without any complex reconstruction algorithm. Mode analysis indicates that the multipolar scattering of our Mie-type metagratings provides the necessary degrees of freedom to spatially optimize the routing functions for broadband photons.

Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Journal ref: Laser and Photonics Reviews 17, 2300027 (2023)

arXiv:2406.13950 [pdf, ps, other]

Valley polarization in twisted altermagnetism

Authors: San-Dong Guo, Yichen Liu, Cheng-Cheng Liu

Abstract: The combination of altermagnetism, twistronics and valleytronics is of great significance for potential applications in advanced electronic devices. Twisted magnetic van der Waals bilayers have been identified as an ideal platform for altermagnetism of any type, such as $d$-wave, $g$-wave, and $i$-wave, by choosing the constituent monolayer with specific symmetry [arXiv:2404.17146 (2024)]. Here, we propose a way to achieve valley polarization in twisted altermagnetism by applying an out-of-plane external electric field. Since the out-of-plane electric field creates a layer-dependent electrostatic potential, the valleys from different layers will stagger, producing valley polarization. We also demonstrate the effectiveness of our proposed approach using the twisted tight-binding model. It is found that the applied electric field can also induce valley/spin-gapless semiconductor and half metal besides valley polarization. Based on first-principles calculations, our proposed approach to achieving valley polarization can be verified in twisted bilayer VOBr and monolayer $\mathrm{Ca(CoN)_2}$ as a special twisted altermagnet. These findings provide new opportunities for innovative spintronics, twistronics and valleytronics applications.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 5 pages, 5 figures

arXiv:2406.13948 [pdf, other]

CityGPT: Empowering Urban Spatial Cognition of Large Language Models

Authors: Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li

Abstract: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of a physical-world corpus and knowledge during training, they usually fail to solve many real-life tasks in the urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the capability of LLMs on understanding urban space and solving the related urban tasks by building a city-scale world model in the model. First, we construct a diverse instruction tuning dataset CityInstruction for injecting urban knowledge and enhancing spatial reasoning capability effectively. By using a mixture of CityInstruction and general instruction data, we fine-tune various LLMs (e.g., ChatGLM3-6B, Qwen1.5 and LLama3 series) to enhance their capability without sacrificing general abilities. To further validate the effectiveness of the proposed methods, we construct a comprehensive benchmark CityEval to evaluate the capability of LLMs on diverse urban scenarios and problems. Extensive evaluation results demonstrate that small LLMs trained with CityInstruction can achieve competitive performance with commercial LLMs in the comprehensive evaluation of CityEval. The source codes are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityGPT.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13945 [pdf, other]

CityBench: Evaluating the Capabilities of Large Language Model as World Model

Authors: Jie Feng, Jun Zhang, Junbo Yan, Xin Zhang, Tianjian Ouyang, Tianhui Liu, Yuwei Du, Siqi Guo, Yong Li

Abstract: Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations about the usability of LLMs, but a systematic and scalable evaluation benchmark is still lacking. The challenge in constructing a systematic evaluation benchmark for the urban domain lies in the diversity of data and scenarios, as well as the complex and dynamic nature of cities. In this paper, we propose CityBench, an interactive simulator based evaluation platform, as the first systematic evaluation benchmark for the capability of LLMs for the urban domain. First, we build CitySim to integrate the multi-source data and simulate fine-grained urban dynamics. Based on CitySim, we design 7 tasks in 2 categories, perception-understanding and decision-making, to evaluate the capability of LLMs as a city-scale world model for the urban domain. Due to the flexibility and ease-of-use of CitySim, our evaluation platform CityBench can be easily extended to any city in the world. We evaluate 13 well-known LLMs, including open source LLMs and commercial LLMs, in 13 cities around the world. Extensive experiments demonstrate the scalability and effectiveness of the proposed CityBench and shed light on the future development of LLMs in the urban domain. The dataset, benchmark and source codes are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityBench.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13368 [pdf]

Lewis Acidity and Basicity Diagnostics of Molten Salt for its Properties and Structure Online Monitoring

Authors: Changzu Zhu, Jia Song, Xiaorui Xu, Chengyu Wang, Yang Tong, Lve Lin, Shaoqiang Guo, Wentao Zhou, Adrien Couet, Yafei Wang

Abstract: Analogous to the aqueous solution where the pH of the solvent affects its multiple behaviors, the Lewis acidity-basicity of molten salts also greatly influences their thermophysical and thermochemical properties. In this study, we develop ion probes to quantitatively determine the acidity-basicity scale of molten NaCl-xAlCl3 (x = 1.5-2.1) salt using in-situ ultra-violet visible (UV-Vis) spectroscopy. With the accumulation of acidity-basicity data of NaCl-AlCl3 molten salt for a variety of compositions, the correlation between the acidity-basicity of salt and its measured fundamental properties is derived. To understand the physical and chemical features controlling the acidity-basicity variations, the structures of NaCl-xAlCl3 molten salts with different chemical compositions are investigated in terms of bonded complexes and coordination numbers. The comprehensive understanding of the correlation between composition, acidity-basicity, properties, and structures of molten salt can serve for the full screening and online monitoring of salt melt in extreme environments by simply measuring the salt acidity-basicity as developed in this study.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12859 [pdf, ps, other]

Cohomologies of Reynolds Lie-Yamaguti algebras of any weight and applications

Authors: Wen Teng, Shuangjian Guo

Abstract: The purpose of the present paper is to investigate cohomologies of Reynolds Lie-Yamaguti algebras of any weight and provide some applications. First, we introduce the notion of Reynolds Lie-Yamaguti algebras and give some new examples. Moreover, cohomologies of Reynolds operators and Reynolds Lie-Yamaguti algebras with coefficients in a suitable representation are established. Finally, formal deformations and abelian extensions of Reynolds Lie-Yamaguti algebras are characterized in terms of lower degree cohomology groups.

Submitted 6 March, 2024; originally announced June 2024.

MSC Class: 17B38; 17B60; 17B56; 17D99

arXiv:2406.12466 [pdf, other]

Rastall gravity: accretion disk image in radiation fields context and visual transformations compared to Reissner-Nordstrom black holes

Authors: Yu-Xiang Huang, Sen Guo, Yu Liang, Yu-Hao Cui, Qing-Quan Jiang, Kai Lin

Abstract: Our study investigates the astronomical implications of Rastall gravity, particularly its behavior amidst a radiation field compared to Reissner-Nordstrom (RN) black holes. Our research delineates a crucial correlation between the dynamics of the accretion disk and the parameters Q and N_{\rm r}, which aptly reflect the influence of spacetime metrics on the disk's appearance. Elevated electric charge Q prompts contraction in the disk's orbit due to enhanced gravitational effects, while higher N_{\rm r} values lead to outward expansion, influenced by the radiation field's attributes. Interestingly, the charged black holes surrounded by radiation fields display distinct visual disparities from RN black holes. Brightness decreases and expansion occurs within the accretion disk's innermost stable circular orbit with rising N_{\rm r} values. Our study also reveals the process by which the accretion disk transitions from a conventional disk-like structure to a hat-like form at different observation angles, with the redshift effect gradually intensifying. Moreover, the results of the Rastall gravity radiation field we consider are consistent with the constraints of the host galaxy's gravitational lensing on the Rastall gravity parameters, enhancing the consistency between theoretical predictions and actual observations.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12074 [pdf, other]

COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities

Authors: Zihao He, Rebecca Dorn, Siyi Guo, Minh Duc Chu, Kristina Lerman

Abstract: Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable creating computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community's online discussions, Community-Cross-Instruct automatically generates instruction-output pairs by an advanced LLM to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method's utility in accurately representing political and fitness communities on Reddit. Unlike prior methods requiring human-authored instructions, Community-Cross-Instruct generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains. This work enables cost-effective and automated surveying of diverse online communities.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10506 [pdf, ps, other]

Validating an Instrument for Teachers' Acceptance of Artificial Intelligence in Education

Authors: Shuchen Guo, Lehong Shi, Xiaoming Zhai

Abstract: As artificial intelligence (AI) receives wider attention in education, examining teachers' acceptance of AI (TAAI) becomes essential. However, existing instruments measuring TAAI reported limited reliability and validity evidence and faced some design challenges, such as missing informed definitions of AI to participants. This study aimed to develop and validate a TAAI instrument, providing sufficient evidence for high psychometric quality. Based on the literature, we first identified five dimensions of TAAI, including perceived usefulness, perceived ease of use, behavioral intention, self-efficacy, and anxiety, and then developed items to assess each dimension. We examined the face and content validity using expert review and think-aloud with pre-service teachers. Using the revised instrument, we collected responses from 274 pre-service teachers and examined the item discriminations to identify outlier items. We employed confirmatory factor analysis and Cronbach's alpha to examine the construct validity, convergent validity, discriminant validity, and reliability. Results confirmed the dimensionality of the scale, resulting in 27 items distributed in five dimensions. The study exhibits robust validity and reliability evidence for TAAI, thus affirming its usefulness as a valid measurement instrument.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08909 [pdf, other]

A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

Authors: Chenyang Shi, Shasha Guo, Boyi Wei, Hanxiao Liu, Yibo Zhang, Ningfang Song, Jing Jin

Abstract: Event cameras are renowned for their high efficiency due to outputting a sparse, asynchronous stream of events. However, they are plagued by noisy events, especially in low light conditions. Denoising is an essential task for event cameras, but evaluating denoising performance is challenging. Label-dependent denoising metrics involve artificially adding noise to clean sequences, complicating evaluations. Moreover, the majority of these metrics are monotonic, which can inflate scores by removing substantial noise and valid events. To overcome these limitations, we propose the first label-free and non-monotonic evaluation metric, the area of the continuous contrast curve (AOCC), which utilizes the area enclosed by event frame contrast curves across different time intervals. This metric is inspired by how events capture the edge contours of scenes or objects with high temporal resolution. An effective denoising method removes noise without eliminating these edge-contour events, thus preserving the contrast of event frames. Consequently, contrast across various time ranges serves as a metric to assess denoising effectiveness. As the time interval lengthens, the curve will initially rise and then fall. The proposed metric is validated through both theoretical and experimental evidence.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08090 [pdf, other]

From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

Authors: Ziran Zhang, Yongrui Ma, Yueting Chen, Feng Zhang, Jinwei Gu, Tianfan Xue, Shi Guo

Abstract: Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably trailing artifacts and signal latency, which hinder their direct applicability and generalization. Addressing these issues, we propose a novel per-scene optimization strategy tailored for low-light conditions. This approach utilizes the internal statistics of a sequence to handle degraded event data under low-light conditions, improving the generalizability to different lighting and camera settings. To evaluate its robustness in low-light conditions, we further introduce EVFI-LL, a unique RGB+Event dataset captured under low-light conditions. Our results demonstrate state-of-the-art performance in low-light environments. Both the dataset and the source code will be made publicly available upon publication. Project page: https://naturezhanghn.github.io/sim2real.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06937 [pdf, other]

A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization between the speaker and listener. To overcome these challenges, we propose a novel non-autoregressive generation framework for simultaneous speech translation (NAST-S2X), which integrates speech-to-text and speech-to-speech tasks into a unified end-to-end framework. We develop a non-autoregressive decoder capable of concurrently generating multiple text or acoustic unit tokens upon receiving fixed-length speech chunks. The decoder can generate blank or repeated tokens and employ CTC decoding to dynamically adjust its latency. Experimental results show that NAST-S2X outperforms state-of-the-art models in both speech-to-text and speech-to-speech tasks. It achieves high-quality simultaneous interpretation within a delay of less than 3 seconds and provides a 28 times decoding speedup in offline generation.

Submitted 11 June, 2024; originally announced June 2024.

Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

arXiv:2406.06910 [pdf, other]

Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models

Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

Abstract: Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations. While they excel at determining policies, their translation performance is suboptimal. Conversely, Large Language Models (LLMs), trained on extensive corpora, possess superior generation capabilities, but it is difficult for them to acquire translation policy through the training methods of SiMT. Therefore, we introduce Agent-SiMT, a framework combining the strengths of LLMs and traditional SiMT methods. Agent-SiMT contains the policy-decision agent and the translation agent. The policy-decision agent is managed by a SiMT model, which determines the translation policy using partial source sentence and translation. The translation agent, leveraging an LLM, generates translation based on the partial source sentence. The two agents collaborate to accomplish SiMT. Experiments demonstrate that Agent-SiMT attains state-of-the-art performance.

Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 18 pages, 8 figures, 7 tables. v2 of arXiv:2402.13036

arXiv:2406.05449 [pdf, ps, other]

Anderson localization for CMV matrices with Verblunsky coefficients defined by the hyperbolic toral automorphism

Authors: Yanxue Lin, Shuzheng Guo, Daxiong Piao

Abstract: In this paper, we prove the large deviation estimates and Anderson localization for CMV matrices on $\ell^2(\mathbb{Z}_+)$ with Verblunsky coefficients defined dynamically by the hyperbolic toral automorphism. Part of positivity results on the Lyapunov exponents of Chulaevsky-Spencer and Anderson localization results of Bourgain-Schlag on Schrödinger operators with strongly mixing potentials are extended to CMV matrices.

Submitted 8 June, 2024; originally announced June 2024.

MSC Class: 37A30; 42C05; 70G60

arXiv:2406.03878 [pdf, other]

Decoder-only Streaming Transformer for Simultaneous Translation

Authors: Shoutao Guo, Shaolei Zhang, Yang Feng

Abstract: Simultaneous Machine Translation (SiMT) generates translation while reading source tokens, essentially producing the target prefix based on the source prefix. To achieve good performance, it leverages the relationship between source and target prefixes to extract a policy to guide the generation of translations. Although existing SiMT methods primarily focus on the Encoder-Decoder architecture, we explore the potential of Decoder-only architecture, owing to its superior performance in various tasks and its inherent compatibility with SiMT. However, directly applying the Decoder-only architecture to SiMT poses challenges in terms of training and inference. To alleviate the above problems, we propose the first Decoder-only SiMT model, named Decoder-only Streaming Transformer (DST). Specifically, DST separately encodes the positions of the source and target prefixes, ensuring that the position of the target prefix remains unaffected by the expansion of the source prefix. Furthermore, we propose a Streaming Self-Attention (SSA) mechanism tailored for the Decoder-only architecture. It is capable of obtaining translation policy by assessing the sufficiency of input source information and integrating with the soft-attention mechanism to generate translations. Experiments demonstrate that our approach achieves state-of-the-art performance on three translation tasks.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024. 14 pages, 10 Tables, 5 Figures

arXiv:2406.03049 [pdf, other]

StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing a double challenge of translation and policy. In this paper, we propose StreamSpeech, a direct Simul-S2ST model that jointly learns translation and simultaneous policy in a unified framework of multi-task learning. Adhering to a multi-task learning approach, StreamSpeech can perform offline and simultaneous speech recognition, speech translation and speech synthesis via an "All-in-One" seamless model. Experiments on the CVSS benchmark demonstrate that StreamSpeech achieves state-of-the-art performance in both offline S2ST and Simul-S2ST tasks. Besides, StreamSpeech is able to present high-quality intermediate results (i.e., ASR or translation results) during the simultaneous translation process, offering a more comprehensive real-time communication experience.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

arXiv:2406.02903 [pdf, other]

Open Grounded Planning: Challenges and Benchmark Construction

Authors: Shiguang Guo, Ziliang Deng, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

Abstract: The emergence of large language models (LLMs) has increasingly drawn attention to the use of LLMs for human-like planning. Existing work on LLM-based planning either focuses on leveraging the inherent language generation capabilities of LLMs to produce free-style plans, or employs reinforcement learning approaches to learn decision-making for a limited set of actions within restricted environments. However, both approaches exhibit significant discrepancies from the open and executable requirements in real-world planning. In this paper, we propose a new planning task--open grounded planning. The primary objective of open grounded planning is to ask the model to generate an executable plan based on a variable action set, thereby ensuring the executability of the produced plan. To this end, we establish a benchmark for open grounded planning spanning a wide range of domains. Then we test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains. The outcomes of this paper define and establish a foundational dataset for open grounded planning, and shed light on the potential challenges and future directions of LLM-based planning.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 main conference

arXiv:2406.02260 [pdf]

Near-Room-Temperature Field-Controllable Exchange Bias in 2D van der Waals Ferromagnet Fe3GaTe2

Authors: Jifeng Shao, Xiaolong Yin, Chunhao Bao, Sirong Lu, Xiaoming Ma, Shu Guo, Le Wang, Xi Zhang, Zhiyue Li, Longxiang Li, Yue Zhao, Tingyong Chen

Abstract: Exchange bias (EB) is a cornerstone of modern magnetic memory and sensing technologies. Its extension to the realm of two-dimensional (2D) van der Waals (vdW) magnets holds promise for revolutionary advancements in miniaturized and efficient atomic spintronic devices. However, the blocking temperature of EB in 2D vdW magnets is currently well below room temperature (~130 K). This study reports a robust EB phenomenon in Fe3GaTe2 thin-layer devices, which significantly increases the blocking temperature to a near-room-temperature record of 280 K. Both the bias direction and magnitude can be isothermally tuned by adjusting the field sweep range, in striking contrast to the conventional EB in ferromagnetic/antiferromagnetic (FM/AFM) bilayers. We propose an exchange spring model in which crystal defects with higher coercivity act as the pivotal pinning source for the observed EB phenomenon, deviating from the conventional FM/AFM interface mechanism. Cumulative growth of minor loops and multiple magnetization reversal paths are observed in field cycles below the saturation field, consistent with the hard FM defects behavior of our exchange spring model. These findings provide insights into the complex magnetic order in 2D ferromagnets and open new avenues for developing practical ultrathin vdW spintronic devices with EB-like properties at room temperature.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 14 pages, 5 figures

arXiv:2406.01866 [pdf, other]

#EpiTwitter: Public Health Messaging During the COVID-19 Pandemic

Authors: Ashwin Rao, Nazanin Sabri, Siyi Guo, Louiqa Raschid, Kristina Lerman

Abstract: Effective communication during health crises is critical, with social media serving as a key platform for public health experts (PHEs) to engage with the public. However, it also amplifies pseudo-experts promoting contrarian views. Despite its importance, the role of emotional and moral language in PHEs' communication during COVID-19 remains underexplored. This study examines how PHEs and pseudo-experts communicated on Twitter during the pandemic, focusing on emotional and moral language and their engagement with political elites. Analyzing tweets from 489 PHEs and 356 pseudo-experts from January 2020 to January 2021, alongside public responses, we identified key priorities and differences in messaging strategy. PHEs prioritize masking, healthcare, education, and vaccines, using positive emotional language like optimism. In contrast, pseudo-experts discuss therapeutics and lockdowns more frequently, employing negative emotions like pessimism and disgust. Negative emotional and moral language tends to drive engagement, but positive language from PHEs fosters positivity in public responses. PHEs exhibit liberal partisanship, expressing more positivity towards liberals and negativity towards conservative elites, while pseudo-experts show conservative partisanship. These findings shed light on the polarization of COVID-19 discourse and underscore the importance of strategic use of emotional and moral language by experts to mitigate polarization and enhance public trust.

Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01574 [pdf, other]

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Authors: Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

Abstract: In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.

Submitted 23 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00894 [pdf, other]

Pretrained Hybrids with MAD Skills

Authors: Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala

Abstract: While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed $\textit{hybrid architectures}$ seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose $\textbf{Manticore}$, a framework that addresses these challenges. Manticore $\textit{automates the design of hybrid architectures}$ while reusing pretrained models to create $\textit{pretrained}$ hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then fine-tune hybrids that combine pretrained models from different architecture families -- such as the GPT series and Mamba -- end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to $\textit{program}$ pretrained hybrids to have certain capabilities. Manticore hybrids outperform existing manually-designed hybrids, achieve strong performance on Long Range Arena (LRA) tasks, and can improve on pretrained transformers and state space models.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.19327 [pdf, other]

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kai**g Ma, Minghao Liu, Morry Niu, et al. (20 additional authors not shown)

Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided with most details (e.g., intermediate checkpoints, pre-training corpus, and training code, etc.) being undisclosed. To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs on reasoning, knowledge, and coding tasks are still inferior to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with comparable performance compared to existing state-of-the-art LLMs.
Moreover, we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovations and creativities to facilitate the further improvements of LLMs.

Submitted 2 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: https://map-neo.github.io/

arXiv:2405.18836 [pdf, other]

Do Finetti: On Causal Effects for Exchangeable Data

Authors: Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

Abstract: We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal Pólya urn model and demonstrate how intervention propagates effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18826 [pdf, ps, other]

Isovalent alloying assisted anomalous valley Hall effect in hexagonal antiferromagnetic monolayer

Authors: San-Dong Guo, Liguo Zhang, Xiao-Shu Guo, Gangqiang Zhu

Abstract: Exploring the combination of antiferromagnetic (AFM) spintronics and anomalous valley Hall effect (AVHE) is one of the most important questions for valleytronic applications. The key to addressing this issue is to achieve spin splitting around the valleys in AFM systems. Here, we propose a possible way for achieving AVHE in hexagonal AFM monolayer, which involves the isovalent alloying. This can break the combined symmetry ($PT$ symmetry) of spatial inversion ($P$) and time reversal ($T$), giving rise to spin splitting. More specifically, the large spin splitting around the Fermi energy level owes to $d$ orbital mismatch among these different transition metal ions. Based on first-principles calculations, the proposed way can be verified in out-of-plane AFM $\mathrm{CrMoC_2S_6}$ monolayer, which possesses spontaneous valley polarization and spin splitting, providing possibility to realize AVHE. It is also proved that tensile strain can strengthen the valley splitting and maintain the out-of-plane AFM ordering. Our work provides an experimentally feasible way for developing AFM valleytronic devices.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 6 pages, 7 figures

arXiv:2405.17546 [pdf, other]

Complexity is not Enough for Randomness

Authors: Shiyong Guo, Martin Sasieta, Brian Swingle

Abstract: We study the dynamical generation of randomness in Brownian systems as a function of the degree of locality of the Hamiltonian. We first express the trace distance to a unitary design for these systems in terms of an effective equilibrium thermal partition function, and provide a set of conditions that guarantee a linear time to design. We relate the trace distance to design to spectral properties of the time-evolution operator. We apply these considerations to the Brownian $p$-SYK model as a function of the degree of locality $p$. We show that the time to design is linear, with a slope proportional to $1/p$. We corroborate that when $p$ is of order the system size this reproduces the behavior of a completely non-local Brownian model of random matrices. For the random matrix model, we reinterpret these results from the point of view of classical Brownian motion in the unitary manifold. Therefore, we find that the generation of randomness typically persists for exponentially long times in the system size, even for systems governed by highly non-local time-dependent Hamiltonians. We conjecture this to be a general property: there is no efficient way to generate approximate Haar random unitaries dynamically, unless a large degree of fine-tuning is present in the ensemble of time-dependent Hamiltonians. We contrast the slow generation of randomness to the growth of quantum complexity of the time-evolution operator.
Using known bounds on circuit complexity for unitary designs, we obtain a lower bound determining that complexity grows at least linearly in time for Brownian systems. We argue that these bounds on circuit complexity are far from tight and that complexity grows at a much faster rate, at least for non-local systems.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 40 pages + appendices

arXiv:2405.16152 [pdf, other]

SuDA: Support-based Domain Adaptation for Sim2Real Motion Capture with Flexible Sensors

Authors: Jiawei Fang, Haishan Song, Chengxu Zuo, Xiaoxia Gao, Xiaowei Chen, Shihui Guo, Yipeng Qin

Abstract: Flexible sensors hold promise for human motion capture (MoCap), offering advantages such as wearability, privacy preservation, and minimal constraints on natural movement. However, existing flexible sensor-based MoCap methods rely on deep learning and necessitate large and diverse labeled datasets for training. These data typically need to be collected in MoCap studios with specialized equipment and substantial manual labor, making them difficult and expensive to obtain at scale. Thanks to the high-linearity of flexible sensors, we address this challenge by proposing a novel Sim2Real MoCap solution based on domain adaptation, eliminating the need for labeled data yet achieving comparable accuracy to supervised learning. Our solution relies on a novel Support-based Domain Adaptation method, namely SuDA, which aligns the supports of the predictive functions rather than the instance-dependent distributions between the source and target domains. Extensive experimental results demonstrate the effectiveness of our method and its superiority over state-of-the-art distribution-based domain adaptation methods in our task.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 20 pages conference, accepted ICML paper

arXiv:2405.16011 [pdf, ps, other]

Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models

Authors: Shuaishuai Guo, Yanhu Wang, Jia Ye, Anbang Zhang, Kun Xu

Abstract: Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visual data, expressed as natural language descriptions. These are further refined using a pre-trained large language model (LLM) for importance quantification and semantic error correction. The subsequent semantic importance-aware communications (SIAC) aim to minimize semantic loss while respecting transmission delay constraints, exemplified through adaptive modulation and coding strategies. At the receiving end, LLM-based semantic error correction is utilized. If visual data recreation is desired, a pre-trained generative artificial intelligence (AI) model can regenerate it using the corrected descriptions. We assess semantic similarities between transmitted and recovered content, demonstrating ULSC's superior ability to convey semantic understanding compared to feature-level semantic communications (FLSC). ULSC's conversion of visual data to natural language facilitates various cognitive tasks, leveraging human knowledge bases. Additionally, this method enhances privacy, as neither original data nor features are directly transmitted.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15485 [pdf, other]

Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs

Authors: Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf

Abstract: We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from information during in-context learning or instruction-tuning through exploiting the complex knowledge structure within mathematics. Motivated by the Neural Tangent Kernel (NTK), we propose \textit{NTKEval} to assess changes in LLM's probability distribution via training on different kinds of math data. Our systematic analysis finds evidence of domain understanding during in-context learning. By contrast, certain instruction-tuning leads to similar performance changes irrespective of training on different data, suggesting a lack of domain understanding across different skills.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14744 [pdf, other]

Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View

Authors: Xuan Liu, Jie Zhang, Song Guo, Haoyang Shang, Chengxu Yang, Quanyan Zhu

Abstract: Large language models (LLMs) have been shown to face hallucination issues due to the data they are trained on often containing human bias; whether this is reflected in the decision-making process of LLM agents remains under-explored. As LLM Agents are increasingly employed in intricate social environments, a pressing and natural question emerges: Can LLM Agents leverage hallucinations to mirror human cognitive biases, thus exhibiting irrational social intelligence? In this paper, we probe the irrational behavior among contemporary LLM agents by melding practical social science experiments with theoretical insights. Specifically, we propose CogMir, an open-ended Multi-LLM Agents framework that utilizes hallucination properties to assess and enhance LLM Agents' social intelligence through cognitive biases. Experimental results on CogMir subsets show that LLM Agents and humans exhibit high consistency in irrational and prosocial decision-making under uncertain conditions, underscoring the prosociality of LLM Agents as social entities, and highlighting the significance of hallucination properties. Additionally, the CogMir framework demonstrates its potential as a valuable platform for encouraging more research into the social intelligence of LLM Agents.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13999 [pdf, other]

Computer-Vision-Enabled Worker Video Analysis for Motion Amount Quantification

Authors: Hari Iyer, Neel Macwan, Shenghan Guo, Hee** Jeong

Abstract: The performance of physical workers is significantly influenced by the quantity of their motions. However, monitoring and assessing these motions is challenging due to the complexities of motion sensing, tracking, and quantification. Recent advancements have utilized in-situ video analysis for real-time observation of worker behaviors, enabling data-driven quantification of motion amounts. Nevertheless, there are limitations to monitoring worker movements using video data. This paper introduces a novel framework based on computer vision to track and quantify the motion of workers' upper and lower limbs, issuing alerts when the motion reaches critical thresholds. Using joint position data from posture estimation, the framework employs Hotelling's T$^2$ statistic to quantify and monitor motion amounts, integrating computer vision tools to address challenges in automated worker training and enhance exploratory research in this field. We collected data of participants performing lifting and moving tasks with large boxes and small wooden cubes, to simulate macro and micro assembly tasks respectively. It was found that the correlation between workers' joint motion amount and the Hotelling's T$^2$ statistic was approximately 35% greater for micro tasks compared to macro tasks, highlighting the framework's ability to identify fine-grained motion differences. This study demonstrates the effectiveness of the proposed system in real-time applications across various industry settings.
It provides a tool for enhancing worker safety and productivity through precision motion analysis and proactive ergonomic adjustments.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13080 [pdf, other]

EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection

Authors: Yuwen Qian, Shuchi Wu, Kang Wei, Ming Ding, Di Xiao, Tao Xiang, Chuan Ma, Song Guo

Abstract: Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data while preserving data privacy. While FSSL offers advantages, its susceptibility to backdoor attacks, a concern identified in traditional federated supervised learning (FSL), has not been investigated. To fill the research gap, we undertake a comprehensive investigation into a backdoor attack paradigm, where unscrupulous clients conspire to manipulate the global model, revealing the vulnerability of FSSL to such attacks. In FSL, backdoor attacks typically build a direct association between the backdoor trigger and the target label. In contrast, in FSSL, backdoor attacks aim to alter the global model's representation for images containing the attacker's specified trigger pattern in favor of the attacker's intended target class, which is less straightforward. In this sense, we demonstrate that existing defenses are insufficient to mitigate the investigated backdoor attacks in FSSL, thus finding an effective defense mechanism is urgent. To tackle this issue, we dive into the fundamental mechanism of backdoor attacks on FSSL, proposing the Embedding Inspector (EmInspector) that detects malicious clients by inspecting the embedding space of local models. In particular, EmInspector assesses the similarity of embeddings from different local models using a small set of inspection images (e.g., ten images of CIFAR100) without specific requirements on sample distribution or labels.
We discover that embeddings from backdoored models tend to cluster together in the embedding space for a given inspection image. Evaluation results show that EmInspector can effectively mitigate backdoor attacks on FSSL across various adversary settings. Our code is available at https://github.com/ShuchiWu/EmInspector.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 18 pages, 12 figures

arXiv:2405.12808 [pdf, other]

Influence of quantum correction on the Schwarzschild black hole polarized image

Authors: Sen Guo, Yu-Xiang Huang, Kuan Liu, En-Wei Liang, Kai Lin

Abstract: Using a model of an accretion disk around a Schwarzschild black hole, the analytic estimates for image polarization were derived by Narayan $et~al.$ [Astrophys. J, 102, 912 (2021)]. Recently, the EHT team also obtained polarization images of the Sgr A$^{*}$ and measured both linear and circular polarization [Astrophys. J. Lett, 964, L25 (2024)]. We find that quantum correction effects can also influence polarization information. Considering the quantum corrected Schwarzschild black hole (Kazakov-Solodukhin black hole), we derive the polarization intensity of the target black hole and investigate polarization images under different parameters. It is found that a larger quantum deformation leads to an expansion of the polarization region, while the polarization intensity value decreases. Under different observation angles, magnetic fields, fluid direction angles, and fluid velocity conditions, we also derive polarization images of corrected black holes. These key indicators not only affect the intensity of polarization but also the direction of polarization. We establish the relationship between polarization intensity and quantum correction deformation parameters, revealing a gradual decline in polarization intensity with reduced radius and an anti-polarization behavior induced by the progressive increase in deformation parameters at a constant radius. Our analysis may provide observational evidence for quantum effect of general relativity.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 20 pages, 8 figures

Report number: Accepted European Physical Journal C (EPJC) 2024

arXiv:2405.12459 [pdf, other]

PLM4Traj: Cognizing Movement Patterns and Travel Purposes from Trajectories with Pre-trained Language Models

Authors: Zeyu Zhou, Yan Lin, Haomin Wen, Shengnan Guo, Jilin Hu, Youfang Lin, Huaiyu Wan

Abstract: Spatio-temporal trajectories play a vital role in various spatio-temporal data mining tasks. Developing a versatile trajectory learning approach that can adapt to different tasks while ensuring high accuracy is crucial. This requires effectively extracting movement patterns and travel purposes embedded in trajectories. However, this task is challenging due to limitations in the size and quality of available trajectory datasets. On the other hand, pre-trained language models (PLMs) have shown great success in adapting to different tasks by training on large-scale, high-quality corpus datasets. Given the similarities between trajectories and sentences, there is potential in leveraging PLMs to enhance the development of a versatile and effective trajectory learning method. Nevertheless, vanilla PLMs are not tailored to handle the unique spatio-temporal features present in trajectories and lack the capability to extract movement patterns and travel purposes from them. To overcome these obstacles, we propose a model called PLM4Traj that effectively utilizes PLMs to model trajectories. PLM4Traj leverages the strengths of PLMs to create a versatile trajectory learning approach while addressing the limitations of vanilla PLMs in modeling trajectories. Firstly, PLM4Traj incorporates a novel trajectory semantic embedder that enables PLMs to process spatio-temporal features in trajectories and extract movement patterns and travel purposes from them.
Secondly, PLM4Traj introduces a novel trajectory prompt that integrates movement patterns and travel purposes into PLMs, while also allowing the model to adapt to various tasks. Extensive experiments conducted on two real-world datasets and two representative tasks demonstrate that PLM4Traj successfully achieves its design goals. Codes are available at https://github.com/Zeru19/PLM4Traj.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.12205 [pdf, other]

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Authors: Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora

Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8K and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Preprint. Under review

arXiv:2405.11571 [pdf, other]

The population synthesis of Wolf-Rayet stars involving binary merger channels

Authors: Zhuowen Li, Chunhua Zhu, Guoliang Lü, Lin Li, Helei Liu, Sufen Guo, Jinlong Yu, Xizhen Lu

Abstract: Wolf-Rayet stars (WRs) are very important massive stars. However, their origin and the observed binary fraction within the entire WR population are still debated. We investigate some possible merger channels for the formation of WRs, including main sequence (MS)/Hertzsprung Gap (HG) + MS and He + HG/Giant Branch (GB). We find that many products produced via binary merger can evolve into WRs: the MS/HG + MS merger channel can explain WRs with luminosities higher than $\sim 10^{5.4}$\,L$_{\odot}$, while the He + HG/GB merger channel can explain low-luminosity WRs in the range of $10^{4.7}$\,L$_{\odot}$\,$\sim$\,$10^{5.5}$\,L$_{\odot}$. In the population synthesis analysis of WRs, we assume an initial binary fraction ($f_{\rm ini,bin}$) of 50\% and 100\% for massive stars. We also assume that MS/HG + MS merger products are non-rotating or rapidly rotating ($ω/ω_{\rm crit}=0.8$). In different cases, the calculated single fractions of WRs range from $22.2\%$ to $60.6\%$ in the Milky Way (MW) and from $8.3\%$ to $70.9\%$ in the Large Magellanic Cloud (LMC). The current observations fall within the range of our calculations. When the merger product of MS/HG + MS rotates rapidly, we estimate that there are approximately 1015 to 1396 WRs in the MW and 128 to 204 WRs in the LMC. Our model also roughly reproduces the observed single-peak luminosity distribution of WRs in the MW. However, the weak bimodal luminosity distribution observed in the LMC is not reproduced in our model. We suggest that this may be due to the model underestimating the mass-loss rate in the LMC. In conclusion, we consider that binary merger is a significant channel for WR formation and can explain the observed high fraction of single WRs in the total population.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 14 pages, 8 figures, accepted to ApJ

Showing 1–50 of 1,030 results for author: Guo, S