-
Federated Adversarial Learning for Robust Autonomous Landing Runway Detection
Authors:
Yi Li,
Plamen Angelov,
Zhengxin Yu,
Alvaro Lopez Pellicer,
Neeraj Suri
Abstract:
As the development of deep learning techniques in autonomous landing systems continues to grow, one of the major challenges is trust and security in the face of possible adversarial attacks. In this paper, we propose a federated adversarial learning-based framework to detect landing runways using paired data comprising clean local data and its adversarial version. Firstly, the local model is pre-trained on a large-scale lane detection dataset. Then, instead of exploiting large instance-adaptive models, we resort to a parameter-efficient fine-tuning method known as scale and shift deep features (SSF), applied on top of the pre-trained model. Secondly, in each SSF layer, distributions of clean local data and its adversarial version are disentangled for accurate statistics estimation. To the best of our knowledge, this marks the first instance of federated learning work that addresses the adversarial sample problem in landing runway detection. Our experimental evaluations on both synthetic and real images of the Landing Approach Runway Detection (LARD) dataset consistently demonstrate the good performance of the proposed federated adversarial learning and its robustness to adversarial attacks.
Submitted 22 June, 2024;
originally announced June 2024.
-
PUDD: Towards Robust Multi-modal Prototype-based Deepfake Detection
Authors:
Alvaro Lopez Pellicer,
Yi Li,
Plamen Angelov
Abstract:
Deepfake techniques generate highly realistic data, making it challenging for humans to discern between actual and artificially generated images. Recent advancements in deep learning-based deepfake detection methods, particularly with diffusion models, have shown remarkable progress. However, there is a growing demand for real-world applications to detect unseen individuals, deepfake techniques, and scenarios. To address this limitation, we propose a Prototype-based Unified Framework for Deepfake Detection (PUDD). PUDD offers a detection system based on similarity, comparing input data against known prototypes for video classification and identifying potential deepfakes or previously unseen classes by analyzing drops in similarity. Our extensive experiments reveal three key findings: (1) PUDD achieves an accuracy of 95.1% on Celeb-DF, outperforming state-of-the-art deepfake detection methods; (2) PUDD leverages image classification as the upstream task during training, demonstrating promising performance in both image classification and deepfake detection tasks during inference; (3) PUDD requires only 2.7 seconds for retraining on new data and emits 10$^{5}$ times less carbon compared to the state-of-the-art model, making it significantly more environmentally friendly.
Submitted 30 June, 2024; v1 submitted 22 June, 2024;
originally announced June 2024.
-
LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning
Authors:
Guangsi Shi,
Xiaofeng Deng,
Linhao Luo,
Lijuan Xia,
Lei Bao,
Bei Ye,
Fei Du,
Shirui Pan,
Yuxiao Li
Abstract:
Recommender systems are pivotal in enhancing user experiences across various web applications by analyzing the complicated relationships between users and items. Knowledge graphs (KGs) have been widely used to enhance the performance of recommender systems. However, KGs are known to be noisy and incomplete, which makes it hard to provide reliable explanations for recommendation results. An explainable recommender system is crucial for product development and subsequent decision-making. To address these challenges, we introduce a novel recommender that synergizes Large Language Models (LLMs) and KGs to enhance the recommendation and provide interpretable results. Specifically, we first harness the power of LLMs to augment KG reconstruction. LLMs comprehend and decompose user reviews into new triples that are added to the KG. In this way, we can enrich KGs with explainable paths that express user preferences. To enhance the recommendation on augmented KGs, we introduce a novel subgraph reasoning module that effectively measures the importance of nodes and discovers reasoning paths for recommendation. Finally, these reasoning paths are fed into the LLMs to generate interpretable explanations of the recommendation results. Our approach significantly enhances both the effectiveness and interpretability of recommender systems, especially in cross-selling scenarios where traditional methods falter. The effectiveness of our approach has been rigorously tested on four open real-world datasets, with our methods demonstrating superior performance over contemporary state-of-the-art techniques by an average improvement of 12%. The application of our model in a multinational engineering and technology company's cross-selling recommendation system further underscores its practical utility and potential to redefine recommendation practices through improved accuracy and user trust.
Submitted 29 June, 2024; v1 submitted 22 June, 2024;
originally announced June 2024.
-
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
Authors:
Zhongzhi Yu,
Zheng Wang,
Yuhan Li,
Haoran You,
Ruijie Gao,
Xiaoya Zhou,
Sreenidhi Reedy Bommu,
Yang Katie Zhao,
Yingyan Celine Lin
Abstract:
Efficient adaption of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of the high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and effective LLM adaptation on edge devices. Specifically, Edge-LLM features three core components: (1) a layer-wise unified compression (LUC) technique to reduce the computation overhead by generating layer-wise pruning sparsity and quantization bit-width policies, (2) an adaptive layer tuning and voting scheme to reduce the memory overhead by reducing the backpropagation depth, and (3) a complementary hardware scheduling strategy to handle the irregular computation patterns introduced by LUC and adaptive layer tuning, thereby achieving efficient computation and data movements. Extensive experiments demonstrate that Edge-LLM achieves a 2.92x speed up and a 4x memory overhead reduction as compared to vanilla tuning methods with comparable task accuracy. Our code is available at https://github.com/GATECH-EIC/Edge-LLM
Submitted 22 June, 2024;
originally announced June 2024.
-
Perturbative stability and error correction thresholds of quantum codes
Authors:
Yaodong Li,
Nicholas O'Dea,
Vedika Khemani
Abstract:
Topologically-ordered phases are stable to local perturbations, and topological quantum error-correcting codes enjoy thresholds to local errors. We connect the two notions of stability by constructing classical statistical mechanics models for decoding general CSS codes and classical linear codes. Our construction encodes correction success probabilities under uncorrelated bit-flip and phase-flip errors, and simultaneously describes a generalized Z2 lattice gauge theory with quenched disorder. We observe that the clean limit of the latter is precisely the discretized imaginary time path integral of the corresponding quantum code Hamiltonian when the errors are turned into a perturbative X or Z magnetic field. Motivated by error correction considerations, we define general order parameters for all such generalized Z2 lattice gauge theories, and show that they are generally lower bounded by success probabilities of error correction. For CSS codes satisfying the LDPC condition and with a sufficiently large code distance, we prove the existence of a low temperature ordered phase of the corresponding lattice gauge theories, particularly for those lacking Euclidean spatial locality and/or when there is a nonzero code rate. We further argue that these results provide evidence to stable phases in the corresponding perturbed quantum Hamiltonians, obtained in the limit of continuous imaginary time. To do so, we distinguish space- and time-like defects in the lattice gauge theory. A high free-energy cost of space-like defects corresponds to a successful "memory experiment" and suppresses the energy splitting among the ground states, while a high free-energy cost of time-like defects corresponds to a successful "stability experiment" and points to a nonzero gap to local excitations.
Submitted 22 June, 2024;
originally announced June 2024.
-
Observation of Heat Pumping Effect by Radiative Shuttling
Authors:
Yuxuan Li,
Yongdi Dang,
Sen Zhang,
Xinran Li,
Tianle Chen,
Pankaj K. Choudhury,
Yi Jin,
Jianbin Xu,
Philippe Ben-Abdallah,
Bing-Feng Ju,
Yungui Ma
Abstract:
The heat shuttling phenomenon is characterized by the presence of a non-zero heat flow between two bodies without net thermal bias on average. It was initially predicted in the context of nonlinear heat conduction within atomic lattices coupled to two time-oscillating thermostats. Recent theoretical works revealed an analog of this effect for heat exchanges mediated by thermal photons between two solids having a temperature-dependent emissivity. In this paper, we present the experimental proof of this effect using systems made with composite materials based on phase change materials. By periodically modulating the temperature of one of the two solids, we report that the system behaves akin to heat pumping with a controllable heat flow direction. Additionally, we demonstrate the effectiveness of a simultaneous modulation of the two temperatures to control both the strength and direction of heat shuttling by exploiting the phase delay between these temperatures. These results show that this effect is promising for active thermal management of solid-state technology, to cool down solids, to insulate them from their background, or to amplify heat exchanges.
Submitted 22 June, 2024;
originally announced June 2024.
-
psPRF: Pansharpening Planar Neural Radiance Field for Generalized 3D Reconstruction Satellite Imagery
Authors:
Tongtong Zhang,
Yuanxiang Li
Abstract:
Most current NeRF variants for satellites are designed for one specific scene and fall short of generalization to new geometry. Additionally, the RGB images require pan-sharpening as an independent preprocessing step. This paper introduces psPRF, a Planar Neural Radiance Field designed for paired low-resolution RGB (LR-RGB) and high-resolution panchromatic (HR-PAN) images from satellite sensors with Rational Polynomial Cameras (RPC). To capture the cross-modal prior from both the LR-RGB and HR-PAN images, for the Unet-shaped architecture, we adapt the encoder with explicit spectral-to-spatial convolution (SSConv) to enhance the multimodal representation ability. To support the generalization ability of psPRF across scenes, we adopt projection loss to ensure strong geometry self-supervision. The proposed method is evaluated with the multi-scene WorldView-3 LR-RGB and HR-PAN pairs, and achieves state-of-the-art performance.
Submitted 21 June, 2024;
originally announced June 2024.
-
Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory
Authors:
Yang Li,
Yujie Luo,
Yichen Zhang,
Ao Sun,
Wei Huang,
Shuai Zhang,
Tao Zhang,
Chuang Zhou,
Li Ma,
Jie Yang,
Mei Wu,
Heng Wang,
Yan Pan,
Yun Shao,
Xing Chen,
Ziyang Chen,
Song Yu,
Hong Guo,
Bingjie Xu
Abstract:
Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously deteriorate the performance of time synchronization systems. The multiple-path scheme is regarded as an effective security countermeasure to decrease the influence of TDA. However, an effective secure combination algorithm is still missing for precision time synchronization. In this paper, a secure combination algorithm based on Dempster-Shafer theory is proposed for the multiple-path method. Special optimizations are made to the combination algorithm to solve the potential problems due to untrusted evidence. Theoretical simulation shows that the proposed algorithm works much better than the Fault Tolerant Algorithm (FTA) and the attack detection method based on a single path. An experimental demonstration proves the feasibility and superiority of the proposed algorithm, where time stability of 27.97 ps, 1.57 ps, and 1.12 ps at averaging times of 1 s, 10 s, and 100 s is achieved under TDA and local clock jump. The proposed algorithm can be used to improve the security and resilience of many important synchronization protocols, such as NTP, PTP, and TWFTT.
Submitted 19 June, 2024;
originally announced June 2024.
-
Preliminary Design of a General Electronics Platform for Accelerator Facilities
Authors:
Jinfu Zhu,
Hongli Ding,
Haokui Li,
Qiaoye Ran,
Xiwen Dai,
Wei Li,
Jiawei Han,
Yue Li,
Zhiyuan Zhang,
Weixin Qiu,
Weiqing Zhang
Abstract:
Many accelerators require considerable electronic systems for tests, verification, and operation. In Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL), to meet the early tests and verification of various systems, save development expenses, and improve the reusability of hardware, firmware, and software systems, we have considered the needs of each system and preliminarily designed a general electronics platform based on MicroTCA.4. The Advanced Mezzanine Card (AMC) will place an FPGA Mezzanine Card (FMC) that supports 500 MSPS to 2 GSPS ADC/DAC. We will design two FMC cards on the Rear Transition Module (RTM), which can be used for analog signal conditioning and waveform digitization by 10 MSPS to 250 MSPS ADC/DAC or motor control. The commercial MCH, CPU, power module, and MTCA crate are deployed. This platform can also be applied to other accelerator facilities.
Submitted 11 May, 2024;
originally announced June 2024.
-
Image Conductor: Precision Control for Interactive Video Synthesis
Authors:
Yaowei Li,
Xintao Wang,
Zhaoyang Zhang,
Zhouxia Wang,
Ziyang Yuan,
Liangbin Xie,
Yuexian Zou,
Ying Shan
Abstract:
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements, typically involving labor-intensive real-world capturing. Despite advancements in generative AI for video creation, achieving precise control over motion for interactive video asset generation remains challenging. To this end, we propose Image Conductor, a method for precise control of camera transitions and object movements to generate video assets from a single image. A well-cultivated training strategy is proposed to separate distinct camera and object motion by camera LoRA weights and object LoRA weights. To further address cinematographic variations from ill-posed trajectories, we introduce a camera-free guidance technique during inference, enhancing object movements while eliminating camera transitions. Additionally, we develop a trajectory-oriented video motion data curation pipeline for training. Quantitative and qualitative experiments demonstrate our method's precision and fine-grained control in generating motion-controllable videos from images, advancing the practical application of interactive video synthesis. Project webpage available at https://liyaowei-stu.github.io/project/ImageConductor/
Submitted 21 June, 2024;
originally announced June 2024.
-
You Only Acquire Sparse-channel (YOAS): A Unified Framework for Dense-channel EEG Generation
Authors:
Hongyu Chen,
Weiming Zeng,
Luhui Cai,
Yueyang Li,
Lei Wang,
Jia Lu,
Hongjie Yan,
Wai Ting Siok,
Nizhuan Wang
Abstract:
High-precision acquisition of dense-channel electroencephalogram (EEG) signals is often impeded by the costliness and lack of portability of equipment. In contrast, generating dense-channel EEG signals effectively from sparse channels shows promise and economic viability. However, sparse-channel EEG poses challenges such as reduced spatial resolution, information loss, signal mixing, and heightened susceptibility to noise and interference. To address these challenges, we first theoretically formulate the dense-channel EEG generation problem as optimizing a set of cross-channel EEG signal generation problems. Then, we propose the YOAS framework for generating dense-channel data from sparse-channel EEG signals. YOAS consists of four sequential stages: Data Preparation, Data Preprocessing, Biased-EEG Generation, and Synthetic EEG Generation. Data Preparation and Preprocessing carefully consider the distribution of EEG electrodes and the low signal-to-noise ratio problem of EEG signals. Biased-EEG Generation includes the sub-modules BiasEEGGanFormer and BiasEEGDiffFormer, which facilitate long-term feature extraction with attention and generate signals by combining electrode position alignment with a diffusion model, respectively. Synthetic EEG Generation synthesizes the final signals, employing a deduction paradigm for multi-channel EEG generation. Extensive experiments confirmed YOAS's feasibility, efficiency, and theoretical validity, even remarkably enhancing data discernibility. This breakthrough in dense-channel EEG signal generation from sparse-channel data opens new avenues for exploration in EEG signal processing and application.
Submitted 21 June, 2024;
originally announced June 2024.
-
KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning
Authors:
Jiahan Chen,
Shuhan Qi,
Yifan Li,
Zeyu Dong,
Mingfeng Ding,
Yulin Wu,
Xuan Wang
Abstract:
Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to the black-box nature of RL-based methods, the generated database tuning strategies still face the urgent problem of lacking explainability. Besides, the redundant parameters in large-scale databases always make strategy learning unstable. This paper proposes KnobTree, an interpretable framework designed for the optimization of database parameter configuration. In this framework, an interpretable database tuning algorithm based on an RL-based differentiable tree is proposed, which builds a transparent tree-based model to generate explainable database tuning strategies. To address the problem of large-scale parameters, we also introduce an explainable method for parameter importance assessment, utilizing Shapley values to identify parameters that have significant impacts on database performance. Experiments conducted on MySQL and Gbase8s databases have verified the exceptional transparency and interpretability of the KnobTree model. This property allows the generated strategies to offer practical guidance to algorithm designers and database administrators. Moreover, our approach also slightly outperforms the existing RL-based tuning algorithms in aspects such as throughput, latency, and processing time.
Submitted 21 June, 2024;
originally announced June 2024.
-
Time-Domain Signatures of Distinct Correlated Insulators in a Moiré Superlattice
Authors:
Eric A. Arsenault,
Yiliu Li,
Birui Yang,
Takashi Taniguchi,
Kenji Watanabe,
James C. Hone,
Cory R. Dean,
Xiaodong Xu,
X. -Y. Zhu
Abstract:
Among expanding discoveries of quantum phases in moiré superlattices, correlated insulators stand out as both the most stable and most commonly observed. Despite the central importance of these states in moiré physics, little is known about their underlying nature. Here, we use pump-probe spectroscopy to show distinct time-domain signatures of correlated insulators at fillings of one (v = -1) and two (v = -2) holes per moiré unit cell in the angle-aligned WSe2/WS2 system. Following photo-doping, we find that the disordering time of the v = -1 state is independent of excitation density (n_ex), as expected from the characteristic phonon response time associated with a polaronic state. In contrast, the disordering time of the v = -2 state scales with (n_ex)^-0.5, in agreement with plasmonic screening from free holons and doublons. These states display disparate reordering behavior dominated either by first order (v = -1) or second order (v = -2) recombination, suggesting the presence of Hubbard excitons and free carrier-like holons/doublons, respectively. Our work delineates the roles of electron-phonon (e-ph) versus electron-electron (e-e) interactions in correlated insulators on the moiré landscape and establishes non-equilibrium responses as mechanistic signatures for distinguishing and discovering quantum phases.
Submitted 21 June, 2024;
originally announced June 2024.
-
Search for the $e^+e^- \to φχ_{c1}(3872)$ process at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on 368.5 pb$^{-1}$ of $e^+e^-$ collision data collected at center-of-mass energies 4.914 and 4.946 GeV by the BESIII detector, the $e^+e^- \to φχ_{c1}(3872)$ process is searched for the first time. No significant signal is observed and the upper limits at the 90\% confidence level on the product of the Born cross section $σ(e^+e^- \to φχ_{c1}(3872))$ and the branching fraction $\mathcal{B}[χ_{c1}(3872)\toπ^+π^- J/ψ]$ at 4.914 and 4.946 GeV are set to be 0.85 and 0.96 pb, respectively. These measurements provide useful information for the production of the $χ_{c1}(3872)$ at $e^+e^-$ collider and deepen our understanding about the nature of this particle.
Submitted 21 June, 2024;
originally announced June 2024.
-
CCAT: Detector Noise Limited Performance of the RFSoC-based Readout Electronics for mm/sub-mm/far-IR KIDs
Authors:
Adrian K. Sinclair,
James Burgoyne,
Anthony I. Huber,
Colin Murphy,
Steve K. Choi,
Cody J. Duell,
Zachary B. Huber,
Yaqiong Li,
Scott C. Chapman,
Michael D. Niemack,
Thomas Nikola,
Eve M. Vavagiakis,
Samantha Walker,
Jordan D. Wheeler,
Jason Austermann,
Lawrence Lin,
Ruixuan Xie,
Bugao Zou,
Philip D. Mauskopf
Abstract:
The Fred Young Submillimeter Telescope (FYST), on Cerro Chajnantor in the Atacama desert of Chile, will conduct wide-field and small deep-field surveys of the sky with more than 100,000 detectors on the Prime-Cam instrument. Kinetic inductance detectors (KIDs) were chosen as the primary sensor technology for their high density focal plane packing. Additionally, they benefit from low cost, ease of fabrication, and simplified cryogenic readout, which are all beneficial for successful deployment at scale. The cryogenic multiplexing complexity is pulled out of the cryostat and is instead pushed into the digital signal processing of the room temperature electronics. Using the Xilinx Radio Frequency System on a Chip (RFSoC), a highly multiplexed KID readout was developed for the first light Prime-Cam and commissioning Mod-Cam instruments. We report on the performance of the RFSoC-based readout with multiple detector arrays in various cryogenic setups. Specifically we demonstrate detector noise limited performance of the RFSoC-based readout under the expected optical loading conditions.
Submitted 21 June, 2024;
originally announced June 2024.
-
InternLM-Law: An Open Source Chinese Legal Large Language Model
Authors:
Zhiwei Fei,
Songyang Zhang,
Xiaoyu Shen,
Dawei Zhu,
Xiao Wang,
Maosong Cao,
Fengzhe Zhou,
Yining Li,
Wenwei Zhang,
Dahua Lin,
Kai Chen,
Jidong Ge
Abstract:
While large language models (LLMs) have showcased impressive capabilities, they struggle with addressing legal queries due to the intricate complexities and specialized expertise required in the legal field. In this paper, we introduce InternLM-Law, a specialized LLM tailored for addressing diverse legal queries related to Chinese laws, spanning from responding to standard legal questions (e.g., legal exercises in textbooks) to analyzing complex real-world legal situations. We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries, and implement a data filtering and processing pipeline to ensure its diversity and quality. Our training approach involves a novel two-stage process: initially fine-tuning LLMs on both legal-specific and general-purpose content to equip the models with broad knowledge, followed by exclusive fine-tuning on high-quality legal data to enhance structured output generation. InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks. We make InternLM-Law and our dataset publicly available to facilitate future research in applying LLMs within the legal domain.
Submitted 21 June, 2024;
originally announced June 2024.
-
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Authors:
Ruixuan Xiao,
Wentao Ma,
Ke Wang,
Yuchuan Wu,
Junbo Zhao,
Haobo Wang,
Fei Huang,
Yongbin Li
Abstract:
LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. Despite the promise, such infused knowledge is mostly disorganized and diverse in formats, lacking rigorous formalization and comprehensive comparisons. Motivated by this, we formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To assess different LLMs on FlowBench, we design a multi-tiered evaluation framework. We evaluate the efficacy of workflow knowledge across multiple formats, and the results indicate that current LLM agents need considerable improvements for satisfactory planning. We hope that our challenging benchmark can pave the way for future agent planning research.
Submitted 21 June, 2024;
originally announced June 2024.
-
DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning
Authors:
Jingyi Liu,
Yanjie Li,
Lina Yu,
Min Wu,
Weijun Li,
Wenqiang Li,
Meilan Hao,
Yusong Deng,
Shu Wei
Abstract:
Noise ubiquitously exists in signals due to numerous factors including physical, electronic, and environmental effects. Traditional methods of symbolic regression, such as genetic programming or deep learning models, aim to find the most fitting expressions for these signals. However, these methods often overlook the noise present in real-world data, leading to reduced fitting accuracy. To tackle this issue, we propose Deep Symbolic Regression against Noise via Contrastive Learning (DN-CL). DN-CL employs two parameter-sharing encoders to embed data points from various data transformations into feature shields against noise. This model treats noisy data and clean data as different views of the ground-truth mathematical expressions. Distances between these features are minimized, utilizing contrastive learning to distinguish between 'positive' noise-corrected pairs and 'negative' contrasting pairs. Our experiments indicate that DN-CL demonstrates superior performance in handling both noisy and clean data, presenting a promising method of symbolic regression.
Submitted 20 June, 2024;
originally announced June 2024.
-
Word Matters: What Influences Domain Adaptation in Summarization?
Authors:
Yinghao Li,
Siyu Miao,
Heyan Huang,
Yang Gao
Abstract:
Domain adaptation aims to enable Large Language Models (LLMs) to generalize effectively to domain datasets unseen during the training phase. However, factors such as the size of the model parameters and the scale of training data are general influencers and do not reflect the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation performance, analyzing the specific impact of 'words' in training data on summarization tasks. We propose quantifying dataset learning difficulty as the learning difficulty of generative summarization, which is determined by two indicators: word-based compression rate and abstraction level. Our experiments conclude that, when considering dataset learning difficulty, the cross-domain overlap and the performance gain in summarization tasks exhibit an approximate linear relationship, which is not directly related to the number of words. Based on this finding, predicting a model's performance on unknown domain datasets is possible without undergoing training.
Submitted 20 June, 2024;
originally announced June 2024.
-
Story of Your Lazy Function's Life: A Bidirectional Demand Semantics for Mechanized Cost Analysis of Lazy Programs
Authors:
Li-yao Xia,
Laura Israel,
Maite Kramarz,
Nicholas Coltharp,
Koen Claessen,
Stephanie Weirich,
Yao Li
Abstract:
Lazy evaluation is a powerful tool that enables better compositionality and potentially better performance in functional programming, but it is challenging to analyze its computation cost. Existing works either require manually annotating sharing, or rely on separation logic to reason about heaps of mutable cells. In this paper, we propose a bidirectional demand semantics that allows for extrinsic reasoning about the computation cost of lazy programs without relying on special program logics. To show the effectiveness of our approach, we apply the demand semantics to a variety of case studies including insertion sort, selection sort, Okasaki's banker's queue, and the implicit queue. We formally prove that the banker's queue and the implicit queue are both amortized and persistent using the Rocq Prover (formerly known as Coq). We also propose the reverse physicist's method, a novel variant of the classical physicist's method, which enables mechanized, modular and compositional reasoning about amortization and persistence with the demand semantics.
Submitted 20 June, 2024;
originally announced June 2024.
-
1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?
Authors:
Yue Huang,
Chenrui Fan,
Yuan Li,
Siyuan Wu,
Tianyi Zhou,
Xiangliang Zhang,
Lichao Sun
Abstract:
Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement. This paper introduces a method to enhance the multilingual performance of LLMs by aggregating knowledge from diverse languages. This approach incorporates a low-resource knowledge detector specific to a language, a language selection process, and mechanisms for answer replacement and integration. Our experiments demonstrate notable performance improvements, particularly in reducing language performance disparity. An ablation study confirms that each component of our method significantly contributes to these enhancements. This research highlights the inherent potential of LLMs to harmonize multilingual capabilities and offers valuable insights for further exploration.
Submitted 20 June, 2024;
originally announced June 2024.
-
Experimental Validation of Cooperative RSS-based Localization with Unknown Transmit Power, Path Loss Exponent, and Precise Anchor Location
Authors:
Yingquan Li,
Bodhibrata Mukhopadhyay,
Jiajie Xu,
Mohamed-Slim Alouini
Abstract:
Received signal strength (RSS)-based cooperative localization has gained significant attention due to its straightforward system architectures and cost-effectiveness. In this paper, we propose Cooperative Localization Techniques (with Unknown Parameters), referred to as CTUP(s), which consider uncertainty in anchor nodes' locations and assume the transmit power and path loss exponent (PLE) to be unknown. Unlike prior studies, CTUP(s) address unknowns by estimating these parameters, along with the location of target nodes. The non-convex and non-linear nature of the maximum likelihood (ML) estimator of the problem is addressed through relaxation techniques, employing Taylor series expansion, semidefinite relaxation (SDR), and the epigraph method. The resulting problem is solved using semidefinite second-order cone programming (SDP-SOCP), leveraging the precision of SDP and the simplicity of SOCP. We deployed an extensive network comprising 50 BLE nodes covering an area of 640 m × 180 m to gather RSS data. The precise location of the nodes is obtained using real-time kinematics global positioning system (RTK-GPS), which is treated as the ground truth. Furthermore, to replicate real-world scenarios, we recorded the positions of the anchor nodes using a standard GPS, thereby introducing uncertainty into the anchor node locations. Extensive simulation and hardware experimentation demonstrate the superior performance of CTUP compared to existing techniques.
Submitted 20 June, 2024;
originally announced June 2024.
-
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Authors:
Chunyuan Deng,
Yilun Zhao,
Yuzhao Heng,
Yitong Li,
Jiannan Cao,
Xiangru Tang,
Arman Cohan
Abstract:
Data contamination has garnered increased attention in the era of large language models (LLMs) due to the reliance on extensive internet-derived training corpora. The issue of training corpus overlap with evaluation benchmarks--referred to as contamination--has been the focus of significant recent research. This body of work aims to identify contamination, understand its impacts, and explore mitigation strategies from diverse perspectives. However, comprehensive studies that provide a clear pathway from foundational concepts to advanced insights are lacking in this nascent field. Therefore, we present a comprehensive survey in the field of data contamination, laying out the key issues, methodologies, and findings to date, and highlighting areas in need of further research and development. In particular, we begin by examining the effects of data contamination across various stages and forms. We then provide a detailed analysis of current contamination detection methods, categorizing them to highlight their focus, assumptions, strengths, and limitations. We also discuss mitigation strategies, offering a clear guide for future research. This survey serves as a succinct overview of the most recent advancements in data contamination research, providing a straightforward guide for the benefit of future research endeavors.
Submitted 20 June, 2024;
originally announced June 2024.
-
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
Authors:
Shilong Li,
Yancheng He,
Hangyu Guo,
Xingyuan Bu,
Ge Bai,
Jie Liu,
Jiaheng Liu,
Xingwei Qu,
Yangguang Li,
Wanli Ouyang,
Wenbo Su,
Bo Zheng
Abstract:
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, facilitating a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on current circumstances to optimize the process until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
Submitted 20 June, 2024;
originally announced June 2024.
-
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
Authors:
Xinyu Fang,
Kangrui Mao,
Haodong Duan,
Xiangyu Zhao,
Yining Li,
Dahua Lin,
Kai Chen
Abstract:
The advent of large vision-language models (LVLMs) has spurred research into their applications in multi-modal contexts, particularly in video understanding. Traditional VideoQA benchmarks, despite providing quantitative metrics, often fail to encompass the full spectrum of video content and inadequately assess models' temporal comprehension. To address these limitations, we introduce MMBench-Video, a quantitative benchmark designed to rigorously evaluate LVLMs' proficiency in video understanding. MMBench-Video incorporates lengthy videos from YouTube and employs free-form questions, mirroring practical use cases. The benchmark is meticulously crafted to probe the models' temporal reasoning skills, with all questions human-annotated according to a carefully constructed ability taxonomy. We employ GPT-4 for automated assessment, demonstrating superior accuracy and robustness over earlier LLM-based evaluations. Utilizing MMBench-Video, we have conducted comprehensive evaluations that include both proprietary and open-source LVLMs for images and videos. MMBench-Video stands as a valuable resource for the research community, facilitating improved evaluation of LVLMs and catalyzing progress in the field of video understanding. The evaluation code of MMBench-Video will be integrated into VLMEvalKit: https://github.com/open-compass/VLMEvalKit.
Submitted 20 June, 2024;
originally announced June 2024.
-
Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue
Authors:
Huifang Du,
Shuqin Li,
Minghao Wu,
Xuejing Feng,
Yuan-Fang Li,
Haofen Wang
Abstract:
Reinforcement learning (RL) is a powerful approach to enhance task-oriented dialogue (TOD) systems. However, existing RL methods tend to mainly focus on generation tasks, such as dialogue policy learning (DPL) or response generation (RG), while neglecting dialogue state tracking (DST) for understanding. This narrow focus prevents the systems from achieving globally optimal performance because it overlooks the interdependence between understanding and generation. Additionally, RL methods face challenges with sparse and delayed rewards, which complicates training and optimization. To address these issues, we extend RL into both understanding and generation tasks by introducing step-by-step rewards throughout the token generation. The understanding reward increases as more slots are correctly filled in DST, while the generation reward grows with the accurate inclusion of user requests. Our approach provides a balanced optimization aligned with task completion. Experimental results demonstrate that our approach effectively enhances the performance of TOD systems and achieves new state-of-the-art results on three widely used datasets, including MultiWOZ2.0, MultiWOZ2.1, and In-Car. Our approach also shows superior few-shot ability in low-resource settings compared to current models.
Submitted 20 June, 2024;
originally announced June 2024.
-
MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction
Authors:
Luhui Cai,
Weiming Zeng,
Hongyu Chen,
Hua Zhang,
Yueyang Li,
Hongjie Yan,
Lingbin Bian,
Nizhuan Wang
Abstract:
Graph deep learning (GDL) has demonstrated impressive performance in predicting population-based brain disorders (BDs) through the integration of both imaging and non-imaging data. However, the effectiveness of GDL-based methods heavily depends on the quality of modeling the multi-modal population graphs and tends to degrade as the graph scale increases. Furthermore, these methods often constrain interactions between imaging and non-imaging data to node-edge interactions within the graph, overlooking complex inter-modal correlations, leading to suboptimal outcomes. To overcome these challenges, we propose MM-GTUNets, an end-to-end graph transformer-based multi-modal graph deep learning (MMGDL) framework designed for brain disorders prediction at large scale. Specifically, to effectively leverage rich multi-modal information related to diseases, we introduce Modality Reward Representation Learning (MRRL), which adaptively constructs population graphs using a reward system. Additionally, we employ a variational autoencoder to reconstruct latent representations of non-imaging features aligned with imaging features. Based on this, we propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features through a unified GTUNet encoder, taking advantage of Graph UNet and Graph Transformer, and a feature fusion module. We validated our method on two public multi-modal datasets, ABIDE and ADHD-200, demonstrating its superior performance in diagnosing BDs. Our code is available at https://github.com/NZWANG/MM-GTUNets.
Submitted 20 June, 2024;
originally announced June 2024.
-
Communication with Quantum Catalysts
Authors:
Yuqi Li,
Junjing Xing,
Dengke Qu,
Lei Xiao,
Zhaobing Fan,
Zhu-Jun Zheng,
Haitao Ma,
Peng Xue,
Kishor Bharti,
Dax Enshan Koh,
Yunlong Xiao
Abstract:
Communication is essential for advancing science and technology. Quantum communication, in particular, benefits from the use of catalysts. During the communication process, these catalysts enhance performance while remaining unchanged. Although chemical catalysts that undergo deactivation typically perform worse than those that remain unaffected, quantum catalysts, referred to as embezzling catalysts, can surprisingly outperform their non-deactivating counterparts despite experiencing slight alterations. In this work, we employ embezzling quantum catalysts to enhance the transmission of both quantum and classical information. Our results reveal that using embezzling catalysts augments the efficiency of information transmission across noisy quantum channels, ensuring a non-zero catalytic channel capacity. Furthermore, we introduce catalytic superdense coding, demonstrating how embezzling catalysts can enhance the transmission of classical information. Finally, we explore methods to reduce the dimensionality of catalysts, a step toward making quantum catalysis a practical reality.
Submitted 20 June, 2024;
originally announced June 2024.
-
Teleportation with Embezzling Catalysts
Authors:
Junjing Xing,
Yuqi Li,
Dengke Qu,
Lei Xiao,
Zhaobing Fan,
Haitao Ma,
Peng Xue,
Kishor Bharti,
Dax Enshan Koh,
Yunlong Xiao
Abstract:
Quantum teleportation is the process of transferring quantum information using classical communication and pre-shared entanglement. This process can benefit from the use of catalysts, which are ancillary entangled states that can enhance teleportation without being consumed. While chemical catalysts undergoing deactivation invariably exhibit inferior performance compared to those unaffected by deactivation, quantum catalysts, termed embezzling catalysts, that are subject to deactivation, may surprisingly outperform their non-deactivating counterparts. In this work, we present teleportation protocols with embezzling catalysts that can achieve arbitrarily high fidelity, meaning that the teleported state can be made arbitrarily close to the original state, with finite-dimensional embezzling catalysts. We show that some embezzling catalysts are universal, meaning that they can improve the teleportation fidelity for any pre-shared entanglement. We also explore methods to reduce the dimension of catalysts without increasing catalyst consumption, an essential step towards realizing quantum catalysis in practice.
Submitted 20 June, 2024;
originally announced June 2024.
-
The neural correlates of logical-mathematical symbol systems processing resemble that of spatial cognition more than natural language processing
Authors:
Yuannan Li,
Shan Xu,
Jia Liu
Abstract:
The ability to manipulate logical-mathematical symbols (LMS), encompassing tasks such as calculation, reasoning, and programming, is a cognitive skill arguably unique to humans. Considering the relatively recent emergence of this ability in human evolutionary history, it has been suggested that LMS processing may build upon more fundamental cognitive systems, possibly through neuronal recycling. Previous studies have pinpointed two primary candidates, natural language processing and spatial cognition. Existing comparisons between these domains largely relied on task-level comparison, which may be confounded by task idiosyncrasy. The present study instead compared the neural correlates at the domain level with both automated meta-analysis and synthesized maps based on three representative LMS tasks, reasoning, calculation, and mental programming. Our results revealed a more substantial cortical overlap between LMS processing and spatial cognition, in contrast to language processing. Furthermore, in regions activated by both spatial and language processing, the multivariate activation pattern for LMS processing exhibited greater multivariate similarity to spatial cognition than to language processing. A hierarchical clustering analysis further indicated that typical LMS tasks were indistinguishable from spatial cognition tasks at the neural level, suggesting an inherent connection between these two cognitive processes. Taken together, our findings support the hypothesis that spatial cognition is likely the basis of LMS processing, which may shed light on the limitations of large language models in logical reasoning, particularly those trained exclusively on textual data without explicit emphasis on spatial content.
Submitted 20 June, 2024;
originally announced June 2024.
-
In Tree Structure Should Sentence Be Generated
Authors:
Yaguang Li,
Xin Chen
Abstract:
Generative models reliant on sequential autoregression have been at the forefront of language generation for an extensive period, particularly following the introduction of widely acclaimed transformers. Despite their excellent performance, these models still face several issues, such as hallucinations and getting trapped in a logic loop. To enhance the performance of existing systems, this paper introduces a new method for generating sequences in natural language, which involves generating the targeted sentence in a tree-traversing order. The paper includes an illustration of the theoretical basis and validity of the approach, as well as a comparison of its fundamentals with the diffusion model in graphic generation. Finally, a module called SenTree is introduced for generating an approximating binary tree. It is already available at https://github.com/arklyg/sentree. Additionally, a joint training framework based on this approach is proposed, incorporating the intrinsics of generative adversarial networks.
Submitted 20 June, 2024;
originally announced June 2024.
-
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning
Authors:
Zhongjie Duan,
Wenmeng Zhou,
Cen Chen,
Yaliang Li,
Weining Qian
Abstract:
Recently, advancements in video synthesis have attracted significant attention. Video synthesis models such as AnimateDiff and Stable Video Diffusion have demonstrated the practical applicability of diffusion models in creating dynamic visual content. The emergence of SORA has further spotlighted the potential of video generation technologies. Nonetheless, the extension of video lengths has been constrained by the limitations in computational resources. Most existing video synthesis models can only generate short video clips. In this paper, we propose a novel post-tuning methodology for video synthesis models, called ExVideo. This approach is designed to enhance the capability of current video synthesis models, allowing them to produce content over extended temporal durations while incurring lower training expenditures. In particular, we design extension strategies across common temporal model architectures respectively, including 3D convolution, temporal attention, and positional embedding. To evaluate the efficacy of our proposed post-tuning approach, we conduct extension training on the Stable Video Diffusion model. Our approach augments the model's capacity to generate up to $5\times$ its original number of frames, requiring only 1.5k GPU hours of training on a dataset comprising 40k videos. Importantly, the substantial increase in video length doesn't compromise the model's innate generalization capabilities, and the model showcases its advantages in generating videos of diverse styles and resolutions. We will release the source code and the enhanced model publicly.
Submitted 20 June, 2024;
originally announced June 2024.
-
Towards Event-oriented Long Video Understanding
Authors:
Yifan Du,
Kun Zhou,
Yuqi Huo,
Yifan Li,
Wayne Xin Zhao,
Haoyu Lu,
Zijia Zhao,
Bingning Wang,
Weipeng Chen,
Ji-Rong Wen
Abstract:
With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce Event-Bench, an event-oriented long video understanding benchmark built on existing datasets and human annotations. Event-Bench includes six event-related tasks and 2,190 test instances to comprehensively evaluate video event understanding ability. Additionally, we propose Video Instruction Merging (VIM), a cost-effective method that enhances video MLLMs using merged, event-intensive video instructions, addressing the scarcity of human-annotated, event-intensive data. Extensive experiments show that the best-performing model, GPT-4o, achieves an overall accuracy of 53.33, significantly outperforming the best open-source model by 41.42%. Leveraging an effective instruction synthesis method and an adaptive model architecture, VIM surpasses both state-of-the-art open-source models and GPT-4V on the Event-Bench. All code, data, and models are publicly available at https://github.com/RUCAIBox/Event-Bench.
Submitted 20 June, 2024;
originally announced June 2024.
-
EasyECR: A Library for Easy Implementation and Evaluation of Event Coreference Resolution Models
Authors:
Yuncong Li,
Tianhua Xu,
Sheng-hua Zhong,
Haiqin Yang
Abstract:
Event Coreference Resolution (ECR) is the task of clustering event mentions that refer to the same real-world event. Despite significant advancements, ECR research faces two main challenges: limited generalizability across domains due to narrow dataset evaluations, and difficulties in comparing models within diverse ECR pipelines. To address these issues, we develop EasyECR, the first open-source library designed to standardize data structures and abstract ECR pipelines for easy implementation and fair evaluation. More specifically, EasyECR integrates seven representative pipelines and ten popular benchmark datasets, enabling model evaluations on various datasets and promoting the development of robust ECR pipelines. By conducting extensive evaluation via our EasyECR, we find that i) the representative ECR pipelines cannot generalize across multiple datasets, hence evaluating ECR pipelines on multiple datasets is necessary, and ii) all models in ECR pipelines have a great effect on pipeline performance; therefore, when one model in an ECR pipeline is compared, it is essential to ensure that the other models remain consistent. Additionally, reproducing ECR results is not trivial, and the developed library can help reduce this discrepancy. The experimental results provide valuable baselines for future research.
Submitted 20 June, 2024;
originally announced June 2024.
-
Self-Attention in Transformer Networks Explains Monkeys' Gaze Pattern in Pac-Man Game
Authors:
Zhongqiao Lin,
Yunwei Li,
Tianming Yang
Abstract:
We proactively direct our eyes and attention to collect information during problem solving and decision making. Understanding gaze patterns is crucial for gaining insights into the computation underlying the problem-solving process. However, there is a lack of interpretable models that can account for how the brain directs the eyes to collect information and utilize it, especially in the context of complex problem solving. In the current study, we analyzed the gaze patterns of two monkeys playing the Pac-Man game. We trained a transformer network to mimic the monkeys' gameplay and found its attention pattern captures the monkeys' eye movements. In addition, the prediction based on the transformer network's attention outperforms the human subjects' predictions. Importantly, we dissected the computation underlying the attention mechanism of the transformer network, revealing its layered structures reflecting a value-based attention component and a component that captures the interactions between Pac-Man and other game objects. Based on these findings, we built a condensed attention model that is not only as accurate as the transformer network but also fully interpretable. Our results highlight the potential of using transformer neural networks to model and understand the cognitive processes underlying complex problem solving in the brain, opening new avenues for investigating the neural basis of cognition.
Submitted 20 June, 2024;
originally announced June 2024.
-
Urban-Focused Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing
Authors:
Xinbo Zhao,
Yingxue Zhang,
Xin Zhang,
Yu Yang,
Yiqun Xie,
Yanhua Li,
Jun Luo
Abstract:
Enhancing diverse human decision-making processes in an urban environment is a critical issue across various applications, including ride-sharing vehicle dispatching, public transportation management, and autonomous driving. Offline reinforcement learning (RL) is a promising approach to learn and optimize human urban strategies (or policies) from pre-collected human-generated spatial-temporal urban data. However, standard offline RL faces two significant challenges: (1) data scarcity and data heterogeneity, and (2) distributional shift. In this paper, we introduce MODA -- a Multi-Task Offline Reinforcement Learning with Contrastive Data Sharing approach. MODA addresses the challenges of data scarcity and heterogeneity in a multi-task urban setting through Contrastive Data Sharing among tasks. This technique involves extracting latent representations of human behaviors by contrasting positive and negative data pairs. It then shares data presenting similar representations with the target task, facilitating data augmentation for each task. Moreover, MODA develops a novel model-based multi-task offline RL algorithm. This algorithm constructs a robust Markov Decision Process (MDP) by integrating a dynamics model with a Generative Adversarial Network (GAN). Once the robust MDP is established, any online RL or planning algorithm can be applied. Extensive experiments conducted in a real-world multi-task urban setting validate the effectiveness of MODA. The results demonstrate that MODA exhibits significant improvements compared to state-of-the-art baselines, showcasing its capability in advancing urban decision-making processes. We also made our code available to the research community.
Submitted 20 June, 2024;
originally announced June 2024.
-
Symmetry engineering in 2D bioelectronics facilitating augmented biosensing interfaces
Authors:
Yizhang Wu,
Yihan Liu,
Yuan Li,
Ziquan Wei,
Sicheng Xing,
Yunlang Wang,
Dashuai Zhu,
Ziheng Guo,
Anran Zhang,
Gongkai Yuan,
Zhibo Zhang,
Ke Huang,
Yong Wang,
Guorong Wu,
Ke Cheng,
Wubin Bai
Abstract:
Symmetry lies at the heart of 2D bioelectronics, determining material properties at the fundamental level. Breaking the symmetry allows emergent functionalities and effects. However, symmetry modulation in 2D bioelectronics and the resultant applications have been largely overlooked. Here we devise an oxidized architectural MXene, referred to as OXene, that couples orbit symmetric breaking with inverse symmetric breaking to yield optimized interfacial impedance and Schottky-induced piezoelectric effects. The resulting OXene validates applications including microelectrode arrays, gait analysis, active transistor matrices, and wireless signal transmission, which enable high-fidelity signal transmission and reconfigurable logic gates. Furthermore, OXene interfaces are investigated in both rodent and porcine myocardium, featuring high-quality and spatiotemporally resolved physiological recordings, while accurate differentiated predictions are enabled via various machine learning pipelines.
Submitted 19 June, 2024;
originally announced June 2024.
-
Orbit symmetry breaking in MXene implements enhanced soft bioelectronic implants
Authors:
Yizhang Wu,
Yuan Li,
Yihan Liu,
Dashuai Zhu,
Sicheng Xing,
Noah Lambert,
Hannah Weisbecker,
Siyuan Liu,
Brayden Davis,
Lin Zhang,
Meixiang Wang,
Gongkai Yuan,
Chris Zhoufan You,
Anran Zhang,
Cate Duncan,
Wanrong Xie,
Yihang Wang,
Yong Wang,
Sreya Kanamurlapudi,
Garcia-Guzman Evert,
Arjun Putcha,
Michael D. Dickey,
Ke Huang,
Wubin Bai
Abstract:
Bioelectronic implants with soft mechanics, biocompatibility, and excellent electrical performance enable biomedical implants to record electrophysiological signals and execute interventions within internal organs, promising to revolutionize the diagnosing, monitoring, and treatment of various pathological conditions. However, challenges remain in improving excessive impedance at the bioelectronic-tissue interface and thus the efficacy of electrophysiological signaling and intervention. Here, we devise orbit symmetry breaking in MXene, a low-cost, scalable, biocompatible, and conductive 2D layered material. The resulting material, which we refer to as OBXene, exhibits low bioelectronic-tissue impedance, originating from the out-of-plane charge transfer. Furthermore, the Schottky-induced piezoelectricity stemming from the asymmetric orbital configuration of OBXene facilitates interlayered charge transport in the device. In this study, we report an OBXene-based cardiac patch applied on the left ventricular epicardium of both rodent and porcine models to enable spatiotemporal epicardium mapping and pacing, while coupling the wireless and battery-free operation for long-term real-time recording and closed-loop stimulation.
Submitted 19 June, 2024;
originally announced June 2024.
-
CityGPT: Empowering Urban Spatial Cognition of Large Language Models
Authors:
Jie Feng,
Yuwei Du,
Tianhui Liu,
Siqi Guo,
Yuming Lin,
Yong Li
Abstract:
Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of physical-world corpora and knowledge during training, they usually fail to solve many real-life tasks in the urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the capability of LLMs in understanding urban space and solving the related urban tasks by building a city-scale world model in the model. First, we construct a diverse instruction tuning dataset, CityInstruction, for injecting urban knowledge and enhancing spatial reasoning capability effectively. By using a mixture of CityInstruction and general instruction data, we fine-tune various LLMs (e.g., ChatGLM3-6B, Qwen1.5 and LLama3 series) to enhance their capability without sacrificing general abilities. To further validate the effectiveness of the proposed methods, we construct a comprehensive benchmark, CityEval, to evaluate the capability of LLMs on diverse urban scenarios and problems. Extensive evaluation results demonstrate that small LLMs trained with CityInstruction can achieve competitive performance with commercial LLMs in the comprehensive evaluation of CityEval. The source codes are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityGPT.
Submitted 19 June, 2024;
originally announced June 2024.
-
AspirinSum: an Aspect-based utility-preserved de-identification Summarization framework
Authors:
Ya-Lun Li
Abstract:
Due to the rapid advancement of Large Language Models (LLMs), the whole community eagerly consumes any available text data in order to train LLMs. Currently, a large portion of the available text data is collected from the internet, which has been regarded as a cheap source of training data. However, when people try to extend LLMs' capability to personal-related domains, such as healthcare or education, the lack of public datasets in these domains makes the adaptation of LLMs to such domains much slower. Public datasets are lacking in such domains because the data usually contain personally sensitive information. In order to comply with privacy law, the data in such domains need to be de-identified before any kind of dissemination. Much research has tried to address this problem for image or tabular data. However, there has been limited research on efficient and general de-identification methods for text data. Most methods are based on human annotation or a predefined category list, and usually cannot be easily adapted to specific domains. The goal of this proposal is to develop a text de-identification framework that can be easily adapted to a specific domain and leverages existing expert knowledge without further human annotation. We propose an aspect-based utility-preserved de-identification summarization framework, AspirinSum. By learning to align experts' aspects from existing comment data, it can efficiently summarize a personally sensitive document by extracting sub-sentences related to personally sensitive aspects and de-identifying them by substituting them with sub-sentences of similar aspect. We envision that the de-identified text can then be used in data publishing, eventually publishing our de-identified dataset for downstream task use.
Submitted 19 June, 2024;
originally announced June 2024.
-
CityBench: Evaluating the Capabilities of Large Language Model as World Model
Authors:
Jie Feng,
Jun Zhang,
Junbo Yan,
Xin Zhang,
Tianjian Ouyang,
Tianhui Liu,
Yuwei Du,
Siqi Guo,
Yong Li
Abstract:
Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations of the usability of LLMs, but a systematic and scalable evaluation benchmark is still lacking. The challenge in constructing a systematic evaluation benchmark for the urban domain lies in the diversity of data and scenarios, as well as the complex and dynamic nature of cities. In this paper, we propose CityBench, an interactive simulator-based evaluation platform, as the first systematic evaluation benchmark for the capability of LLMs in the urban domain. First, we build CitySim to integrate the multi-source data and simulate fine-grained urban dynamics. Based on CitySim, we design 7 tasks in 2 categories, perception-understanding and decision-making, to evaluate the capability of LLMs as city-scale world models for the urban domain. Due to the flexibility and ease of use of CitySim, our evaluation platform CityBench can be easily extended to any city in the world. We evaluate 13 well-known LLMs, including open-source LLMs and commercial LLMs, in 13 cities around the world. Extensive experiments demonstrate the scalability and effectiveness of the proposed CityBench and shed light on the future development of LLMs in the urban domain. The dataset, benchmark and source codes are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityBench
Submitted 19 June, 2024;
originally announced June 2024.
-
UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture
Authors:
Sitian Chen,
Haobin Tan,
Amelie Chi Zhou,
Yusen Li,
Pavan Balaji
Abstract:
Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due to their intensive demands on memory capacity and memory bandwidth. In this paper, we propose UpDLRM, which utilizes real-world processing-in-memory (PIM) hardware, UPMEM DPU, to boost the memory bandwidth and reduce recommendation latency. The parallel nature of the DPU memory can provide high aggregated bandwidth for the large number of irregular memory accesses in embedding lookups, thus offering great potential to reduce the inference latency. To fully utilize the DPU memory bandwidth, we further studied the embedding table partitioning problem to achieve good workload balance and efficient data caching. Evaluations using real-world datasets show that UpDLRM achieves much lower inference time for DLRM compared to both CPU-only and CPU-GPU hybrid counterparts.
Submitted 19 June, 2024;
originally announced June 2024.
-
EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy
Authors:
Long Bai,
Qiaozhi Tan,
Tong Chen,
Wan Jun Nah,
Yanheng Li,
Zhicheng He,
Sishen Yuan,
Zhen Chen,
Jinlin Wu,
Mobarakol Islam,
Zhen Li,
Hongbin Liu,
Hongliang Ren
Abstract:
Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels remains underexplored. To tackle this, we introduce EndoUIC, a WCE unified illumination correction solution using an end-to-end promptable diffusion transformer (DFT) model. In our work, the illumination prompt module shall navigate the model to adapt to different exposure levels and perform targeted image enhancement, in which the Adaptive Prompt Integration (API) and Global Prompt Scanner (GPS) modules shall further boost the concurrent representation learning between the prompt parameters and features. Besides, the U-shaped restoration DFT model shall capture the long-range dependencies and contextual information for unified illumination restoration. Moreover, we present a novel Capsule-endoscopy Exposure Correction (CEC) dataset, including ground-truth and corrupted image pairs annotated by expert photographers. Extensive experiments against a variety of state-of-the-art (SOTA) methods on four datasets showcase the effectiveness of our proposed method and components in WCE illumination restoration, and the additional downstream experiments further demonstrate its utility for clinical diagnosis and surgical assistance.
Submitted 19 June, 2024;
originally announced June 2024.
-
Dual-Phase Accelerated Prompt Optimization
Authors:
Muchen Yang,
Moxin Li,
Yongle Li,
Zijun Chen,
Chongming Gao,
Junqi Zhang,
Yangyang Li,
Fuli Feng
Abstract:
Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus requiring a substantial number of optimization steps to obtain satisfactory performance. In this light, we aim to accelerate the prompt optimization process to tackle the challenge of low convergence rate. We propose a dual-phase approach that starts by generating high-quality initial prompts, adopting a well-designed meta-instruction to delve into task-specific information, and then iteratively optimizes the prompts at the sentence level, leveraging previous tuning experience to expand prompt candidates and accept effective ones. Extensive experiments on eight datasets demonstrate the effectiveness of our proposed method, achieving a consistent accuracy gain over baselines with fewer than five optimization steps.
Submitted 19 June, 2024;
originally announced June 2024.
-
Multi-messenger modeling of the Monogem pulsar halo
Authors:
Youyou Li,
Oscar Macias,
Shinichiro Ando,
Jacco Vink
Abstract:
The High-Altitude Water Cherenkov Telescope (HAWC) has detected TeV halos associated with two nearby pulsars/pulsar wind nebulae (PWN) -- Geminga and B0656+14. These TeV halos extend up to tens of pc from the central accelerators, indicating that the diffusion of ultrarelativistic electrons and positrons in the interstellar medium has been suppressed by two orders of magnitude. Although Geminga and B0656+14 are at similar distances and in the same field of view, they have distinct histories. Notably, B0656+14 probably still resides within its parent supernova remnant, the Monogem Ring, which can be observed in X-rays. In this work, we perform high-resolution simulations of the propagation and emission of relativistic lepton pairs around B0656+14 using a two-zone diffusion model with the GALPROP numerical code. We compared the predicted inverse-Compton spectrum to the observations made by HAWC and Fermi-LAT and found physically plausible model parameters that resulted in a good fit to the data. Additionally, we estimated the contribution of this TeV halo to the positron flux observed on Earth and found it to be smaller than 10% of the measured flux. We conclude that future observations of the TeV halo and its synchrotron emission counterpart in radio and X-ray frequencies will be crucial to distinguish between various possible models.
Submitted 19 June, 2024;
originally announced June 2024.
-
Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing
Authors:
Mengjie Liu,
Yihua Li,
Fangyi Mou,
Zhiqing Tang,
Jiong Lou,
Jianxiong Guo,
Weijia Jia
Abstract:
Meta computing is a new computing paradigm that aims to efficiently utilize all network computing resources to provide fault-tolerant, personalized services with strong security and privacy guarantees. It also seeks to virtualize the Internet as many meta computers. In meta computing, tasks can be assigned to containers at edge nodes for processing, based on container images with multiple layers. The dynamic and resource-constrained nature of meta computing environments requires an optimal container migration strategy for mobile users to minimize latency. However, the problem of container migration in meta computing has not been thoroughly explored. To address this gap, we present low-latency, layer-aware container migration strategies that consider both proactive and passive migration. Specifically: 1) We formulate the container migration problem in meta computing, taking into account layer dependencies to reduce migration costs and overall task duration by considering four delays. 2) We introduce a reinforcement learning algorithm based on policy gradients to minimize total latency by identifying layer dependencies for action selection, making decisions for both proactive and passive migration. Expert demonstrations are introduced to enhance exploitation. 3) Experiments using real data trajectories show that the algorithm outperforms baseline algorithms, achieving lower total latency.
Submitted 19 June, 2024;
originally announced June 2024.
-
The association of domain-specific physical activity and sedentary activity with stroke: A prospective cohort study
Authors:
Xinyi He,
Shidi Wang,
Yi Li,
Jiucun Wang,
Guangrui Yang,
Jun Chen,
Zixin Hu
Abstract:
Background: The incidence of stroke places a heavy burden on both society and individuals. Activity is closely related to cardiovascular health. This study aimed to investigate the relationship between the varying domains of physical activity (PA), namely occupation-related Physical Activity (OPA), transportation-related Physical Activity (TPA), and leisure-time Physical Activity (LTPA), as well as Sedentary Activity (SA), with stroke. Methods: Our analysis included 30,400 participants aged 20+ years from the 2007 to 2018 National Health and Nutrition Examination Survey (NHANES). Stroke was identified based on the participant's self-reported diagnoses from previous medical consultations, and PA and SA were self-reported. Multivariable logistic and restricted cubic spline models were used to assess the associations. Results: Participants achieving PA guidelines (performing PA more than 150 min/week) were 35.7% less likely to have a stroke based on both the total PA (odds ratio [OR] 0.643, 95% confidence interval [CI] 0.523-0.790) and LTPA (OR 0.643, 95% CI 0.514-0.805), while OPA or TPA did not demonstrate lower stroke risk. Furthermore, participants with less than 7.5 h/day SA levels were 21.6% (OR 0.784, 95% CI 0.665-0.925) less likely to have a stroke. The intensities of total PA and LTPA exhibited nonlinear U-shaped associations with stroke risk. In contrast, those of OPA and TPA showed negative linear associations, while SA intensities were positively linearly correlated with stroke risk. Conclusions: LTPA, but not OPA or TPA, was associated with a lower risk of stroke at any amount, suggesting that significant cardiovascular health would benefit from increased PA. Additionally, the positive association between SA and stroke indicated that prolonged sitting was detrimental to cardiovascular health. Overall, increased PA within a reasonable range reduces the risk of stroke, while increased SA elevates it.
Submitted 19 June, 2024;
originally announced June 2024.
-
Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach
Authors:
Yicong Li,
Yu Yang,
Jiannong Cao,
Shuaiqi Liu,
Haoran Tang,
Guandong Xu
Abstract:
Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably improving structure fairness. This is because the embedding performance of high-degree and low-to-high-degree vertices will significantly drop close to the generally poorer embedding performance of most slightly changed vertices in the long-tail part of the power-law distribution. We first identify biased structural evolutions in a dynamic graph based on the evolving trend of vertex degree and then propose FairDGE, the first structurally Fair Dynamic Graph Embedding algorithm. FairDGE learns biased structural evolutions by jointly embedding the connection changes among vertices and the long-short-term evolutionary trend of vertex degrees. Furthermore, a novel dual debiasing approach is devised to encode fair embeddings contrastively, customizing debiasing strategies for different biased structural evolutions. This innovative debiasing strategy breaks the effectiveness bottleneck of embeddings without notable fairness loss. Extensive experiments demonstrate that FairDGE achieves simultaneous improvement in the effectiveness and fairness of embeddings.
Submitted 19 June, 2024;
originally announced June 2024.
-
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
Authors:
He Cao,
Yanjun Shao,
Zhiyuan Liu,
Zijing Liu,
Xiangru Tang,
Yuan Yao,
Yu Li
Abstract:
Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of multiple molecule graph interaction in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO (Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our extensive experiments demonstrate that PRESTO offers competitive results in downstream synthetic chemistry tasks. The code can be found at https://github.com/IDEA-XL/PRESTO.
Submitted 18 June, 2024;
originally announced June 2024.
-
A spectral Erdős-Faudree-Rousseau theorem
Authors:
Yongtao Li,
Lihua Feng,
Yuejian Peng
Abstract:
A well-known theorem of Mantel states that every $n$-vertex graph with more than $\lfloor n^2/4\rfloor $ edges contains a triangle. An interesting problem in extremal graph theory studies the minimum number of edges contained in triangles among graphs with a prescribed number of vertices and edges. Erdős, Faudree and Rousseau (1992) showed that a graph on $n$ vertices with more than $\lfloor n^2/4\rfloor $ edges contains at least $2\lfloor n/2\rfloor +1$ edges in triangles. Such edges are called triangular edges. In this paper, we present a spectral version of the result of Erdős, Faudree and Rousseau. Using the supersaturation-stability and the spectral technique, we prove that every $n$-vertex graph $G$ with $λ(G) \ge \sqrt{\lfloor n^2/4\rfloor}$ contains at least $2 \lfloor {n}/{2} \rfloor -1$ triangular edges, unless $G$ is a balanced complete bipartite graph. The method in our paper has some interesting applications. Firstly, the supersaturation-stability can be used to revisit a conjecture of Erdős concerning the booksize of a graph, which was initially proved by Edwards (unpublished), and independently by Khadžiivanov and Nikiforov (1979). Secondly, our method can improve the bound on the order $n$ of a graph by dropping the condition on $n$ being sufficiently large, which is obtained from the triangle removal lemma. Thirdly, the supersaturation-stability can be applied to deal with the spectral extremal graph problems on counting triangles, which were recently studied by Ning and Zhai (2023).
Submitted 18 June, 2024;
originally announced June 2024.