-
A Two-Phase Recall-and-Select Framework for Fast Model Selection
Authors:
Jianwei Cui,
Wenhang Shi,
Honglin Tao,
Wei Lu,
Xiaoyong Du
Abstract:
As the ubiquity of deep learning in various machine learning applications has amplified, a proliferation of neural network models has been trained and shared on public model repositories. In the context of a targeted machine learning assignment, utilizing an apt source model as a starting point typically outperforms the strategy of training from scratch, particularly with limited training data. Despite the investigation and development of numerous model selection strategies in prior work, the process remains time-consuming, especially given the ever-increasing scale of model repositories. In this paper, we propose a two-phase (coarse-recall and fine-selection) model selection framework, aiming to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets. Specifically, the coarse-recall phase clusters models showcasing similar training performances on benchmark datasets in an offline manner. A light-weight proxy score is subsequently computed between this model cluster and the target dataset, which serves to recall a significantly smaller subset of potential candidate models in a swift manner. In the following fine-selection phase, the final model is chosen by fine-tuning the recalled models on the target dataset with successive halving. To accelerate the process, the final fine-tuning performance of each potential model is predicted by mining the model's convergence trend on the benchmark datasets, which aids in filtering out lower-performing models earlier during fine-tuning. Through extensive experimentation on tasks covering natural language processing and computer vision, it has been demonstrated that the proposed methodology facilitates the selection of a high-performing model about 3x faster than conventional baseline methods. Our code is available at https://github.com/plasware/two-phase-selection.
Submitted 28 March, 2024;
originally announced April 2024.
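The fine-selection phase described in the abstract above relies on successive halving. The following is only a minimal sketch of generic successive halving, not the authors' implementation: the names `TOY_QUALITY`, `toy_evaluate`, and `successive_halving` are hypothetical, the scoring function is a toy stand-in for fine-tuning a model for a given budget, and the paper's convergence-trend prediction is not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical "true" final quality of each candidate model (toy data only).
TOY_QUALITY: Dict[str, float] = {
    "model_a": 0.71,
    "model_b": 0.64,
    "model_c": 0.88,
    "model_d": 0.52,
    "model_e": 0.80,
}


def toy_evaluate(name: str, budget: int) -> float:
    """Toy stand-in for fine-tuning `name` for `budget` units and scoring it.

    The score approaches the model's final quality as the budget grows.
    """
    return TOY_QUALITY[name] * (1.0 - 0.5 ** budget)


def successive_halving(
    candidates: List[str],
    evaluate: Callable[[str, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> str:
    """Keep the top 1/eta of candidates each round while growing the budget."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    survivors = list(candidates)
    budget = min_budget
    while len(survivors) > 1:
        # Score every surviving candidate at the current budget.
        scores = {name: evaluate(name, budget) for name in survivors}
        keep = max(1, len(survivors) // eta)
        survivors = sorted(survivors, key=lambda n: scores[n], reverse=True)[:keep]
        budget *= eta  # survivors earn a larger budget in the next round
    return survivors[0]
```

Calling `successive_halving(list(TOY_QUALITY), toy_evaluate)` spends small budgets on all five toy candidates and larger budgets only on the survivors, which is where the time saving of this family of methods comes from.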
-
TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes
Authors:
Liangyu Xu,
Wanxuan Lu,
Hongfeng Yu,
Yongqiang Mao,
Hanbo Bi,
Chenglong Liu,
Xian Sun,
Kun Fu
Abstract:
As drone technology advances, using unmanned aerial vehicles for aerial surveys has become the dominant trend in modern low-altitude remote sensing. The surge in aerial video data necessitates accurate prediction of future scenes and of the motion states of the target of interest, particularly in applications like traffic management and disaster response. Existing video prediction methods focus solely on predicting future scenes (video frames) and neglect explicit modeling of the target's motion states, which is crucial for aerial video interpretation. To address this issue, we introduce a novel task called Target-Aware Aerial Video Prediction, aiming to simultaneously predict future scenes and motion states of the target. Further, we design a model specifically for this task, named TAFormer, which provides a unified modeling approach for both video and target motion states. Specifically, we introduce Spatiotemporal Attention (STA), which decouples the learning of video dynamics into spatial static attention and temporal dynamic attention, effectively modeling the scene appearance and motion. Additionally, we design an Information Sharing Mechanism (ISM), which elegantly unifies the modeling of video and target motion by facilitating information interaction through two sets of messenger tokens. Moreover, to alleviate the difficulty of distinguishing targets in blurry predictions, we introduce Target-Sensitive Gaussian Loss (TSGL), enhancing the model's sensitivity to both the target's position and content. Extensive experiments on UAV123VP and VisDroneVP (derived from single-object tracking datasets) demonstrate the exceptional performance of TAFormer in target-aware video prediction, showcasing its adaptability to the additional requirements of aerial video interpretation for target awareness.
Submitted 27 March, 2024;
originally announced March 2024.
-
Revealing the Microscopic Mechanism of Elementary Vortex Pinning in Superconductors
Authors:
C. Chen,
Y. Liu,
Y. Chen,
Y. N. Hu,
T. Z. Zhang,
D. Li,
X. Wang,
C. X. Wang,
Z. Y. W. Lu,
Y. H. Zhang,
Q. L. Zhang,
X. L. Dong,
R. Wang,
D. L. Feng,
T. Zhang
Abstract:
Vortex pinning is a crucial factor that determines the critical current of practical superconductors. However, the understanding of its underlying mechanism has long been phenomenological, without a clear microscopic description. Here, using high-resolution scanning tunneling microscopy, we studied single vortex pinning induced by point defects in layered FeSe-based superconductors. We found that the defect-vortex interaction drives low-energy vortex bound states away from $E_F$, resulting in a mini gap which effectively lowers the energy of the vortex and causes the pinning. By measuring the local density of states, we directly obtained the elementary pinning energy and estimated the pinning force through the spatial gradient of the pinning energy. The results align with the bulk critical current measurement. We further show that a general microscopic quantum model that considers the defect-vortex interaction can capture our observations well. It indicates that the local pairing near a pinned vortex core is actually enhanced, which is beyond the traditional understanding that non-superconducting regions pin vortices. Our study thus reveals a general microscopic mechanism of vortex pinning in superconductors.
Submitted 26 March, 2024;
originally announced March 2024.
-
SFOD: Spiking Fusion Object Detector
Authors:
Yimeng Fan,
Wei Zhang,
Changsong Liu,
Mingyang Li,
Wenrui Lu
Abstract:
Event cameras, characterized by high temporal resolution, high dynamic range, low power consumption, and high pixel bandwidth, offer unique capabilities for object detection in specialized contexts. Despite these advantages, the inherent sparsity and asynchrony of event data pose challenges to existing object detection algorithms. Spiking Neural Networks (SNNs), inspired by the way the human brain codes and processes information, offer a potential solution to these difficulties. However, their performance in object detection using event cameras is limited in current implementations. In this paper, we propose the Spiking Fusion Object Detector (SFOD), a simple and efficient approach to SNN-based object detection. Specifically, we design a Spiking Fusion Module, achieving the first-time fusion of feature maps from different scales in SNNs applied to event cameras. Additionally, through integrating our analysis and experiments conducted during the pretraining of the backbone network on the NCAR dataset, we delve deeply into the impact of spiking decoding strategies and loss functions on model performance. Thereby, we establish state-of-the-art classification results based on SNNs, achieving 93.7\% accuracy on the NCAR dataset. Experimental results on the GEN1 detection dataset demonstrate that the SFOD achieves a state-of-the-art mAP of 32.1\%, outperforming existing SNN-based approaches. Our research not only underscores the potential of SNNs in object detection with event cameras but also propels the advancement of SNNs. Code is available at https://github.com/yimeng-fan/SFOD.
Submitted 22 March, 2024;
originally announced March 2024.
-
Hilbert's Irreducibility Theorem for Linear Differential Operators
Authors:
Ruyong Feng,
Zewang Guo,
Wei Lu
Abstract:
We prove a differential analogue of Hilbert's irreducibility theorem. Let $\mathcal{L}$ be a linear differential operator with coefficients in $C(\mathbb{X})(x)$ that is irreducible over $\overline{C(\mathbb{X})}(x)$, where $\mathbb{X}$ is an irreducible affine algebraic variety over an algebraically closed field $C$ of characteristic zero. We show that the set of $c\in \mathbb{X}(C)$ such that the specialized operator $\mathcal{L}^c$ of $\mathcal{L}$ remains irreducible over $C(x)$ is Zariski dense in $\mathbb{X}(C)$.
Submitted 19 March, 2024;
originally announced March 2024.
-
Giant electrode effect on tunneling magnetoresistance and electroresistance in van der Waals intrinsic multiferroic tunnel junctions using VS2
Authors:
Zhi Yan,
Ruixia Yang,
Cheng Fang,
Wentian Lu,
Xiaohong Xu
Abstract:
Van der Waals multiferroic tunnel junctions (vdW-MFTJs) with multiple nonvolatile resistive states are highly suitable for new physics and next-generation storage electronics. However, currently reported vdW-MFTJs are based on two types of materials, i.e., vdW ferromagnetic and ferroelectric materials, forming a multiferroic system. This undoubtedly introduces additional interfaces, increasing the complexity of experimental preparation. Herein, we engineer vdW intrinsic MFTJs utilizing bilayer VS$_2$. By employing the nonequilibrium Green's function combined with density functional theory, we systematically investigate the influence of three types of electrodes (including non-vdW pure metal Ag/Au, vdW metallic 1T-MoS$_2$/2H-PtTe$_2$, and vdW ferromagnetic metallic Fe$_3$GaTe$_2$/Fe$_3$GeTe$_2$) on the electronic transport properties of VS$_2$-based intrinsic MFTJs. We demonstrate that these MFTJs manifest a giant electrode-dependent electronic transport characteristic effect. Comparing these electrode pairs comprehensively, the Fe$_3$GaTe$_2$/Fe$_3$GeTe$_2$ electrode combination exhibits optimal transport properties: the maximum TMR (TER) reaches 10949\% (69\%) and the minimum resistance-area product (RA) is 0.45 $Ω$$μ$m$^{2}$, together with perfect spin filtering and negative differential resistance effects. More intriguingly, TMR (TER) can be further enhanced to 34000\% (380\%) by applying an external bias voltage (0.1 V), while RA can be reduced to 0.16 $Ω$$μ$m$^{2}$ under the influence of biaxial stress (-3\%). Our proposed concept of designing vdW-MFTJs using intrinsic multiferroic materials points towards new avenues in experimental exploration.
Submitted 7 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Development of Automated Neural Network Prediction for Echocardiographic Left ventricular Ejection Fraction
Authors:
Yuting Zhang,
Boyang Liu,
Karina V. Bunting,
David Brind,
Alexander Thorley,
Andreas Karwath,
Wenqi Lu,
Diwei Zhou,
Xiaoxia Wang,
Alastair R. Mobley,
Otilia Tica,
Georgios Gkoutos,
Dipak Kotecha,
Jinming Duan
Abstract:
The echocardiographic measurement of left ventricular ejection fraction (LVEF) is fundamental to the diagnosis and classification of patients with heart failure (HF). In order to quantify LVEF automatically and accurately, this paper proposes a new pipeline method based on deep neural networks and ensemble learning. Within the pipeline, an Atrous Convolutional Neural Network (ACNN) was first trained to segment the left ventricle (LV), before employing the area-length formulation based on the ellipsoid single-plane model to calculate LVEF values. This formulation required inputs of LV area, derived from segmentation using an improved Jeffrey's method, as well as LV length, derived from a novel ensemble learning model. To further improve the pipeline's accuracy, an automated peak detection algorithm was used to identify end-diastolic and end-systolic frames, avoiding issues with human error. Subsequently, single-beat LVEF values were averaged across all cardiac cycles to obtain the final LVEF. This method was developed and internally validated in an open-source dataset containing 10,030 echocardiograms. The Pearson's correlation coefficient was 0.83 for LVEF prediction compared to expert human analysis (p<0.001), with a subsequent area under the receiver operating characteristic curve (AUROC) of 0.98 (95% confidence interval 0.97 to 0.99) for categorisation of HF with reduced ejection fraction (HFrEF; LVEF<40%). In an external dataset with 200 echocardiograms, this method achieved an AUROC of 0.90 (95% confidence interval 0.88 to 0.91) for HFrEF assessment. This study demonstrates that an automated neural network-based calculation of LVEF is comparable to expert clinicians performing time-consuming, frame-by-frame manual evaluation of cardiac systolic function.
Submitted 18 March, 2024;
originally announced March 2024.
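The arithmetic behind the pipeline in the abstract above can be sketched as follows. This is a hedged illustration, not the authors' code: the abstract does not spell out its formula, so the sketch assumes the standard single-plane area-length expression V = 8A^2 / (3*pi*L), and all function names are hypothetical. Only the per-beat averaging and the LVEF<40% threshold for HFrEF come from the abstract.

```python
import math
from typing import List, Tuple


def area_length_volume(area_cm2: float, length_cm: float) -> float:
    """Single-plane area-length LV volume in mL (assumed standard formula)."""
    if area_cm2 <= 0 or length_cm <= 0:
        raise ValueError("area and length must be positive")
    return 8.0 * area_cm2 ** 2 / (3.0 * math.pi * length_cm)


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """LVEF in percent from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def mean_lvef(beats: List[Tuple[float, float]]) -> float:
    """Average single-beat LVEF over (EDV, ESV) pairs, one pair per cardiac cycle."""
    if not beats:
        raise ValueError("at least one beat is required")
    return sum(ejection_fraction(edv, esv) for edv, esv in beats) / len(beats)


def is_hfref(lvef_percent: float) -> bool:
    """HFrEF categorisation using the LVEF < 40% threshold quoted in the abstract."""
    return lvef_percent < 40.0
```

In the described pipeline the areas would come from the segmentation network and the lengths from the ensemble model; here they are plain numeric inputs.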
-
Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)
Authors:
Qiushi Zheng,
Zhanhao Zhao,
Wei Lu,
Chang Yao,
Yuxing Chen,
Anqun Pan,
Xiaoyong Du
Abstract:
Distributed transaction processing often involves multiple rounds of cross-node communications, and therefore tends to be slow. To improve performance, existing approaches convert distributed transactions into single-node transactions by either migrating co-accessed partitions onto the same nodes or establishing a super node housing replicas of the entire database. However, migration-based methods might cause transactions to be blocked due to waiting for data migration, while the super node can become a bottleneck. In this paper, we present Lion, a novel transaction processing protocol that utilizes partition-based replication to reduce the occurrence of distributed transactions. Lion aims to assign a node with one replica from each partition involved in a given transaction's read or write operations. To ensure such a node is available, we propose an adaptive replica provision mechanism, enhanced with an LSTM-based workload prediction algorithm, to determine the appropriate node for locating replicas of co-accessed partitions. The adaptation of replica placement is conducted preemptively and asynchronously, thereby minimizing its impact on performance. By employing this adaptive replica placement strategy, we ensure that the majority of transactions can be efficiently processed on a single node without additional overhead. Only a small fraction of transactions will need to be treated as regular distributed transactions when such a node is unavailable. Consequently, Lion effectively minimizes distributed transactions while avoiding any disruption caused by data migration or the creation of a super node. We conduct extensive experiments to compare Lion against various transaction processing protocols. The results show that Lion achieves up to 2.7x higher throughput and 76.4% better scalability against these state-of-the-art approaches.
Submitted 17 March, 2024;
originally announced March 2024.
-
Re-Search for The Truth: Multi-round Retrieval-augmented Large Language Models are Strong Fake News Detectors
Authors:
Guanghua Li,
Wensheng Lu,
Wei Zhang,
Defu Lian,
Kezhong Lu,
Rui Mao,
Kai Shu,
Hao Liao
Abstract:
The proliferation of fake news has had far-reaching implications on politics, the economy, and society at large. While fake news detection methods have been employed to mitigate this issue, they primarily depend on two essential elements: the quality and relevance of the evidence, and the effectiveness of the verdict prediction mechanism. Traditional methods, which often source information from static repositories like Wikipedia, are limited by outdated or incomplete data, particularly for emerging or rare claims. Large Language Models (LLMs), known for their remarkable reasoning and generative capabilities, introduce a new frontier for fake news detection. However, like traditional methods, LLM-based solutions also grapple with the limitations of stale and long-tail knowledge. Additionally, retrieval-enhanced LLMs frequently struggle with issues such as low-quality evidence retrieval and context length constraints. To address these challenges, we introduce a novel retrieval-augmented LLM framework--the first of its kind to automatically and strategically extract key evidence from web sources for claim verification. Employing a multi-round retrieval strategy, our framework ensures the acquisition of sufficient, relevant evidence, thereby enhancing performance. Comprehensive experiments across three real-world datasets validate the framework's superiority over existing methods. Importantly, our model not only delivers accurate verdicts but also offers human-readable explanations to improve result interpretability.
Submitted 13 March, 2024;
originally announced March 2024.
-
High precision proton beam monitor system concept design on CSNS based on SiC
Authors:
Ye He,
Xingchen Li,
Zijun Xu,
Ming Qi,
Congcong Wang,
Chenwei Wang,
Hai Lu,
Xiaojun Nie,
Ruirui Fan,
Hantao Jing,
Weiming Song,
Keqi Wang,
Kai Liu,
Peilian Liu,
Hui Li,
Zaiyi Li,
Chenxi Fu,
Xiyuan Zhang,
Xiaoshen Kang,
Zhan Li,
Weiguo Lu,
Suyu Xiao,
Xin Shi
Abstract:
A high-precision beam monitor system based on silicon carbide PIN sensors is designed for the China Spallation Neutron Source 1.6 GeV proton beam to monitor the proton beam fluence. The concept design of the beam monitor system is finished, together with front-end electronics with silicon carbide PIN sensors, a readout system, and a mechanical system. Several tests are performed to study the performance of each component of the system. The charge collection of the SiC PIN sensors after proton radiation is studied with an 80 MeV proton beam for continuous running. Research on the performance of the front-end electronics and readout system is finished for better data acquisition. The uncertainty of the proton beam fluence is below 1% in the beam monitor system.
Submitted 14 March, 2024;
originally announced March 2024.
-
ProSwitch: Knowledge-Guided Instruction Tuning to Generate Professional and Non-Professional Styled Text
Authors:
Chang Zong,
Yuyan Chen,
Weiming Lu,
Jian Shao,
Yueting Zhuang
Abstract:
Large Language Models (LLMs) have demonstrated efficacy in various linguistic applications, including text summarization and controlled text generation. However, their capacity to switch between styles via fine-tuning remains underexplored. This study concentrates on textual professionalism and introduces a novel methodology, named ProSwitch, which equips a language model with the ability to produce both professional and non-professional responses through knowledge-guided instruction tuning. ProSwitch unfolds across three phases: data preparation for gathering domain knowledge and training corpus; instruction tuning for optimizing language models with multiple levels of instruction formats; and comprehensive evaluation for assessing the professionalism discrimination and reference-based quality of generated text. Comparative analysis of ProSwitch against both general and specialized language models reveals that our approach outperforms baselines in switching between professional and non-professional text generation.
Submitted 15 April, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Efficient Combinatorial Optimization via Heat Diffusion
Authors:
Hengyuan Ma,
Wenlian Lu,
Jianfeng Feng
Abstract:
Combinatorial optimization problems are widespread but inherently challenging due to their discrete nature. The primary limitation of existing methods is that they can only access a small fraction of the solution space at each iteration, resulting in limited efficiency in searching for the global optimum. To overcome this challenge, diverging from conventional efforts of expanding the solver's search scope, we focus on enabling information to actively propagate to the solver through heat diffusion. By transforming the target function while preserving its optima, heat diffusion facilitates information flow from distant regions to the solver, providing more efficient navigation. Utilizing heat diffusion, we propose a framework for solving general combinatorial optimization problems. The proposed methodology demonstrates superior performance across a range of the most challenging and widely encountered combinatorial optimization problems. Echoing recent advancements in harnessing thermodynamics for generative artificial intelligence, our study further reveals its significant potential in advancing combinatorial optimization.
Submitted 14 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Aligning Large Language Models for Controllable Recommendations
Authors:
Wensheng Lu,
Jianxun Lian,
Wei Zhang,
Guanghua Li,
Mingyang Zhou,
Hao Liao,
Xing Xie
Abstract:
Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting the ability to follow instructions. To address this gap, we initially introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. Subsequently, we develop a reinforcement learning-based alignment procedure to further strengthen LLMs' aptitude in responding to users' intentions and mitigating formatting errors. Through extensive experiments on two real-world datasets, our method markedly advances the capability of LLMs to comply with instructions within recommender systems, while sustaining a high level of accuracy performance.
Submitted 8 March, 2024;
originally announced March 2024.
-
Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models
Authors:
Chunhe Ni,
Jiang Wu,
Hongbo Wang,
Wenran Lu,
Chenwei Zhang
Abstract:
Large Language Models (LLMs) are a class of generative AI models built using the Transformer network, capable of leveraging vast datasets to identify, summarize, translate, predict, and generate language. LLMs promise to revolutionize society, yet training these foundational models poses immense challenges. Semantic vector search within large language models is a potent technique that can significantly enhance search result accuracy and relevance. Unlike traditional keyword-based search methods, semantic search utilizes the meaning and context of words to grasp the intent behind queries and deliver more precise outcomes. Elasticsearch, an exceptionally scalable and robust search engine designed for indexing and searching extensive datasets, has emerged as one of the most popular tools for implementing semantic search. In this article, we delve into the fundamentals of semantic search and explore how to harness Elasticsearch and Transformer models to bolster large language model processing paradigms. We gain a comprehensive understanding of semantic search principles and acquire practical skills for implementing semantic search in real-world model application scenarios.
Submitted 24 February, 2024;
originally announced March 2024.
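The idea of semantic vector search mentioned in the abstract above can be illustrated with a small, library-free sketch. It is an assumption-laden toy, not the article's method and not the Elasticsearch API: the embedding vectors are hand-written placeholders for what a Transformer encoder would produce, and `cosine_similarity` and `rank_documents` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar to the query first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A production system would delegate the nearest-neighbour step to a search engine's vector index rather than scoring every document in a Python loop; the sketch only shows what "ranking by meaning" reduces to once text has been embedded.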
-
Enhanced User Interaction in Operating Systems through Machine Learning Language Models
Authors:
Chenwei Zhang,
Wenran Lu,
Chunhe Ni,
Hongbo Wang,
Jiang Wu
Abstract:
As large language models show human-like logical reasoning and understanding, an urgent and important question with clear economic value is whether agents based on them can simulate the interaction behavior of real users, so as to build a reliable virtual A/B test environment for recommendation research. The combination of interaction design and machine learning can provide a more efficient and personalized user experience for products and services. This personalized service can meet the specific needs of users and improve user satisfaction and loyalty. In addition, an interactive system can understand the user's views and needs for the product by providing a good user interface and interactive experience, and then use machine learning algorithms to improve and optimize the product. This iterative optimization process can continuously improve the quality and performance of the product to meet the changing needs of users. At the same time, designers need to consider how these algorithms and tools can be combined with interactive systems to provide a good user experience. This paper explores the potential applications of large language models, machine learning and interaction design for user interaction in recommendation systems and operating systems. By integrating these technologies, more intelligent and personalized services can be provided to meet user needs and promote continuous improvement and optimization of products. This is of great value for both recommendation research and user experience applications.
Submitted 24 February, 2024;
originally announced March 2024.
-
Let LLMs Take on the Latest Challenges! A Chinese Dynamic Question Answering Benchmark
Authors:
Zhikun Xu,
Yinghui Li,
Ruixue Ding,
Xinyu Wang,
Boli Chen,
Yong Jiang,
Hai-Tao Zheng,
Wenlian Lu,
Pengjun Xie,
Fei Huang
Abstract:
How to better evaluate the capabilities of Large Language Models (LLMs) is the focal point and hot topic in current LLM research. Previous work has noted that due to the extremely high cost of iterative updates of LLMs, they are often unable to answer the latest dynamic questions well. To promote the improvement of Chinese LLMs' ability to answer dynamic questions, in this paper, we introduce CDQA, a Chinese Dynamic QA benchmark containing question-answer pairs related to the latest news on the Chinese Internet. We obtain high-quality data through a pipeline that combines humans and models, and carefully classify the samples according to the frequency of answer changes to facilitate a more fine-grained observation of LLMs' capabilities. We have also evaluated and analyzed mainstream and advanced Chinese LLMs on CDQA. Extensive experiments and valuable insights suggest that our proposed CDQA is challenging and worthy of further study. We believe that the benchmark we provide will become one of the key data resources for improving LLMs' Chinese question-answering ability in the future.
Submitted 1 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
SFTformer: A Spatial-Frequency-Temporal Correlation-Decoupling Transformer for Radar Echo Extrapolation
Authors:
Liangyu Xu,
Wanxuan Lu,
Hongfeng Yu,
Fanglong Yao,
Xian Sun,
Kun Fu
Abstract:
Extrapolating future weather radar echoes from past observations is a complex task vital for precipitation nowcasting. The spatial morphology and temporal evolution of radar echoes exhibit a certain degree of correlation, yet they also possess independent characteristics. Existing methods learn unified spatial and temporal representations in a highly coupled feature space, emphasizing the correlation between spatial and temporal features but neglecting the explicit modeling of their independent characteristics, which may result in mutual interference between them. To effectively model the spatiotemporal dynamics of radar echoes, we propose a Spatial-Frequency-Temporal correlation-decoupling Transformer (SFTformer). The model leverages multiple stacked SFT-Blocks not only to mine the correlation of the spatiotemporal dynamics of echo cells but also to avoid mutual interference between temporal modeling and spatial morphology refinement by decoupling them. Furthermore, inspired by the practice of weather forecast experts, who review historical echo evolution to make accurate predictions, SFTformer incorporates a joint training paradigm for historical echo sequence reconstruction and future echo sequence prediction. Experimental results on the HKO-7 and ChinaNorth-2021 datasets demonstrate the superior performance of SFTformer in short-term (1h), mid-term (2h), and long-term (3h) precipitation nowcasting.
Submitted 27 February, 2024;
originally announced February 2024.
-
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Authors:
Wenqi Zhang,
Ke Tang,
Hai Wu,
Mengna Wang,
Yongliang Shen,
Guiyang Hou,
Zeqi Tan,
Peng Li,
Yueting Zhuang,
Weiming Lu
Abstract:
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers require manually crafted prompts to convey task rules and regulate LLM behaviors, which leaves them inherently unable to address complex dynamic scenarios, e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, fine-tuning its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games, Blackjack and Texas Hold'em, outperforming vanilla LLM and specialized models. Our results show that Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.
Submitted 6 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation
Authors:
Aditya Desu,
Xuanli He,
Qiongkai Xu,
Wei Lu
Abstract:
As machine- and AI-generated content proliferates, protecting the intellectual property of generative models has become imperative, yet verifying data ownership poses formidable challenges, particularly in cases of unauthorized reuse of generated data. The challenge of verifying data ownership is further amplified by using Machine Learning as a Service (MLaaS), which often functions as a black-box system.
Our work is dedicated to detecting data reuse from even an individual sample. Traditionally, watermarking has been leveraged to detect AI-generated content. However, unlike watermarking techniques that embed additional information as triggers into models or generated content, potentially compromising output quality, our approach identifies latent fingerprints inherently present within the outputs through re-generation. We propose an explainable verification procedure that attributes data ownership through re-generation, and further amplifies these fingerprints in the generative models through iterative data re-generation. This methodology is theoretically grounded and demonstrates viability and robustness using recent advanced text and image generative models. Our methodology is significant as it goes beyond protecting the intellectual property of APIs and addresses important issues such as the spread of misinformation and academic misconduct. It provides a useful tool to ensure the integrity of sources and authorship, expanding its application in different scenarios where authenticity and ownership verification are essential.
Submitted 23 February, 2024;
originally announced February 2024.
-
Adaptive Online Learning of Separable Path Graph Transforms for Intra-prediction
Authors:
Wen-Yang Lu,
Eduardo Pavez,
Antonio Ortega,
Xin Zhao,
Shan Liu
Abstract:
Current video coding standards, including H.264/AVC, HEVC, and VVC, employ the discrete cosine transform (DCT), the discrete sine transform (DST), and secondary Karhunen-Loeve transforms (KLTs) to decorrelate the intra-prediction residuals. However, the efficiency of these transforms in decorrelation can be limited when the signal has a non-smooth and non-periodic structure, such as that occurring in textures with intricate patterns. This paper introduces a novel adaptive separable path graph-based transform (GBT) that can provide better decorrelation than the DCT for intra-predicted texture data. The proposed GBT is learned in an online scenario with sequential K-means clustering, which groups similar blocks during encoding and decoding to adaptively learn the GBT for the current block from previously reconstructed areas with similar characteristics. A signaling overhead is added to the bitstream of each coding block to indicate the usage of the proposed graph-based transform. We assess the performance of this method combined with H.264/AVC intra-coding tools and demonstrate that it can significantly outperform H.264/AVC DCT for intra-predicted texture data.
Submitted 26 February, 2024;
originally announced February 2024.
-
Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering
Authors:
Chang Zong,
Yuchen Yan,
Weiming Lu,
Jian Shao,
Eliot Huang,
Heng Chang,
Yueting Zhuang
Abstract:
Recent progress with LLM-based agents has shown promising results across various tasks. However, their use in answering questions from knowledge bases remains largely unexplored. Implementing a KBQA system using traditional methods is challenging due to the shortage of task-specific training data and the complexity of creating task-focused model structures. In this paper, we present Triad, a unified framework that utilizes an LLM-based agent with three roles for KBQA tasks. The agent is assigned three roles to tackle different KBQA subtasks: agent as a generalist for mastering various subtasks, as a decision maker for the selection of candidates, and as an advisor for answering questions with knowledge. Our KBQA framework is executed in four phases, involving the collaboration of the agent's multiple roles. We evaluated the performance of our framework using three benchmark datasets, and the results show that our framework outperforms state-of-the-art systems on the LC-QuAD and YAGO-QA benchmarks, yielding F1 scores of 11.8% and 20.7%, respectively.
Submitted 15 April, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Authors:
Haoran Li,
Qingxiu Dong,
Zhengyang Tang,
Chaojun Wang,
Xingxing Zhang,
Haoyang Huang,
Shaohan Huang,
Xiaolong Huang,
Zeqiang Huang,
Dongdong Zhang,
Yuxian Gu,
Xin Cheng,
Xun Wang,
Si-Qing Chen,
Li Dong,
Wei Lu,
Zhifang Sui,
Benyou Wang,
Wai Lam,
Furu Wei
Abstract:
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike prior work that relies on seed examples or existing datasets to construct instruction tuning data, GLAN exclusively utilizes a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines. Specifically, inspired by the systematic structure of the human education system, we build the taxonomy by decomposing human knowledge and capabilities into various fields, sub-fields and, ultimately, distinct disciplines semi-automatically, facilitated by LLMs. Subsequently, we generate a comprehensive list of subjects for every discipline and proceed to design a syllabus tailored to each subject, again utilizing LLMs. With the fine-grained key concepts detailed in every class session of the syllabus, we are able to generate diverse instructions with broad coverage across the entire spectrum of human knowledge and skills. Extensive experiments on large language models (e.g., Mistral) demonstrate that GLAN excels in multiple dimensions, from mathematical reasoning, coding, academic exams, and logical reasoning to general instruction following, without using task-specific training data for these tasks. In addition, GLAN allows for easy customization, and new fields or skills can be added by simply incorporating a new node into our taxonomy.
Submitted 20 February, 2024;
originally announced February 2024.
-
Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models
Authors:
Jiang Wu,
Hongbo Wang,
Chunhe Ni,
Chenwei Zhang,
Wenran Lu
Abstract:
Data pipelines play an indispensable role in tasks such as machine learning modeling and developing data products. With the increasing diversification and complexity of data sources, as well as the rapid growth of data volumes, building an efficient data pipeline has become crucial for improving work efficiency and solving complex problems. This paper focuses on exploring how to optimize data flow through automated machine learning methods by integrating AutoML with the data pipeline. We discuss how to leverage AutoML technology to enhance the intelligence of the data pipeline, thereby achieving better results in machine learning tasks. By delving into the automation and optimization of data flows, we uncover key strategies for constructing efficient data pipelines that can adapt to the ever-changing data landscape. This not only accelerates the modeling process but also provides innovative solutions to complex problems, enabling more significant outcomes in increasingly intricate data domains. Keywords: Data Pipeline Training; AutoML; Data environment; Machine learning
Submitted 20 February, 2024;
originally announced February 2024.
-
Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning
Authors:
Yinpeng Liu,
Jiawei Liu,
Xiang Shi,
Qikai Cheng,
Yong Huang,
Wei Lu
Abstract:
Demonstration ordering, which is an important strategy for in-context learning (ICL), can significantly affect the performance of large language models (LLMs). However, most current ordering approaches incur high computational costs to introduce prior knowledge. In this paper, inspired by the human learning process, we propose a simple but effective demonstration ordering method for ICL, named few-shot In-Context Curriculum Learning (ICCL). ICCL involves gradually increasing the complexity of prompt demonstrations during the inference process. The difficulty can be assessed by human experts or by LLM-driven metrics, such as perplexity. We then design extensive experiments to discuss the effectiveness of ICCL at both the corpus level and the instance level. Moreover, we also investigate the formation mechanism of LLMs' ICCL capability. Experimental results demonstrate that ICCL, developed during the instruction-tuning stage, is effective for representative open-source LLMs. To facilitate further research and applications by other scholars, we make the code publicly available.
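As a rough illustration of the easy-to-hard ordering idea described in this abstract, the sketch below sorts demonstrations by a difficulty score before building a few-shot prompt. It is not the authors' implementation: the function names (`order_by_difficulty`, `build_prompt`, `toy_difficulty`) are hypothetical, and the word-count scorer is only a stand-in for the human or perplexity-based difficulty ratings the abstract mentions.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (question, answer)


def order_by_difficulty(demos: List[Demo],
                        difficulty: Callable[[Demo], float]) -> List[Demo]:
    """Return the demonstrations sorted from easiest to hardest."""
    return sorted(demos, key=difficulty)


def build_prompt(demos: List[Demo], query: str) -> str:
    """Concatenate ordered demonstrations, then append the unanswered query."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in demos]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)


def toy_difficulty(demo: Demo) -> float:
    # Assumption for illustration only: longer questions count as harder.
    # A real scorer might use expert labels or a model's perplexity instead.
    return float(len(demo[0].split()))


demos = [
    ("What is 12 * 7 minus 5 plus 3?", "82"),
    ("What is 2 + 2?", "4"),
    ("What is 9 * 3 minus 1?", "26"),
]
ordered = order_by_difficulty(demos, toy_difficulty)
prompt = build_prompt(ordered, "What is 6 * 4?")
print(prompt)
```

Keeping the scorer as a parameter means the ordering logic stays the same whichever difficulty measure is plugged in.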
Submitted 16 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Generative Modeling for Tabular Data via Penalized Optimal Transport Network
Authors:
Wenhui Sophia Lu,
Chenyang Zhong,
Wing Hung Wong
Abstract:
The task of precisely learning the probability distribution of rows within tabular data and producing authentic synthetic samples is both crucial and non-trivial. Wasserstein generative adversarial network (WGAN) marks a notable improvement in generative modeling, addressing the challenges faced by its predecessor, generative adversarial network. However, due to the mixed data types and multimodalities prevalent in tabular data, the delicate equilibrium between the generator and discriminator, as well as the inherent instability of Wasserstein distance in high dimensions, WGAN often fails to produce high-fidelity samples. To this end, we propose POTNet (Penalized Optimal Transport Network), a generative deep neural network based on a novel, robust, and interpretable marginally-penalized Wasserstein (MPW) loss. POTNet can effectively model tabular data containing both categorical and continuous features. Moreover, it offers the flexibility to condition on a subset of features. We provide theoretical justifications for the motivation behind the MPW loss. We also empirically demonstrate the effectiveness of our proposed method on four different benchmarks across a variety of real-world and simulated datasets. Our proposed model achieves orders of magnitude speedup during the sampling stage compared to state-of-the-art generative models for tabular data, thereby enabling efficient large-scale synthetic data generation.
Submitted 16 February, 2024;
originally announced February 2024.
-
Multi-Center Fetal Brain Tissue Annotation (FeTA) Challenge 2022 Results
Authors:
Kelly Payette,
Céline Steger,
Roxane Licandro,
Priscille de Dumast,
Hongwei Bran Li,
Matthew Barkovich,
Liu Li,
Maik Dannecker,
Chen Chen,
Cheng Ouyang,
Niccolò McConnell,
Alina Miron,
Yongmin Li,
Alena Uus,
Irina Grigorescu,
Paula Ramirez Gilliland,
Md Mahfuzur Rahman Siddiquee,
Daguang Xu,
Andriy Myronenko,
Haoyu Wang,
Ziyan Huang,
** Ye,
Mireia Alenyà,
Valentin Comte,
Oscar Camara
, et al. (42 additional authors not shown)
Abstract:
Segmentation is a critical step in analyzing the developing human fetal brain. There have been vast improvements in automatic segmentation methods in the past several years, and the Fetal Brain Tissue Annotation (FeTA) Challenge 2021 helped to establish an excellent standard of fetal brain segmentation. However, FeTA 2021 was a single-center study, and the generalizability of algorithms across different imaging centers remains unsolved, limiting real-world clinical applicability. The multi-center FeTA Challenge 2022 focuses on advancing the generalizability of fetal brain segmentation algorithms for magnetic resonance imaging (MRI). In FeTA 2022, the training dataset contained images and corresponding manually annotated multi-class labels from two imaging centers, and the testing data contained images from these two imaging centers as well as two additional unseen centers. The data from different centers varied in many aspects, including scanners used, imaging parameters, and fetal brain super-resolution algorithms applied. Sixteen teams participated in the challenge, and 17 algorithms were evaluated. Here, a detailed overview and analysis of the challenge results are provided, focusing on the generalizability of the submissions. Both in-domain and out-of-domain, the white matter and ventricles were segmented with the highest accuracy, while the most challenging structure remains the cerebral cortex due to its anatomical complexity. The FeTA Challenge 2022 was able to successfully evaluate and advance the generalizability of multi-class fetal brain tissue segmentation algorithms for MRI, and it continues to benchmark new algorithms. The resulting new methods contribute to improving the analysis of brain development in utero.
Submitted 8 February, 2024;
originally announced February 2024.
-
Upper limits on the radio pulses from magnetars and a central compact object with FAST
Authors:
Wan-** Lu,
** Zhou,
Pei Wang,
Yi-Xuan Shao,
Xiang-Dong Li,
Jacco Vink,
Di Li,
Yang Chen
Abstract:
Magnetars and central compact objects (CCOs) are subgroups of neutron stars that show a number of properties distinct from those of canonical radio pulsars. We performed radio observations of three magnetars (SGR 0418+5729, 1E 2259+586, and 4U 0142+61) and a CCO (PSR J1852+0040) with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) at 1.25 GHz, aiming to search for radio pulsations in their quiescent states. During two observation epochs, no radio pulses were detected towards any target above a signal-to-noise ratio (S/N) of 7 from either the direct folding or the blind search. We provide the most stringent upper limits on the radio flux ($\lesssim$ 2--4 $\mu$Jy) for the magnetars and the CCO. For the magnetars with long periods, the real upper limits are likely an order of magnitude larger due to red noise. The deep radio observations suggest that these magnetars and the CCO are indeed radio-quiet sources or are unfavorably beamed.
Submitted 8 February, 2024;
originally announced February 2024.
-
Large Language Model for Table Processing: A Survey
Authors:
Weizheng Lu,
Jiaming Zhang,
**g Zhang,
Yueguo Chen
Abstract:
Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables. Automating these table-centric tasks with Large Language Models (LLMs) offers significant public benefits, garnering interest from academia and industry. This survey provides an extensive overview of table tasks, encompassing not only the traditional areas like table question answering (Table QA) and fact verification, but also newly emphasized aspects such as table manipulation and advanced table data analysis. Additionally, it goes beyond the early strategies of pre-training and fine-tuning small language models, to include recent paradigms in LLM usage. The focus here is particularly on instruction-tuning, prompting, and agent-based approaches within the realm of LLMs. Finally, we highlight several challenges, ranging from private deployment and efficient inference to the development of extensive benchmarks for table manipulation and advanced data analysis.
Submitted 3 February, 2024;
originally announced February 2024.
-
Observation of Giant Spin Splitting and d-wave Spin Texture in Room Temperature Altermagnet RuO2
Authors:
Zihan Lin,
Dong Chen,
Wenlong Lu,
Xin Liang,
Shiyu Feng,
Kohei Yamagami,
Jacek Osiecki,
Mats Leandersson,
Balasubramanian Thiagarajan,
Junwei Liu,
Claudia Felser,
Junzhang Ma
Abstract:
Recently, a novel magnetic phase called altermagnetism has been proposed, ushering in a third distinct magnetic phase beyond ferromagnetism and antiferromagnetism. This groundbreaking phase is expected to exhibit unique physical properties such as C-paired spin-valley locking, the anomalous Hall effect, a nontrivial Berry phase, and giant magnetoresistance. Among all the predicted candidates, several room temperature altermagnets are suggested to have significant potential applications in the near future. Nevertheless, direct evidence of the spin pattern of a room temperature altermagnet has not yet been reported. Previous studies identified RuO2 as the most promising candidate for room temperature d-wave altermagnetism, exhibiting a substantial spin splitting of up to 1.4 eV. In this study, utilizing angle-resolved photoemission spectroscopy (ARPES), we report experimental observation of the spin splitting in RuO2. Furthermore, employing spin-ARPES, we directly observed the d-wave spin pattern. Our results unequivocally show that RuO2 is a perfect d-wave altermagnet with great potential for upcoming spintronic applications.
Submitted 7 February, 2024;
originally announced February 2024.
-
Decoder-Only Image Registration
Authors:
Xi Jia,
Wenqi Lu,
Xinxing Cheng,
**ming Duan
Abstract:
In unsupervised medical image registration, the predominant approaches use an encoder-decoder network architecture, allowing for precise prediction of dense, full-resolution displacement fields from given paired images. Despite the widespread use of this design in the literature, we question the necessity of making both the encoder and decoder learnable in such an architecture. To this end, we propose a novel network architecture, termed LessNet in this paper, which contains only a learnable decoder and entirely omits a learnable encoder. LessNet substitutes the learnable encoder with simple, handcrafted features, eliminating the need to learn (optimize) network parameters in the encoder altogether. Consequently, this leads to a compact, efficient, decoder-only architecture for 3D medical image registration. Evaluated on two publicly available brain MRI datasets, we demonstrate that our decoder-only LessNet can effectively and efficiently learn both dense displacement and diffeomorphic deformation fields in 3D. Furthermore, our decoder-only LessNet can achieve registration performance comparable to state-of-the-art methods such as VoxelMorph and TransMorph, while requiring significantly fewer computational resources. Our code and pre-trained models are available at https://github.com/xi-jia/LessNet.
Submitted 5 February, 2024;
originally announced February 2024.
-
PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal
Authors:
Tao Wang,
Wanglong Lu,
Kaihao Zhang,
Wenhan Luo,
Tae-Kyun Kim,
Tong Lu,
Hongdong Li,
Ming-Hsuan Yang
Abstract:
Existing single image reflection removal (SIRR) methods using deep learning tend to miss key low-frequency (LF) and high-frequency (HF) differences in images, affecting their effectiveness in removing reflections. To address this problem, this paper proposes a novel prompt-guided reflection removal (PromptRR) framework that uses frequency information as new visual prompts for better reflection performance. Specifically, the proposed framework decouples the reflection removal process into the prompt generation and subsequent prompt-guided restoration. For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts. Then, we adopt diffusion models (DMs) as prompt generators to generate the LF and HF prompts estimated by the pre-trained frequency prompt encoder. For the prompt-guided restoration, we integrate specially generated prompts into the PromptFormer network, employing a novel Transformer-based prompt block to effectively steer the model toward enhanced reflection removal. The results on commonly used benchmarks show that our method outperforms state-of-the-art approaches. The codes and models are available at https://github.com/TaoWangzj/PromptRR.
Submitted 4 February, 2024;
originally announced February 2024.
-
Quantum Information Geometry with Non-Hermitian Systems
Authors:
Wangjun Lu,
Zhao-Hui Peng,
HongTao
Abstract:
Information geometry is the application of differential geometry in statistics, where the Fisher-Rao metric serves as the Riemannian metric on the statistical manifold, providing an intrinsic measure of parameter sensitivity. In this paper, we explore the Fisher-Rao metric in non-Hermitian systems. By approximating the Lindblad master equation with a non-Hermitian Hamiltonian, we calculate the time evolution of the quantum geometric metric. Finally, we give an example of the quantum spin Ising model in an imaginary magnetic field, explore the energy spectrum of the $\mathcal{PT}$-symmetric Hamiltonian and the evolution of the geometric metric, and discuss how the dissipative effect of the imaginary magnetic field can be eliminated by adding a control Hamiltonian, so as to improve the accuracy of parameter estimation.
Submitted 1 February, 2024;
originally announced February 2024.
-
PML-based boundary integral equation method for electromagnetic scattering problems in a layered-medium
Authors:
Gang Bao,
Wangtao Lu,
Tao Yin,
Lu Zhang
Abstract:
This paper proposes a new boundary integral equation (BIE) methodology based on the perfectly matched layer (PML) truncation technique for solving the electromagnetic scattering problems in a multi-layered medium. Instead of using the original PML stretched fields, artificial fields which are also equivalent to the solutions in the physical region are introduced. This significantly simplifies the study of the proposed methodology to derive the PML problem. Then some PML transformed layer potentials and the associated boundary integral operators (BIOs) are defined and the corresponding jump relations are shown. Under the assumption that the fields vanish on the PML boundary, the solution representations, as well as the related BIEs and regularization of the hyper-singular operators, in terms of the current density functions on the truncated interface, are derived. Numerical experiments are presented to demonstrate the efficiency and accuracy of the method.
Submitted 28 January, 2024;
originally announced January 2024.
-
Anomaly Detection of Particle Orbit in Accelerator using LSTM Deep Learning Technology
Authors:
Zhiyuan Chen,
Wei Lu,
Radhika Bhong,
Yimin Hu,
Brian Freeman,
Adam Carpenter
Abstract:
A stable, reliable, and controllable orbit lock system is crucial to an electron (or ion) accelerator because beam orbit and beam energy instability strongly affect the quality of the beam delivered to experimental halls. Currently, when the orbit lock system fails, operators must manually intervene. This paper develops a machine-learning-based fault detection methodology to identify orbit lock anomalies and notify accelerator operations staff of the off-normal behavior. Our method is unsupervised, so it does not require labeled data. It uses a Long Short-Term Memory (LSTM) autoencoder to capture normal patterns and predict future values of monitoring sensors in the orbit lock system. Anomalies are detected when the prediction error exceeds a threshold. We conducted experiments using monitoring data from Jefferson Lab's Continuous Electron Beam Accelerator Facility (CEBAF). The results are promising: the percentage of real anomalies identified by our solution is 68.6%-89.3% using monitoring data of a single component in the orbit lock control system. The accuracy can be as high as 82%.
Submitted 27 January, 2024;
originally announced January 2024.
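The thresholding step mentioned in the abstract above (an anomaly is flagged when the prediction error exceeds a threshold) can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' method: the LSTM autoencoder is not reproduced, the predictions are supplied as plain lists, and the threshold rule (mean plus k standard deviations of the errors observed on normal data), the function names, and the sample values are all assumptions made for this example.

```python
from statistics import mean, stdev


def fit_threshold(normal_errors, k=3.0):
    """Hypothetical rule: mean + k * sample std of errors seen on normal data."""
    return mean(normal_errors) + k * stdev(normal_errors)


def detect_anomalies(actual, predicted, threshold):
    """Return the indices where the absolute prediction error exceeds the threshold."""
    return [
        i
        for i, (a, p) in enumerate(zip(actual, predicted))
        if abs(a - p) > threshold
    ]


# Illustrative values only (not accelerator data).
normal_errors = [0.10, 0.12, 0.08, 0.11, 0.09]
threshold = fit_threshold(normal_errors)

actual = [1.00, 1.02, 1.90, 1.01]
predicted = [1.05, 1.00, 1.03, 0.98]
anomalies = detect_anomalies(actual, predicted, threshold)
print(anomalies)
```

In practice the choice of threshold trades detection rate against false alarms, so the value of `k` here should be read as a placeholder rather than a recommendation.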
-
On Time-Varying Delayed Stochastic Differential Systems with Non-Markovian Switching Parameters
Authors:
Xinyu Wu,
Zidong Wang,
Wenlian Lu
Abstract:
This paper focuses on time-varying delayed stochastic differential systems with stochastically switching parameters formulated by a unified switching behavior combining a discrete adapted process and a Cox process. Unlike prior studies limited to stationary and ergodic switching scenarios, our research emphasizes non-Markovian, non-stationary, and non-ergodic cases. It arrives at more general results regarding stability analysis with a more rigorous methodology. The theoretical results are validated through numerical examples.
Submitted 26 January, 2024;
originally announced January 2024.
-
Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient
Authors:
Weiguo Lu,
Xuan Wu,
Deng Ding,
**qiao Duan,
Jirong Zhuang,
Gangnan Yuan
Abstract:
Diffusion models (DMs) are a type of generative model that has had a huge impact on image synthesis and beyond. They achieve state-of-the-art results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, can be used to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a comprehensive theoretical analysis showing that the conditional latent distributions based on features and on classes are significantly different, so that conditioning the latent distribution on features produces fewer defective generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison. Experiments support our findings. A novel gradient function called the negative Gaussian mixture gradient (NGMG) is proposed and applied in diffusion model training with an additional classifier. Training stability has improved. We also theoretically prove that NGMG shares the same benefit as the Earth Mover distance (Wasserstein) as a more sensible cost function when learning distributions supported by low-dimensional manifolds.
Submitted 1 February, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Online estimation of the inverse of the Hessian for stochastic optimization with application to universal stochastic Newton algorithms
Authors:
Antoine Godichon-Baggioni,
Wei Lu,
Bruno Portier
Abstract:
This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is introduced. This approach drastically reduces computational complexity. Above all, it makes it possible to develop universal stochastic Newton methods and to investigate the asymptotic efficiency of the proposed approach. This work thus expands the application scope of second-order algorithms in stochastic optimization.
Submitted 15 January, 2024;
originally announced January 2024.
-
Generating Bell states and Werner states of two qubits via optical field
Authors:
Dengkui Jiang,
Cuilu Zhai,
Yaju Song,
Zhaohui Peng,
Jibing Yuan,
Shiqing Tang,
Wangjun Lu
Abstract:
In this paper, we investigate how the evolution of the states of two qubits initially in a direct product state can be controlled by the optical field in a Tavis-Cummings (TC) model. For the two qubits initially in the direct product state, we find that their matrix elements at any moment can be modulated by the coefficients of the optical field initial states in the number state space. We propose a method for preparing an X-type state of two qubits. Subsequently, for descriptive convenience, we divide the Bell states of the two qubits into two kinds. When both qubits are initially in the ground state, we find that the two qubits can be controlled to produce the first type of Bell state by a superposition-state optical field that is initially in next-nearest-neighbor number states, and that the production of any of the first type of Bell states can be controlled by controlling the phase between the two next-nearest-neighbor number states. When one of the two qubits is in the ground state and the other is in the excited state, we can control the two qubits to produce the second type of Bell state by a single-photon number state optical field. Finally, we study the generation of Werner states by controlling two qubits, both initially in the ground state, using an optical field.
Submitted 15 January, 2024;
originally announced January 2024.
-
Clifford algebra Cl(0,6) approach to beyond the standard model and naturalness problems
Authors:
Wei Lu
Abstract:
Is there more to Dirac's gamma matrices than meets the eye? It turns out that gamma zero can be factorized into a product of three operators. This revelation facilitates the expansion of Dirac's space-time algebra to Clifford algebra Cl(0,6). The resultant rich geometric structure can be leveraged to establish a combined framework of the standard model and gravity, wherein a gravi-weak interaction between the extended vierbein field and the extended weak gauge field is allowed. Inspired by the composite Higgs model, we examine the vierbein field as an effective description of the fermion-antifermion condensation. The compositeness of space-time manifests itself at an energy scale which is different from the Planck scale. We propose that all the regular classical Lagrangian terms are of quantum condensation origin, thus possibly addressing the cosmological constant problem provided that we adopt an unconventional multi-scale renormalization procedure for quantum condensations that entail multiplications of divergent integrals. The Clifford algebra approach also permits a weaker form of charge conjugation without particle-antiparticle interchange, which leads to a Majorana-type mass that conserves lepton number. Additionally, in the context of spontaneous breaking of two global U(1) symmetries, we explore a three-Higgs-doublet model with Higgs VEVs 246 GeV, 41 GeV and 2.5 GeV which could explain the fermion mass hierarchies.
Submitted 4 March, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
TinyLlama: An Open-Source Small Language Model
Authors:
Peiyuan Zhang,
Guangtao Zeng,
Tianduo Wang,
Wei Lu
Abstract:
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
Submitted 3 June, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Authors:
Wenqi Zhang,
Yongliang Shen,
Linjuan Wu,
Qiuying Peng,
Jun Wang,
Yueting Zhuang,
Weiming Lu
Abstract:
The reflection capacity of Large Language Models (LLMs) has garnered extensive attention. A post-hoc prompting strategy, e.g., reflexion and self-refine, refines an LLM's response based on self-evaluated or external feedback. However, recent research indicates that without external feedback, an LLM's intrinsic reflection is unstable. Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback. We find that LLMs often exhibit overconfidence or high randomness when self-evaluating, offering stubborn or inconsistent feedback, which causes poor reflection. To remedy this, we advocate Self-Contrast: it adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist that can be used to re-examine and eliminate discrepancies. Our method endows LLMs with diverse perspectives to alleviate stubborn biases. Moreover, their discrepancies indicate potential errors or inherent uncertainties that LLMs often overlook. Reflecting upon these can catalyze more accurate and stable reflection. Experiments conducted on a series of reasoning and translation tasks with different LLMs serve to underscore the effectiveness and generality of our strategy.
Submitted 6 June, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
Angular scanning VHEE (very high energy electron) pencil beam delivery for radiotherapy
Authors:
Bing Zhou,
Zhiyuan Guo,
Yang Wan,
Shuang Liu,
Jianfei Hua,
Wei Lu
Abstract:
The use of very high energy electrons (VHEE) for radiotherapy has been actively studied for over two decades due to its advantageous dose distribution, deep penetration depth and great potential for ultra-high dose-rate irradiation. However, the high entrance dose of VHEE beams can damage the surface skin of patients and hinder its widespread application. To address this challenge, a novel method utilizing only two dipole magnets is presented in this article. By adjusting the magnet strengths, the electron beams can be guided along different angular directions towards a specific position as deep as 20 cm inside a water phantom, creating a maximum dose over the target region and significantly reducing the entrance dose. Supported by Monte Carlo simulations, such a beam delivery approach has two major advantages over previous methods: first, it is insensitive to beam energy spread, relaxing the constraints on accelerator performance, and second, the dose peak position can be accurately controlled in both lateral and longitudinal directions. In addition, we show that a flattop dose peak can be generated by the weighted sum of VHEE beams focusing at different positions. These results demonstrate that VHEE beams can be compactly delivered into a deep-seated tumor region in a controllable manner, thus advancing the development of VHEE radiotherapy towards practical clinical applications in the near future.
Submitted 3 January, 2024;
originally announced January 2024.
-
Engineering the strain and interlayer excitons of 2D materials via lithographically engraved hexagonal boron nitride
Authors:
Yu-Chiang Hsieh,
Zhen-You Lin,
Shin-Ji Fung,
Wen-Shin Lu,
Sheng-Chin Ho,
Siang-** Hong,
Sheng-Zhu Ho,
Chiu-Hua Huang,
Kenji Watanabe,
Takashi Taniguchi,
Yang-Hao Chan,
Yi-Chun Chen,
Chung-Lin Wu,
Tse-Ming Chen
Abstract:
Strain engineering has quickly emerged as a viable option to modify the electronic, optical and magnetic properties of 2D materials. However, it remains challenging to arbitrarily control the strain. Here we show that by creating atomically-flat surface nanostructures in hexagonal boron nitride, we achieve an arbitrary on-chip control of both the strain distribution and magnitude on high-quality molybdenum disulfide. The phonon and exciton emissions are shown to vary in accordance with our strain field designs, enabling us to write and draw any photoluminescence color image in a single chip. Moreover, our strain engineering offers a powerful means to significantly and controllably alter the strengths and energies of interlayer excitons at room temperature. This method can be easily extended to other material systems and offers a promise for functional excitonic devices.
Submitted 2 January, 2024;
originally announced January 2024.
-
Optimal Rates of Kernel Ridge Regression under Source Condition in Large Dimensions
Authors:
Haobo Zhang,
Yicheng Li,
Weihao Lu,
Qian Lin
Abstract:
Motivated by the studies of neural networks (e.g., the neural tangent kernel theory), we perform a study on the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^γ$ for some $γ> 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_ρ^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determine the exact order (both upper and lower bound) of the generalization error of kernel ridge regression for the optimally chosen regularization parameter $λ$. We then further show that when $0<s\le1$, KRR is minimax optimal; and when $s>1$, KRR is not minimax optimal (a.k.a. the saturation effect). Our results illustrate that the curves of rate varying along $γ$ exhibit the periodic plateau behavior and the multiple descent behavior and show how the curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint of several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$ respectively.
Submitted 2 January, 2024;
originally announced January 2024.
-
Xorbits: Automating Operator Tiling for Distributed Data Science
Authors:
Weizheng Lu,
Kaisheng He,
Xuye Qin,
Chengjie Li,
Zhong Wang,
Tao Yuan,
Xia Liao,
Feng Zhang,
Yueguo Chen,
Xiaoyong Du
Abstract:
Data science pipelines commonly utilize dataframe and array operations for tasks such as data preprocessing, analysis, and machine learning. The most popular tools for these tasks are pandas and NumPy. However, these tools are limited to executing on a single node, making them unsuitable for processing large-scale data. Several systems have attempted to distribute data science applications to clusters while maintaining interfaces similar to single-node libraries, enabling data scientists to scale their workloads without significant effort. However, existing systems often struggle with processing large datasets due to Out-of-Memory (OOM) problems caused by poor data partitioning. To overcome these challenges, we develop Xorbits, a high-performance, scalable data science framework specifically designed to distribute data science workloads across clusters while retaining familiar APIs. The key differentiator of Xorbits is its ability to dynamically switch between graph construction and graph execution. Xorbits has been successfully deployed in production environments with up to 5k CPU cores. Its applications span various domains, including user behavior analysis and recommendation systems in the e-commerce sector, as well as credit assessment and risk management in the finance industry. Users can easily scale their data science workloads by simply changing the import line of their pandas and NumPy code. Our experiments demonstrate that Xorbits can effectively process very large datasets without encountering OOM or data-skewing problems. Compared with the fastest state-of-the-art solutions, Xorbits achieves a 2.66× speedup on average. In terms of API coverage, Xorbits attains a compatibility rate of 96.7%, surpassing the fastest framework by a margin of 60 percentage points. Xorbits is available at https://github.com/xorbitsai/xorbits.
Submitted 19 March, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Learn to integrate parts for whole through correlated neural variability
Authors:
Zhichao Zhu,
Yang Qi,
Wenlian Lu,
Jianfeng Feng
Abstract:
Sensory perception originates from the responses of sensory neurons, which react to a collection of sensory signals linked to various physical attributes of a singular perceptual object. Unraveling how the brain extracts perceptual information from these neuronal responses is a pivotal challenge in both computational neuroscience and machine learning. Here we introduce a statistical mechanical theory, where perceptual information is first encoded in the correlated variability of sensory neurons and then reformatted into the firing rates of downstream neurons. Applying this theory, we illustrate the encoding of motion direction using neural covariance and demonstrate high-fidelity direction recovery by spiking neural networks. Networks trained under this theory also show enhanced performance in classifying natural images, achieving higher accuracy and faster inference speed. Our results challenge the traditional view of neural covariance as a secondary factor in neural coding, highlighting its potential influence on brain function.
Submitted 1 January, 2024;
originally announced January 2024.
-
Optimizing ADMM and Over-Relaxed ADMM Parameters for Linear Quadratic Problems
Authors:
**tao Song,
Wenqi Lu,
Yunwen Lei,
Yuchao Tang,
Zhenkuan Pan,
**ming Duan
Abstract:
The Alternating Direction Method of Multipliers (ADMM) has gained significant attention across a broad spectrum of machine learning applications. Incorporating the over-relaxation technique shows potential for enhancing the convergence rate of ADMM. However, determining optimal algorithmic parameters, including both the associated penalty and relaxation parameters, often relies on empirical approaches tailored to specific problem domains and contextual scenarios. Incorrect parameter selection can significantly hinder ADMM's convergence rate. To address this challenge, in this paper we first propose a general approach to optimize the value of the penalty parameter, followed by a novel closed-form formula to compute the optimal relaxation parameter in the context of linear quadratic problems (LQPs). We then experimentally validate our parameter selection methods through random instantiations and diverse imaging applications, encompassing diffeomorphic image registration, image deblurring, and MRI reconstruction.
Submitted 31 December, 2023;
originally announced January 2024.
-
Causal State Distillation for Explainable Reinforcement Learning
Authors:
Wenhao Lu,
Xufeng Zhao,
Thilo Fryen,
Jae Hee Lee,
Mengdi Li,
Sven Magg,
Stefan Wermter
Abstract:
Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.
Submitted 1 April, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Does PML exponentially absorb outgoing waves scattering from a periodic surface?
Authors:
Wangtao Lu,
Kuanrong Shen,
Ruming Zhang
Abstract:
The PML method is well known for its exponential convergence rate and easy implementation for scattering problems with unbounded domains. For rough-surface scattering problems, the authors of [5] proved that the PML method converges at most algebraically in the physical domain. However, the authors also asked whether exponential convergence still holds for compact subsets. In [25], one of our authors proved the exponential convergence for periodic surfaces via the Floquet-Bloch transform when the wavenumber is positive and not a half integer; when the wavenumber is a positive half integer, a nearly fourth-order convergence rate was shown in [26]. The extension of this method to locally perturbed cases is not straightforward, since the domain is no longer periodic and thus the Floquet-Bloch transform does not work, especially when the domain topology is changed. Moreover, the exact decay rate when the wavenumber is a half integer remains unclear. The purpose of this paper is to address these two significant issues. For the first topic, the main idea is to reduce the problem by the DtN map on an artificial curve; the convergence rate of the PML is then obtained from the investigation of the DtN map. It shows exactly the same convergence rate as in the unperturbed case. Second, to illustrate the convergence rate when the wavenumber is a half integer, we design a specific periodic structure for which the PML converges at fourth order, showing that the algebraic convergence rate is sharp. We adopt a previously developed high-accuracy PML-BIE solver to exhibit this unexpected phenomenon.
Submitted 26 December, 2023;
originally announced December 2023.
-
NodeMixup: Tackling Under-Reaching for Graph Neural Networks
Authors:
Weigang Lu,
Ziyu Guan,
Wei Zhao,
Yaming Yang,
Long **
Abstract:
Graph Neural Networks (GNNs) have become mainstream methods for solving the semi-supervised node classification problem. However, due to the uneven location distribution of labeled nodes in the graph, labeled nodes are only accessible to a small portion of unlabeled nodes, leading to the under-reaching issue. In this study, we first reveal under-reaching by conducting an empirical investigation on various well-known graphs. Then, we demonstrate through systematic experimental analysis that under-reaching results in unsatisfactory distribution alignment between labeled and unlabeled nodes, significantly degrading GNNs' performance. To tackle under-reaching for GNNs, we propose an architecture-agnostic method dubbed NodeMixup. The fundamental idea is to (1) increase the reachability of labeled nodes by labeled-unlabeled pairs mixup, (2) leverage graph structures via fusing the neighbor connections of intra-class node pairs to improve performance gains of mixup, and (3) use neighbor label distribution similarity incorporating node degrees to determine sampling weights for node mixup. Extensive experiments demonstrate the efficacy of NodeMixup in assisting GNNs in handling under-reaching. The source code is available at https://github.com/WeigangLu/NodeMixup.
Submitted 20 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.