Search | arXiv e-print repository

arXiv:2407.03319 [pdf, other]

`Interaction annealing' to determine effective quantized valence and orbital structure: an illustration with ferro-orbital order in WTe$_2$

Authors: Ruoshi Jiang, Fangyuan Gu, Wei Ku

Abstract: Strongly correlated materials are known to display qualitatively distinct emergent behaviors at low energy. Conveniently, the superposition principle of quantum mechanics ensures that, upon absorbing quantum fluctuation, these rich low-energy behaviors can always be effectively described by dressed particles with fully quantized charge, spin, and orbital structure. Such a powerful and simple description is, however, difficult to access through density functional theory (DFT) calculations, since in terms of bare particles the quantum fluctuation would heavily smear the quantized quantities. To address this difficulty, we propose an `interaction annealing' approach to decipher the dominant valence and orbital structure by suppressing the charge fluctuation through enhancing ionic charging energy. Applying this approach to ferroelectric semi-metal WTe$_2$ as a demonstration, we identify a dominant ferro-orbital ordered structure with W ion in a $d^2$ spin-0 configuration. The proposed approach is straightforward to implement in standard DFT calculations to grant additional access to essential low-energy physics.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures

arXiv:2407.01846 [pdf, other]

Investigating the Segment Anything Foundation Model for Mapping Smallholder Agriculture Field Boundaries Without Training Labels

Authors: Pratyush Tripathy, Kathy Baylis, Kyle Wu, Jyles Watson, Ruizhe Jiang

Abstract: Accurate mapping of agricultural field boundaries is crucial for enhancing outcomes like precision agriculture, crop monitoring, and yield estimation. However, extracting these boundaries from satellite images is challenging, especially for smallholder farms and data-scarce environments. This study explores the Segment Anything Model (SAM) to delineate agricultural field boundaries in Bihar, India, using 2-meter resolution SkySat imagery without additional training. We evaluate SAM's performance across three model checkpoints, various input sizes, multi-date satellite images, and edge-enhanced imagery. Our results show that SAM correctly identifies about 58% of field boundaries, comparable to other approaches requiring extensive training data. Using different input image sizes improves accuracy, with the most significant improvement observed when using multi-date satellite images. This work establishes proof of concept for using SAM and maximizing its potential in agricultural field boundary mapping. Our work highlights SAM's potential in delineating agricultural field boundaries in training-data-scarce settings to enable a wide range of agriculture-related analyses.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 11 pages, 6 main figures, 7 supplementary figures

arXiv:2406.12709 [pdf, other]

Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned

Authors: Du Yin, Jinliang Deng, Shuang Ao, Zechen Li, Hao Xue, Arian Prabowo, Renhe Jiang, Xuan Song, Flora Salim

Abstract: Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the performance of a model trained directly on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in performance. To address this challenge, we presented an innovative paradigm that incorporates three separate forms of curriculum learning, targeting the spatial, temporal, and quantile perspectives. Furthermore, our framework incorporates a stacking fusion module to combine diverse information from the three types of curriculum learning, resulting in a strong and thorough learning process. We demonstrated the effectiveness of this framework with extensive empirical evaluations, highlighting its better performance in addressing complex ST challenges. We provided thorough ablation studies to investigate the effectiveness of our curriculum and to explain how it contributes to the improvement of learning efficiency on ST data.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12208 [pdf, other]

Knowledge Fusion By Evolving Weights of Language Models

Authors: Guodong Du, Jing Li, Hanting Liu, Runhua Jiang, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang

Abstract: Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at https://github.com/duguodong7/model-evolution.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by ACL2024 Findings
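
The mutate, cross over, and keep-if-better loop described in the Evolver abstract above can be illustrated with a small sketch. The abstract does not specify the exact operators, so this uses the classic differential-evolution scheme (rand/1 mutation, binomial crossover) as a stand-in. The function name `evolve`, the parameters `f` and `cr`, and the `score` callback (standing in for evaluation on a development set) are all assumptions made for illustration, not the authors' implementation.

```python
import random


def evolve(population, score, f=0.5, cr=0.5, generations=20, seed=0):
    """Differential-evolution-style search over flat weight vectors.

    population: list of at least 4 equal-length lists of floats (one per model).
    score: hypothetical callback mapping a weight vector to a float, higher is better.
    f, cr: mutation factor and crossover rate (illustrative defaults).
    """
    rng = random.Random(seed)
    pop = [list(p) for p in population]
    fitness = [score(p) for p in pop]
    dim = len(pop[0])
    for _ in range(generations):
        for i in range(len(pop)):
            # Pick three distinct individuals other than the current parent.
            a, b, c = rng.sample([j for j in range(len(pop)) if j != i], 3)
            # Mutation: perturb one individual by a scaled difference of two others.
            mutant = [pop[a][k] + f * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Crossover: mix mutant and parent coordinates; one coordinate is forced
            # from the mutant so the child always differs from a pure copy.
            forced = rng.randrange(dim)
            child = [
                mutant[k] if (rng.random() < cr or k == forced) else pop[i][k]
                for k in range(dim)
            ]
            # Selection: the child replaces its parent only if it scores higher.
            child_fit = score(child)
            if child_fit > fitness[i]:
                pop[i], fitness[i] = child, child_fit
    best = max(range(len(pop)), key=fitness.__getitem__)
    return pop[best], fitness
```

Because a child only ever replaces a parent that it outscores, the best score in the population cannot decrease from one generation to the next, and no gradient computation is involved.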

arXiv:2406.11191 [pdf, other]

A Survey on Human Preference Learning for Large Language Models

Authors: Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

Abstract: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which may prevent a deeper comprehension of the relationships between human preferences and LLMs as well as the realization of their limitations. In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs. We first categorize the human feedback according to data sources and formats. We then summarize techniques for human preferences modeling and compare the advantages and disadvantages of different schools of models. Moreover, we present various preference usage methods sorted by the objectives to utilize human preference signals. Finally, we summarize some prevailing approaches to evaluate LLMs in terms of alignment with human intentions and discuss our outlooks on the human intention alignment for LLMs.

Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: IEEE copyright statement added (also applied to the former version)

arXiv:2406.04592 [pdf, ps, other]

Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions

Authors: Devyani Maladkar, Ruichen Jiang, Aryan Mokhtari

Abstract: Adaptive gradient methods are arguably the most successful optimization algorithms for neural network training. While it is well-known that adaptive gradient methods can achieve better dimensional dependence than stochastic gradient descent (SGD) under favorable geometry for stochastic convex optimization, the theoretical justification for their success in stochastic non-convex optimization remains elusive. In this paper, we aim to close this gap by analyzing the convergence rates of AdaGrad measured by the $\ell_1$-norm of the gradient. Specifically, when the objective has $L$-Lipschitz gradient and the stochastic gradient variance is bounded by $σ^2$, we prove a worst-case convergence rate of $\tilde{\mathcal{O}}(\frac{\sqrt{d}L}{\sqrt{T}} + \frac{\sqrt{d} σ}{T^{1/4}})$, where $d$ is the dimension of the problem. We also present a lower bound of $Ω(\frac{\sqrt{d}}{\sqrt{T}})$ for minimizing the gradient $\ell_1$-norm in the deterministic setting, showing the tightness of our upper bound in the noiseless case. Moreover, under more fine-grained assumptions on the smoothness structure of the objective and the gradient noise and under favorable gradient $\ell_1/\ell_2$ geometry, we show that AdaGrad can potentially shave a factor of $\sqrt{d}$ compared to SGD. To the best of our knowledge, this is the first result for adaptive gradient methods that demonstrates a provable gain over SGD in the non-convex setting.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 21 pages
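
For readers unfamiliar with the method analyzed in the abstract above, here is a minimal sketch of the standard coordinate-wise (diagonal) AdaGrad update. The abstract does not state which AdaGrad variant or step-size choice the paper analyzes, so the function name `adagrad`, the scalar learning rate `lr`, the stabilizer `eps`, and the deterministic `grad` callback are assumptions for illustration only.

```python
import math


def adagrad(grad, x0, lr=0.5, steps=200, eps=1e-8):
    """Run diagonal AdaGrad from x0 and return the final iterate.

    grad: callable returning the gradient (list of floats) at a point.
    lr, steps, eps: illustrative values, not taken from the paper.
    """
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Each coordinate gets its own effective step size lr / sqrt(accum[i]).
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x
```

Usage on a badly scaled quadratic, $f(x) = x_0^2 + 10x_1^2$: `adagrad(lambda x: [2 * x[0], 20 * x[1]], [1.0, 1.0])`. The per-coordinate normalization makes both coordinates shrink at the same rate despite the tenfold difference in curvature, which is the kind of geometry-dependent behavior the abstract's $\ell_1$-norm analysis is concerned with.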

arXiv:2406.02349 [pdf, other]

CADE: Cosine Annealing Differential Evolution for Spiking Neural Network

Authors: Runhua Jiang, Guodong Du, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang

Abstract: Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations were conducted to analyze CADE. CADE showed a balance in exploring and exploiting the search space, resulting in accelerated convergence and improved accuracy compared to existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed, pretraining on a source dataset (i.e., CIFAR-10) and fine-tuning on the target dataset (i.e., CIFAR-100), to improve population diversity. It was found to further enhance CADE for SNN. Remarkably, CADE elevates the performance of the highest accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for F and CR adjustment, especially for DE-based SNN. Source code on GitHub: https://github.com/Tank-Jiang/CADE4SNN.

Submitted 4 June, 2024; originally announced June 2024.
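
The idea of scheduling F and CR with cosine annealing, as described in the CADE abstract above, can be sketched as a simple interpolation function. The abstract gives no ranges, restart policy, or exact formula, so the function name `cosine_anneal` and the start and end values used below are hypothetical; this is the generic cosine schedule, not necessarily the one used in the paper.

```python
import math


def cosine_anneal(step, total_steps, start, end):
    """Cosine-interpolate from `start` (at step 0) to `end` (at step total_steps)."""
    # Clamp progress to [0, 1] so out-of-range steps hold the boundary values.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))
```

In a differential-evolution loop, one might compute `f = cosine_anneal(gen, generations, 0.9, 0.1)` and an analogous value for the crossover rate at the start of each generation, so that early generations explore with large perturbations and later ones refine with small ones. The endpoints 0.9 and 0.1 are placeholders.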

arXiv:2406.02016 [pdf, other]

Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

Authors: Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

Abstract: We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic method and appropriately combine it with second-order information. Moreover, distinct from common adaptive schemes, we define the step size recursively as a function of the gradient norm and the prediction error in the optimistic update. We first analyze a variant where the step size requires knowledge of the Lipschitz constant of the Hessian. Under the additional assumption of Lipschitz continuous gradients, we further design a parameter-free version by tracking the Hessian Lipschitz constant locally and ensuring the iterates remain bounded. We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 33 pages, 2 figures

arXiv:2406.01478 [pdf, other]

Stochastic Newton Proximal Extragradient Method

Authors: Ruichen Jiang, Michał Dereziński, Aryan Mokhtari

Abstract: Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, the method has slow global convergence, requiring up to $\tilde{O}(κ^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $κ$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(κ)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 32 pages, 1 figure

arXiv:2405.18322 [pdf, other]

SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation

Authors: Kejia Yin, Varshanth R. Rao, Ruowei Jiang, Xudong Liu, Parham Aarabi, David B. Lindell

Abstract: Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOTA) methods (1) extract coarse features from backbones that are trained with instance-level self-supervised learning (SSL) paradigms, which neglect the dense prediction nature of the task, (2) aggregate them into memory-intensive hypercolumn formations, and (3) supervise lightweight projector networks to naively establish full local correspondences among all pairs of spatial features. In this paper, we introduce SCE-MAE, a framework that (1) leverages the MAE, a region-level SSL method that naturally better suits the landmark prediction task, (2) operates on the vanilla feature map instead of on expensive hypercolumns, and (3) employs a Correspondence Approximation and Refinement Block (CARB) that utilizes a simple density peak clustering algorithm and our proposed Locality-Constrained Repellence Loss to directly hone only select local correspondences. We demonstrate through extensive experiments that SCE-MAE is highly effective and robust, outperforming existing SOTA methods by large margins of approximately 20%-44% on the landmark matching and approximately 9%-15% on the landmark detection tasks.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024

arXiv:2405.16075 [pdf, other]

Continuous Temporal Domain Generalization

Authors: Zekun Cai, Guangji Bai, Renhe Jiang, Xuan Song, Liang Zhao

Abstract: Temporal Domain Generalization (TDG) addresses the challenge of training predictive models under temporally varying data distributions. Traditional TDG approaches typically focus on domain data collected at fixed, discrete time intervals, which limits their capability to capture the inherent dynamics within continuously evolving and irregularly observed temporal domains. To overcome this, this work formalizes the concept of Continuous Temporal Domain Generalization (CTDG), where domain data are derived from continuous times and are collected at arbitrary times. CTDG tackles critical challenges including: 1) Characterizing the continuous dynamics of both data and models, 2) Learning complex high-dimensional nonlinear dynamics, and 3) Optimizing and controlling the generalization across continuous temporal domains. To address them, we propose a Koopman operator-driven continuous temporal domain generalization (Koodos) framework. We formulate the problem within a continuous dynamic system and leverage the Koopman theory to learn the underlying dynamics; the framework is further enhanced with a comprehensive optimization strategy equipped with analysis and control driven by prior knowledge of the dynamics patterns. Extensive experiments demonstrate the effectiveness and efficiency of our approach.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15344 [pdf, other]

Adaptive Finite Element Method for a Nonlinear Helmholtz Equation with High Wave Number

Authors: Run Jiang, Haijun Wu, Yifeng Xu, Jun Zou

Abstract: A nonlinear Helmholtz (NLH) equation with high frequencies and corner singularities is discretized by the linear finite element method (FEM). After deriving some wave-number-explicit stability estimates and the singularity decomposition for the NLH problem, a priori stability and error estimates are established for the FEM on shape regular meshes including the case of locally refined meshes. Then a posteriori upper and lower bounds using a new residual-type error estimator, which is equivalent to the standard one, are derived for the FE solutions to the NLH problem. These a posteriori estimates confirm a significant fact that is also valid for the NLH problem, namely that the residual-type estimator seriously underestimates the error of the FE solution in the preasymptotic regime, which was first observed by Babuška et al. [Int J Numer Methods Eng 40 (1997)] for a one-dimensional linear problem. Based on the new a posteriori error estimator, both the convergence and the quasi-optimality of the resulting adaptive finite element algorithm are proved for the first time for the NLH problem when the initial mesh size lies in the preasymptotic regime. Finally, numerical examples are presented to validate the theoretical findings and demonstrate that applying the continuous interior penalty (CIP) technique with appropriate penalty parameters can reduce the pollution errors efficiently. In particular, the nonlinear phenomenon of optical bistability with Gaussian incident waves is successfully simulated by the adaptive CIPFEM.

Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.10800 [pdf, other]

Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting

Authors: Zheng Dong, Renhe Jiang, Haotian Gao, Hangchen Liu, Jinliang Deng, Qingsong Wen, Xuan Song

Abstract: Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heterogeneity through learning spatial and temporal embeddings, which can be viewed as a clustering process. Then, a novel spatiotemporal meta-parameter learning paradigm is proposed to learn spatiotemporal-specific parameters from meta-parameter pools, which is informed by the captured heterogeneity. Based on these ideas, we develop a Heterogeneity-Informed Spatiotemporal Meta-Network (HimNet) for spatiotemporal time series forecasting. Extensive experiments on five widely-used benchmarks demonstrate our method achieves state-of-the-art performance while exhibiting superior interpretability. Our code is available at https://github.com/XDZhelheim/HimNet.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted by KDD'24 Research Track

arXiv:2405.04976 [pdf, other]

RF-based Energy Harvesting: Nonlinear Models, Applications and Challenges

Authors: Ruihong Jiang

Abstract: So far, various aspects associated with wireless energy harvesting (EH) have been investigated from diverse perspectives, including energy sources and models, usage protocols, energy scheduling and optimization, and EH implementation in different wireless communication systems. However, a comprehensive survey specifically focusing on models of radio frequency (RF)-based EH behaviors has not yet been presented. To address this gap, this article provides an overview of the mainstream mathematical models that capture the nonlinear behavior of practical EH circuits, serving as a valuable handbook of mathematical models for EH application research. Moreover, we summarize the application of each nonlinear EH model, including the associated challenges and precautions. We also analyze the impact and advancements of each EH model on RF-based EH systems in wireless communication, utilizing artificial intelligence (AI) techniques. Additionally, we highlight emerging research directions in the context of nonlinear RF-based EH. This article aims to contribute to the future application of RF-based EH in novel communication research domains to a significant extent.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04350 [pdf, other]

Decision-Dependent Uncertainty-Aware Distribution System Planning Under Wildfire Risk

Authors: Felipe Piancó, Alexandre Moreira, Bruno Fanzeres, Ruiwei Jiang, Chaoyue Zhao, Miguel Heleno

Abstract: The interaction between power systems and wildfires can be dangerous and costly. Damaged structures, load shedding, and high operational costs are potential consequences when the grid is unprepared. In fact, the operation of distribution grids can be liable for the outbreak of wildfires when extreme weather conditions arise. Within this context, investment planning should consider the impact of operational actions on the uncertainty related to wildfires that can directly affect line failure likelihood. Neglecting this can compromise the cost-benefit evaluation in planning system investments for wildfire risk. In this paper, we propose a decision-dependent uncertainty (DDU) aware methodology that provides the optimal portfolio of investments for distribution systems while considering that high power-flow levels through line segments in high-threat areas can ignite wildfires and, therefore, increase the probability of line failures. The methodology identifies the best combination of system upgrades (installation of new lines, hardening existing lines, and placement of switching devices) to provide the necessary leeway to operate the distribution system under wildfire-prone conditions. Our case study demonstrates that by modeling the DDU relationship between power flow prescriptions and line failures, investment decisions are more accurate and better prepare the grid infrastructure to deal with wildfire risk.

Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03255 [pdf, other]

Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning

Authors: Jiewen Deng, Renhe Jiang, Jiaqi Zhang, Xuan Song

Abstract: Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, which is prevalent in monitoring systems, encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, there remains a need to emphasize harnessing the potential of information from different modalities. Robust MoST forecasting is more challenging because it possesses (i) high-dimensional and complex internal structures and (ii) dynamic heterogeneity caused by temporal, spatial, and modality variations. In this study, we propose a novel MoST learning framework via Self-Supervised Learning, namely MoSSL, which aims to uncover latent patterns from temporal, spatial, and modality perspectives while quantifying dynamic heterogeneity. Experimental results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines. Model implementation is available at https://github.com/beginner-sketch/MoSSL.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024 Main Track

arXiv:2405.01350 [pdf, other]

Community-Invariant Graph Contrastive Learning

Authors: Shiyin Tan, Dongyuan Li, Renhe Jiang, Ying Zhang, Manabu Okumura

Abstract: Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and identified its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. Code is released on GitHub (https://github.com/ShiyinTan/CI-GCL.git).

Submitted 2 May, 2024; originally announced May 2024.

Comments: This paper is accepted by ICML-2024

arXiv:2405.00713 [pdf, ps, other]

Some inequalities related to Riesz transform on exterior Lipschitz domains

Authors: Renjin Jiang, Sibei Yang

Abstract: Let $n\ge2$ and $\mathcal{L}=-\mathrm{div}(A\nabla\cdot)$ be an elliptic operator on $\mathbb{R}^n$. Given an exterior Lipschitz domain $Ω$, let $\mathcal{L}_D$ and $\mathcal{L}_N$ be the elliptic operators $\mathcal{L}$ on $Ω$ subject to the Dirichlet and the Neumann boundary conditions, respectively. For the Neumann operator, we show that the reverse inequality $\|\mathcal{L}_N^{1/2}f\|_{L^p(Ω)} \le C\|\nabla f\|_{L^p(Ω)}$ holds true for any $p\in(1,\infty)$. For the Dirichlet operator, it was known that the Riesz operator $\nabla \mathcal{L}_D^{-1/2}$ is not bounded for $p>2$ and $p\ge n$, even if $\mathcal{L}=-Δ$ is the Laplace operator. Assuming that $A$ has CMO coefficients or VMO coefficients satisfying a certain perturbation property, and that $\partialΩ$ is $C^1$, we prove that for $p>2$ and $p\in [n,\infty)$, it holds $$ \inf_{φ\in\mathcal{A}^p_0(Ω)}\left\|\nabla f-\nablaφ\right\|_{L^p(Ω)}\le C\left\|\mathcal{L}^{1/2}_D f\right\|_{L^p(Ω)} $$ for $f\in \dot{W}^{1,p}_0(Ω)$. Here $\mathcal{A}^p_0(Ω)=\{f\in \dot{W}^{1,p}_0(Ω):\,\mathcal{L}_Df=0\}$ is a non-trivial subspace generated by harmonic functions in $Ω$ with zero boundary value.

Submitted 25 April, 2024; originally announced May 2024.

Comments: 24pp, comments are welcome

arXiv:2405.00334 [pdf, other]

A Survey on Deep Active Learning: Recent Advances and New Frontiers

Authors: Dongyuan Li, Zhen Wang, Yankai Chen, Renhe Jiang, Weiping Ding, Manabu Okumura

Abstract: Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label newly selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep learning-based active learning (DAL), remain scarce. Therefore, we conduct an advanced and comprehensive survey on DAL. We first introduce reviewed paper collection and filtering. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we systematically provide a taxonomy of DAL methods from five perspectives, including annotation types, query strategies, deep model architectures, learning paradigms, and training processes, and objectively analyze their strengths and weaknesses. Then, we comprehensively summarize the main applications of DAL in Natural Language Processing (NLP), Computer Vision (CV), Data Mining (DM), etc. Finally, we discuss challenges and perspectives after a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.

Submitted 1 May, 2024; originally announced May 2024.

Comments: This paper is accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2404.16731 [pdf, ps, other]

Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search

Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

Abstract: In this paper, we establish the first explicit and non-asymptotic global convergence analysis of the BFGS method when deployed with an inexact line search scheme that satisfies the Armijo-Wolfe conditions. We show that BFGS achieves a global convergence rate of $(1-\frac{1}κ)^k$ for $μ$-strongly convex functions with $L$-Lipschitz gradients, where $κ=\frac{L}μ$ denotes the condition number. Furthermore, if the objective function's Hessian is Lipschitz, BFGS with the Armijo-Wolfe line search achieves a linear convergence rate only determined by the line search parameters and independent of the condition number. These results hold for any initial point $x_0$ and any symmetric positive definite initial Hessian approximation matrix $B_0$, although the choice of $B_0$ affects the iteration count required to attain these rates. Specifically, we show that for $B_0 = LI$, the rate of $O((1-\frac{1}κ)^k)$ appears from the first iteration, while for $B_0 = μI$, it takes $d\log κ$ iterations. Conversely, the condition number-independent linear convergence rate for $B_0 = LI$ occurs after $O\left(κ\left(d +\frac{M \sqrt{f(x_0)-f(x_*)}}{μ^{3/2}}\right)\right)$ iterations, whereas for $B_0 = μI$, it holds after $O\left(\frac{M \sqrt{f(x_0)-f(x_*)}}{μ^{3/2}}\left(d\log κ+ κ\right)\right)$ iterations. Here, $d$ denotes the dimension of the problem, $M$ is the Lipschitz parameter of the Hessian, and $x_*$ denotes the optimal solution. We further leverage these global linear convergence results to characterize the overall iteration complexity of BFGS when deployed with the Armijo-Wolfe line search.

Submitted 25 April, 2024; originally announced April 2024.
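
As a rough illustration of the $(1-\frac{1}κ)^k$ rate quoted in the abstract above, the sketch below computes how many iterations a linear rate of that form would need to shrink the error by a target factor. It assumes the bound holds with constant 1 from the first iteration, whereas the abstract states the rate only up to constants (and from the first iteration only for $B_0 = LI$). The function name and the tolerance argument are hypothetical and not taken from the paper.

```python
import math


def iterations_for_linear_rate(kappa: float, eps: float) -> int:
    """Smallest k >= 1 with (1 - 1/kappa)**k <= eps.

    kappa is the condition number L/mu (must be >= 1) and eps is the
    target reduction factor (must lie strictly between 0 and 1).
    Illustrative only: constants hidden in the O(.) bound are ignored.
    """
    if kappa < 1.0:
        raise ValueError("condition number must be at least 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if kappa == 1.0:
        # The contraction factor is 0, so one iteration is enough.
        return 1
    rate = 1.0 - 1.0 / kappa
    # Solve rate**k <= eps for k; both logarithms are negative.
    return max(1, math.ceil(math.log(eps) / math.log(rate)))


if __name__ == "__main__":
    for kappa in (10.0, 100.0, 1000.0):
        print(kappa, iterations_for_linear_rate(kappa, 1e-6))
```

The iteration count grows roughly in proportion to $κ$, which is why the condition-number-independent rate mentioned in the abstract is of interest once it takes effect.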

arXiv:2404.15597 [pdf, other]

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Authors: Lang Qin, Ziming Wang, Runhao Jiang, Rui Yan, Huajin Tang

Abstract: Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with performance similar to that of recurrent neural networks (RNNs) but with about 50% of the power consumption.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.12184 [pdf, other]

Boolean Matching Reversible Circuits: Algorithm and Complexity

Authors: Tian-Fu Chen, Jie-Hong R. Jiang

Abstract: Boolean matching is an important problem in logic synthesis and verification. Despite being well-studied for conventional Boolean circuits, its treatment for reversible logic circuits remains largely, if not completely, missing. This work provides the first such study. Given two (black-box) reversible logic circuits that are promised to be matchable, we check their equivalences under various input/output negation and permutation conditions subject to the availability/unavailability of their inverse circuits. Notably, among other results, we show that the equivalence up to input negation and permutation is solvable in quantum polynomial time, while its classical complexity is exponential. This result is arguably the first demonstration of quantum exponential speedup in solving design automation problems. Also, as a negative result, we show that the equivalence up to both input and output negations is not solvable in quantum polynomial time unless UNIQUE-SAT is, which is unlikely. This work lays the theoretical foundation of Boolean matching reversible circuits for potential applications, e.g., in quantum circuit synthesis.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10947 [pdf, other]

Residual Connections Harm Abstract Feature Learning in Masked Autoencoders

Authors: Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

Abstract: We demonstrate that adding a weighting factor to decay the strength of identity shortcuts within residual networks substantially improves semantic feature learning in the state-of-the-art self-supervised masked autoencoding (MAE) paradigm. Our modification to the identity shortcuts within a ViT-B/16 backbone of an MAE boosts linear probing accuracy on ImageNet from 67.8% to 72.7%. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find a correlation between low effective feature rank and downstream task performance.

Submitted 20 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.09679 [pdf, other]

AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes

Authors: Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou

Abstract: Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. However, stragglers frequently occur in distributed training due to resource contention and hardware heterogeneity, which significantly hampers the training efficiency. Previous works only address part of the stragglers and could not adaptively solve various stragglers in practice. Additionally, it is challenging to use a systematic framework to address all stragglers because different stragglers require diverse data allocation and fault-tolerance mechanisms. Therefore, this paper proposes a unified distributed training framework called AntDT (Ant Distributed Training Framework) to adaptively solve the straggler problems. Firstly, the framework consists of four components, including the Stateful Dynamic Data Sharding service, Monitor, Controller, and Agent. These components work collaboratively to efficiently distribute workloads and provide a range of pre-defined straggler mitigation methods with fault tolerance, thereby hiding messy details of data allocation and fault handling. Secondly, the framework provides a high degree of flexibility, allowing for the customization of straggler mitigation solutions based on the specific circumstances of the cluster. Leveraging this flexibility, we introduce two straggler mitigation solutions, namely AntDT-ND for non-dedicated clusters and AntDT-DD for dedicated clusters, as practical examples to resolve various types of stragglers at Ant Group. Justified by our comprehensive experiments and industrial deployment statistics, AntDT outperforms other SOTA methods by more than 3x in terms of training efficiency. Additionally, in Alipay's homepage recommendation scenario, using AntDT reduces the training duration of the ranking model from 27.8 hours to just 5.4 hours.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.02613 [pdf, other]

Searches for multi-Z boson productions and anomalous gauge boson couplings at a muon collider

Authors: Ruobing Jiang, Chuqiao Jiang, Alim Ruzi, Tianyi Yang, Yong Ban, Qiang Li

Abstract: Multi-boson productions can be exploited as novel probes either for standard model precision tests or new physics searches, and have become a popular topic in the ongoing LHC experiments and in future collider studies, including those for electron-positron and muon-muon colliders. Here we focus on two examples, i.e., ZZZ direct productions through $μ^{+}μ^{-}$ annihilation at a 1 TeV muon collider, and ZZ productions through vector boson scattering at a 10 TeV muon collider, with an integrated luminosity of $10 \, \text{ab}^{-1}$. Various channels are considered, including $ZZZ \rightarrow 4l2ν$ and $ZZZ \rightarrow 4l + 2 \text{ jets}$. Expected significances for these multi-Z boson production processes are provided based on a detailed Monte Carlo study and signal-background analysis. Sensitivities to anomalous gauge boson couplings are also presented.

Submitted 28 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: This paper has been submitted to Chinese Physics C

arXiv:2404.01267 [pdf, other]

Non-asymptotic Global Convergence Rates of BFGS with Exact Line Search

Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

Abstract: In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon's equivalence result, our findings are also applicable to other quasi-Newton methods in the convex Broyden class employing exact line search, such as the Davidon-Fletcher-Powell (DFP) method. Specifically, we focus on problems where the objective function is strongly convex with Lipschitz continuous gradient and Hessian. Our results hold for any initial point and any symmetric positive definite initial Hessian approximation matrix. The analysis unveils a detailed three-phase convergence process, characterized by distinct linear and superlinear rates, contingent on the iteration progress. Additionally, our theoretical findings demonstrate the trade-offs between linear and superlinear convergence rates for BFGS when we modify the initial Hessian approximation matrix, a phenomenon further corroborated by our numerical experiments.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.19172 [pdf, ps, other]

Quantum circuit design for mixture and preparation of arbitrary pure and mixed quantum states

Authors: Bo-Hung Chen, Dah-Wei Chiou, Jie-Hong Roland Jiang

Abstract: This paper addresses the challenge of preparing arbitrary mixed quantum states, an area that has not been extensively studied compared to pure states. Two circuit design methods are presented: one via a mixture of pure states and the other via purification. A novel strategy utilizing the Cholesky decomposition is proposed to improve both computational efficiency during preprocessing and circuit efficiency in the resulting circuits, offering significant advantages, especially when the targeted density matrix is low-ranked or sparse. By leveraging the incomplete Cholesky decomposition with threshold dropping, we also propose an appealing strategy for generating a high-fidelity approximation of the targeted density matrix, enabling substantial efficiency enhancement at the cost of mild fidelity loss. Additionally, as a closely related issue, we prove the "no-superposing theorem": given a certain number of arbitrary unknown pure states as input, it is impossible to devise an operation that produces an output state as the superposition of the input states with predefined coefficients unless all but one of the coefficients vanish.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 25 pages, 8 figures

arXiv:2403.14769 [pdf, other]

Fractional Tackles: Leveraging Player Tracking Data for Within-Play Tackling Evaluation in American Football

Authors: Quang Nguyen, Ruitong Jiang, Meg Ellingwood, Ronald Yurko

Abstract: Tackling is a fundamental defensive move in American football, with the main purpose of stopping the forward motion of the ball-carrier. However, current tackling metrics are manually recorded outcomes that are inherently flawed due to their discrete and subjective nature. Using player tracking data, we present a novel framework for assessing tackling contribution in a continuous and objective manner. Our approach first identifies when a defender is in a "contact window" of the ball-carrier during a play, before assigning value to each window and the players involved. This enables us to devise a new metric called fractional tackles, which credits defenders for halting the ball-carrier's forward motion toward the end zone. We demonstrate that fractional tackles overcome the shortcomings of traditional metrics such as tackles and assists, by providing greater variation and measurable information for players lacking recorded statistics like defensive linemen. We view our contribution as a significant step forward in measuring defensive performance in American football and a clear demonstration of the capabilities of player tracking data.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 16 pages, 6 figures, 2 tables

arXiv:2403.12574 [pdf, other]

EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks

Authors: Ziming Wang, Ziling Wang, Huaning Li, Lang Qin, Runhao Jiang, De Ma, Huajin Tang

Abstract: Event cameras, with their high dynamic range and temporal resolution, are ideally suited for object detection, especially under scenarios with motion blur and challenging lighting conditions. However, while most existing approaches prioritize optimizing spatiotemporal representations with advanced detection backbones and early aggregation functions, the crucial issue of adaptive event sampling remains largely unaddressed. Spiking Neural Networks (SNNs), which operate on an event-driven paradigm through sparse spike communication, emerge as a natural fit for addressing this challenge. In this study, we discover that the neural dynamics of spiking neurons align closely with the behavior of an ideal temporal event sampler. Motivated by this insight, we propose a novel adaptive sampling module that leverages recurrent convolutional SNNs enhanced with temporal memory, facilitating a fully end-to-end learnable framework for event-based detection. Additionally, we introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution and address performance degradation encountered in spike-based sampling modules. Through rigorous testing on neuromorphic datasets for event-based detection, our approach demonstrably surpasses existing state-of-the-art spike-based methods, achieving superior performance with significantly fewer parameters and time steps. For instance, our method achieves a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and three time steps. Moreover, the applicability and effectiveness of our adaptive sampling methodology extend beyond SNNs, as demonstrated through further validation on conventional non-spiking detection models.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11087 [pdf, other]

Incorporating Higher-order Structural Information for Graph Clustering

Authors: Qiankun Li, Haobing Liu, Ruobing Jiang, Tingting Wang

Abstract: Clustering holds profound significance in data mining. In recent years, graph convolutional network (GCN) has emerged as a powerful tool for deep clustering, integrating both graph structural information and node attributes. However, most existing methods ignore the higher-order structural information of the graph. Evidently, nodes within the same cluster can establish distant connections. Besides, recent deep clustering methods usually apply a self-supervised module to monitor the training process of their model, focusing solely on node attributes without paying attention to graph structure. In this paper, we propose a novel graph clustering network to make full use of graph structural information. To capture the higher-order structural information, we design a graph mutual infomax module, effectively maximizing mutual information between graph-level and node-level representations, and employ a trinary self-supervised module that includes modularity as a structural constraint. Our proposed model outperforms many state-of-the-art methods on various datasets, demonstrating its superiority.

Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Journal ref: DASFAA 2024

arXiv:2403.10568 [pdf, other]

MoPE: Parameter-Efficient and Scalable Multimodal Fusion via Mixture of Prompt Experts

Authors: Ruixiang Jiang, Lingbo Liu, Changwen Chen

Abstract: Prompt-tuning has demonstrated parameter-efficiency in fusing unimodal foundation models for multimodal tasks. However, its limited adaptivity and expressiveness lead to suboptimal performance when compared with other tuning methods. In this paper, we address this issue by disentangling the vanilla prompts to adaptively capture dataset-level and instance-level features. Building upon this disentanglement, we introduce the mixture of prompt experts (MoPE) technique to enhance expressiveness. MoPE leverages multimodal pairing priors to route the most effective prompt on a per-instance basis. Compared to vanilla prompting, our MoPE-based conditional prompting exhibits greater expressiveness for multimodal fusion, scaling better with the training data and the overall number of trainable parameters. We also study a regularization term for expert routing, leading to emergent expert specialization, where different experts focus on different concepts, enabling interpretable soft prompting. Extensive experiments across three multimodal datasets demonstrate that our method achieves state-of-the-art results, matching or even surpassing the performance of fine-tuning, while requiring only 0.8% of the trainable parameters. Code will be released: https://github.com/songrise/MoPE.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Extended version of arxiv:2312.03734

arXiv:2403.05886 [pdf, other]

Generalizing to Out-of-Sample Degradations via Model Reprogramming

Authors: Runhua Jiang, Yahong Han

Abstract: Existing image restoration models are typically designed for specific tasks and struggle to generalize to out-of-sample degradations not encountered during training. While zero-shot methods can address this limitation by fine-tuning model parameters on testing samples, their effectiveness relies on predefined natural priors and physical models of specific degradations. Nevertheless, determining out-of-sample degradations faced in real-world scenarios is always impractical. As a result, it is more desirable to train restoration models with inherent generalization ability. To this end, this work introduces the Out-of-Sample Restoration (OSR) task, which aims to develop restoration models capable of handling out-of-sample degradations. An intuitive solution involves pre-translating out-of-sample degradations to known degradations of restoration models. However, directly translating them in the image space could lead to complex image translation issues. To address this issue, we propose a model reprogramming framework, which translates out-of-sample degradations using quantum mechanics and wave functions. Specifically, input images are decoupled as wave functions of amplitude and phase terms. The translation of out-of-sample degradation is performed by adapting the phase term. Meanwhile, the image content is maintained and enhanced in the amplitude term. By taking these two terms as inputs, restoration models are able to handle out-of-sample degradations without fine-tuning. Through extensive experiments across multiple evaluation cases, we demonstrate the effectiveness and flexibility of our proposed framework. Our code is available on GitHub at https://github.com/ddghjikle/Out-of-sample-restoration.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.02566 [pdf, other]

Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning

Authors: Zhaoxin Fan, Runmin Jiang, Junhao Wu, Xin Huang, Tianyang Wang, Heng Huang, Min Xu

Abstract: 3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. However, this approach heavily relies on labor-intensive and time-consuming fully annotated ground-truth labels, particularly for 3D volumes. To overcome this limitation, we propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging. Our pipeline integrates three innovative components: a probability-based pseudo-label generation technique for synthesizing dense segmentation masks from sparse annotations, a Probabilistic Multi-head Self-Attention network for robust feature extraction within our Probabilistic Transformer Network, and a Probability-informed Segmentation Loss Function to enhance training with annotation confidence. Demonstrating significant advances, our approach not only rivals the performance of fully supervised methods but also surpasses existing weakly supervised methods in CT and MRI datasets, achieving up to 18.1% improvement in Dice scores for certain organs. The code is available at https://github.com/runminjiang/PW4MedSeg.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.01636 [pdf, other]

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

Authors: Zi** Xu, Zifan Xu, Runxuan Jiang, Peter Stone, Ambuj Tewari

Abstract: Multitask Reinforcement Learning (MTRL) approaches have gained increasing attention for their wide applications in many important Reinforcement Learning (RL) tasks. However, while recent advancements in MTRL theory have focused on the improved statistical efficiency by assuming a shared structure across tasks, exploration--a crucial aspect of RL--has been largely overlooked. This paper addresses this gap by showing that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with a myopic exploration design like $ε$-greedy, which is inefficient in general, can be sample-efficient for MTRL. To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL. It may also shed light on the enigmatic success of the wide applications of myopic exploration in practice. To validate the role of diversity, we conduct experiments on synthetic robotic control environments, where the diverse task set aligns with the task selection by automatic curriculum learning, which is empirically shown to improve sample-efficiency.

Submitted 5 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.
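
For readers unfamiliar with the myopic exploration rule named in the abstract above, the following is a minimal sketch of plain $ε$-greedy action selection over a list of estimated action values. The function name `epsilon_greedy` and its parameters are illustrative assumptions; this is the textbook rule, not the paper's policy-sharing algorithm.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from estimated action values.

    With probability `epsilon` a uniformly random action is explored;
    otherwise the action with the highest estimate is exploited.
    Hypothetical helper for illustration only.
    """
    if rng.random() < epsilon:
        # Explore: ignore the estimates entirely.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

The rule is called myopic because the exploratory step ignores which actions are still uncertain, which is why it can be inefficient in general.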

arXiv:2403.00314 [pdf, other]

Lower-level Duality Based Reformulation and Majorization Minimization Algorithm for Hyperparameter Optimization

Authors: He Chen, Haochen Xu, Rujun Jiang, Anthony Man-Cho So

Abstract: Hyperparameter tuning is an important task of machine learning, which can be formulated as a bilevel program (BLP). However, most existing algorithms are not applicable for BLP with non-smooth lower-level problems. To address this, we propose a single-level reformulation of the BLP based on lower-level duality without involving any implicit value function. To solve the reformulation, we propose a majorization minimization algorithm that majorizes the constraint in each iteration. Furthermore, we show that the subproblems of the proposed algorithm for several widely used hyperparameter tuning models can be reformulated into conic programs that can be efficiently solved by off-the-shelf solvers. We theoretically prove the convergence of the proposed algorithm and demonstrate its superiority through numerical experiments.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted by AISTATS 2024

arXiv:2402.19004 [pdf, other]

RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation

Authors: Jie Zhang, Xubing Yang, Rui Jiang, Wei Shao, Li Zhang

Abstract: The development of high-resolution remote sensing satellites has provided great convenience for research work related to remote sensing. Segmentation and extraction of specific targets are essential tasks when facing the vast and complex remote sensing images. Recently, the introduction of Segment Anything Model (SAM) provides a universal pre-training model for image segmentation tasks. While the direct application of SAM to remote sensing image segmentation tasks does not yield satisfactory results, we propose RSAM-Seg, which stands for Remote Sensing SAM with Semantic Segmentation, as a tailored modification of SAM for the remote sensing field that eliminates the need for manual intervention to provide prompts. Adapter-Scale, a set of supplementary scaling modules, is proposed in the multi-head attention blocks of the encoder part of SAM. Furthermore, Adapter-Feature modules are inserted between the Vision Transformer (ViT) blocks. These modules aim to incorporate high-frequency image information and image embedding features to generate image-informed prompts. Experiments are conducted on four distinct remote sensing scenarios, encompassing cloud detection, field monitoring, building detection and road mapping tasks. The experimental results not only showcase the improvement over the original SAM and U-Net across cloud, buildings, fields and roads scenarios, but also highlight the capacity of RSAM-Seg to discern absent areas within the ground truth of certain datasets, affirming its potential as an auxiliary annotation method. In addition, the performance in few-shot scenarios is commendable, underscoring its potential in dealing with limited datasets.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 12 pages, 11 figures

arXiv:2402.17791 [pdf, other]

doi 10.1109/tnnls.2024.3363695

Label Informed Contrastive Pretraining for Node Importance Estimation on Knowledge Graphs

Authors: Tianyu Zhang, Chengbin Hou, Rui Jiang, Xuegong Zhang, Chenghu Zhou, Ke Tang, Hairong Lv

Abstract: Node Importance Estimation (NIE) is a task of inferring importance scores of the nodes in a graph. Due to the availability of richer data and knowledge, recent research interests of NIE have been dedicated to knowledge graphs for predicting future or missing node importance scores. Existing state-of-the-art NIE methods train the model by available labels, and they consider every interested node equally before training. However, the nodes with higher importance often require or receive more attention in real-world scenarios, e.g., people may care more about the movies or webpages with higher importance. To this end, we introduce Label Informed ContrAstive Pretraining (LICAP) to the NIE problem for being better aware of the nodes with high importance scores. Specifically, LICAP is a novel type of contrastive learning framework that aims to fully utilize the continuous labels to generate contrastive samples for pretraining embeddings. Considering the NIE problem, LICAP adopts a novel sampling strategy called top nodes preferred hierarchical sampling to first group all interested nodes into a top bin and a non-top bin based on node importance scores, and then divide the nodes within top bin into several finer bins also based on the scores. The contrastive samples are generated from those bins, and are then used to pretrain node embeddings of knowledge graphs via a newly proposed Predicate-aware Graph Attention Networks (PreGAT), so as to better separate the top nodes from non-top nodes, and distinguish the top nodes within top bin by keeping the relative order among finer bins. Extensive experiments demonstrate that the LICAP pretrained embeddings can further boost the performance of existing NIE methods and achieve the new state-of-the-art performance regarding both regression and ranking metrics. The source code for reproducibility is available at https://github.com/zhangtia16/LICAP

Submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted by IEEE TNNLS

arXiv:2402.17732 [pdf, other]

Batched Nonparametric Contextual Bandits

Authors: Rong Jiang, Cong Ma

Abstract: We study nonparametric contextual bandits under batch constraints, where the expected reward for each action is modeled as a smooth function of covariates, and the policy updates are made at the end of each batch of observations. We establish a minimax regret lower bound for this setting and propose a novel batch learning algorithm that achieves the optimal regret (up to logarithmic factors). In essence, our procedure dynamically splits the covariate space into smaller bins, carefully aligning their widths with the batch size. Our theoretical results suggest that for nonparametric contextual bandits, a nearly constant number of policy updates can attain optimal regret in the fully online setting.

Submitted 10 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Add lower bound when grid is adaptively chosen; add results on adaptivity to margin parameter

arXiv:2402.16190 [pdf]

Accurate predictions of keyhole depths using machine learning-aided simulations

Authors: Jiahui Zhang, Runbo Jiang, Kangming Li, Pengyu Chen, Xiao Shang, Zhiying Liu, Jason Hattrick-Simpers, Brian J. Simonds, Qianglong Wei, Hongze Wang, Tao Sun, Anthony D. Rollett, Yu Zou

Abstract: The keyhole phenomenon is widely observed in laser materials processing, including laser welding, remelting, cladding, drilling, and additive manufacturing. Keyhole-induced defects, primarily pores, dramatically affect the performance of final products, impeding the broad use of these laser-based technologies. The formation of these pores is typically associated with the dynamic behavior of the keyhole. So far, the accurate characterization and prediction of keyhole features, particularly keyhole depth, as a function of time has been a challenging task. In situ characterization of keyhole dynamic behavior using a synchrotron X-ray is complicated and expensive. Current simulations are hindered by their poor accuracies in predicting keyhole depths due to the lack of real-time laser absorptance data. Here, we develop a machine learning-aided simulation method that allows us to accurately predict keyhole depth over a wide range of processing parameters. Based on titanium and aluminum alloys, two commonly used engineering materials as examples, we achieve an accuracy with an error margin of 10 %, surpassing those simulated using other existing models (with an error margin in a range of 50-200 %). Our machine learning-aided simulation method is affordable and readily deployable for a large variety of materials, opening new doors to eliminate or reduce defects for a wide range of laser materials processing techniques.

Submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.14744 [pdf, other]

Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation

Authors: Jiawei Wang, Renhe Jiang, Chuang Yang, Zengqing Wu, Makoto Onizuka, Ryosuke Shibasaki, Noboru Koshizuka, Chuan Xiao

Abstract: This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and effective personal mobility generation. LLMs overcome the limitations of previous models by effectively processing semantic data and offering versatility in modeling various tasks. Our approach addresses three research questions: aligning LLMs with real-world urban mobility data, developing reliable activity generation strategies, and exploring LLM applications in urban mobility. The key technical contribution is a novel LLM agent framework that accounts for individual activity patterns and motivations, including a self-consistency approach to align LLMs with real-world activity data and a retrieval-augmented strategy for interpretable activity generation. We evaluate our LLM agent framework and compare it with state-of-the-art personal mobility generation approaches, demonstrating the effectiveness of our approach and its potential applications in urban mobility. Overall, this study marks the pioneering work of designing an LLM agent framework for activity generation based on real-world human activity data, offering a promising tool for urban mobility analysis.

Submitted 23 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: Source codes are available at https://github.com/Wangjw6/LLMob/

arXiv:2402.13483 [pdf, other]

A proposed PKU-Muon experiment for muon tomography and dark matter search

Authors: Xudong Yu, Zijian Wang, Cheng-en Liu, Yiqing Feng, **ning Li, Xinyue Geng, Yimeng Zhang, Leyun Gao, Ruobing Jiang, Youpeng Wu, Chen Zhou, Qite Li, Siguang Wang, Yong Ban, Yajun Mao, Qiang Li

Abstract: We propose here a set of new methods to directly detect light mass dark matter through its scattering with abundant atmospheric muons or accelerator beams. Firstly, we plan to use the free cosmic-ray muons interacting with dark matter in a volume surrounded by tracking detectors, to trace possible interaction between dark matter and muons. Secondly, we will interface our device with domestic or international muon beams. Due to much larger muon intensity and focused beam, we anticipate the detector can be made further compact and the resulting sensitivity on dark matter searches will be improved. Furthermore, we will measure precisely directional distributions of cosmic-ray muons, either at mountain or sea level, and the differences may reveal possible information of dark matter distributed near the earth. Specifically, our methods can have advantages over `exotic' dark matters which are either muon-philic or slowed down due to some mechanism, and sensitivity on dark matter and muon scattering cross section can reach as low as microbarn level.

Submitted 23 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Added a few sentences to highlight that our methods can have advantages over exotic dark matters which are either muon-philic or slowed down due to some mechanism

arXiv:2402.11764 [pdf, other]

ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs

Authors: Pengrui Han, Rafal Kocielnik, Adhithya Saravanan, Roy Jiang, Or Sharir, Anima Anandkumar

Abstract: Large Language models (LLMs), while powerful, exhibit harmful social biases. Debiasing is often challenging due to computational costs, data constraints, and potential degradation of multi-task language capabilities. This work introduces a novel approach utilizing ChatGPT to generate synthetic training data, aiming to enhance the debiasing of LLMs. We propose two strategies: Targeted Prompting, which provides effective debiasing for known biases but necessitates prior specification of bias in question; and General Prompting, which, while slightly less effective, offers debiasing across various categories. We leverage resource-efficient LLM debiasing using adapter tuning and compare the effectiveness of our synthetic data to existing debiasing datasets. Our results reveal that: (1) ChatGPT can efficiently produce high-quality training data for debiasing other LLMs; (2) data produced via our approach surpasses existing datasets in debiasing performance while also preserving internal knowledge of a pre-trained LLM; and (3) synthetic data exhibits generalizability across categories, effectively mitigating various biases, including intersectional ones. These findings underscore the potential of synthetic data in advancing the fairness of LLMs with minimal retraining cost.

Submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted to EACL 2024 Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2024)

MSC Class: 68T50 ACM Class: I.2.7; K.4.1

arXiv:2402.08730 [pdf, other]

Universal low-temperature fluctuation of unconventional superconductors revealed: 'Smoking gun' leaves proper bosonic superfluidity the last theory standing

Authors: Anthony Hegg, Ruoshi Jiang, Jie Wang, **ning Hou, Tao Zeng, Yucel Yildirim, Wei Ku

Abstract: Low-temperature thermal fluctuations offer an essential window in characterizing the true nature of a quantum state of matter, a quintessential example being Fermi liquid theory. Here, we examine the leading thermal fluctuation of the superfluid density across numerous families ranging from relatively conventional to highly unconventional superconductors (MgB$_2$, bismuthates, doped buckyballs, heavy fermions, UTe$_2$, doped SrTiO$_3$, Chevrel clusters, intermetallics, organic superconductors, transition metal dichalcogenides, ruthenates, iron-pnictides, cuprates, and kagome metals). Amazingly, in all of them an unprecedented universal $T^3$ depletion materializes in the low-temperature superfluid density, even in the believed-to-be-conventional MgB$_2$. This reveals a new quantum superfluid state of matter and requires a necessary change of paradigm in describing modern superconductors. We demonstrate that such unorthodox yet generic behavior can be described by a strictly Galilean consistent theory of bosonic superfluidity hosting a long-lived 'true condensate'.

Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.08097 [pdf, ps, other]

An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization

Authors: **cheng Cao, Ruichen Jiang, Erfan Yazdandoost Hamedani, Aryan Mokhtari

Abstract: In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting plane approach and employs an accelerated gradient-based update to reduce the upper-level objective function over the approximated solution set. We measure the performance of our method in terms of suboptimality and infeasibility errors and provide non-asymptotic convergence guarantees for both error criteria. Specifically, when the feasible set is compact, we show that our method requires at most $\mathcal{O}(\max\{1/\sqrt{ε_{f}}, 1/ε_g\})$ iterations to find a solution that is $ε_f$-suboptimal and $ε_g$-infeasible. Moreover, under the additional assumption that the lower-level objective satisfies the $r$-th Hölderian error bound, we show that our method achieves an iteration complexity of $\mathcal{O}(\max\{ε_{f}^{-\frac{2r-1}{2r}},ε_{g}^{-\frac{2r-1}{2r}}\})$, which matches the optimal complexity of single-level convex constrained optimization when $r=1$.

Submitted 31 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.
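
As a quick check of the last claim in the abstract above, substituting $r=1$ into the stated bound recovers a $1/\sqrt{ε}$ rate. This is a worked substitution only, not a derivation from the paper.

```latex
\[
\left.\frac{2r-1}{2r}\right|_{r=1} = \frac{1}{2}
\quad\Longrightarrow\quad
\mathcal{O}\!\left(\max\{ε_f^{-\frac{1}{2}},\, ε_g^{-\frac{1}{2}}\}\right)
= \mathcal{O}\!\left(\max\{1/\sqrt{ε_f},\, 1/\sqrt{ε_g}\}\right).
\]
```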

arXiv:2402.06673 [pdf, other]

Advancing Explainable AI Toward Human-Like Intelligence: Forging the Path to Artificial Brain

Authors: Yongchen Zhou, Richard Jiang

Abstract: The intersection of Artificial Intelligence (AI) and neuroscience in Explainable AI (XAI) is pivotal for enhancing transparency and interpretability in complex decision-making processes. This paper explores the evolution of XAI methodologies, ranging from feature-based to human-centric approaches, and delves into their applications in diverse domains, including healthcare and finance. The challenges in achieving explainability in generative models, ensuring responsible AI practices, and addressing ethical implications are discussed. The paper further investigates the potential convergence of XAI with cognitive sciences, the development of emotionally intelligent AI, and the quest for Human-Like Intelligence (HLI) in AI systems. As AI progresses towards Artificial General Intelligence (AGI), considerations of consciousness, ethics, and societal impact become paramount. The ongoing pursuit of deciphering the mysteries of the brain with AI and the quest for HLI represent transformative endeavors, bridging technical advancements with multidisciplinary explorations of human cognition.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.05415 [pdf, ps, other]

Near-Optimal Convex Simple Bilevel Optimization with a Bisection Method

Authors: Jiulin Wang, Xu Shi, Rujun Jiang

Abstract: This paper studies a class of simple bilevel optimization problems where we minimize a composite convex function at the upper-level subject to a composite convex lower-level problem. Existing methods either provide asymptotic guarantees for the upper-level objective or attain slow sublinear convergence rates. We propose a bisection algorithm to find a solution that is $ε_f$-optimal for the upper-level objective and $ε_g$-optimal for the lower-level objective. In each iteration, the binary search narrows the interval by assessing inequality system feasibility. Under mild conditions, the total operation complexity of our method is ${\tilde {\mathcal{O}}}\left(\max\{\sqrt{L_{f_1}/ε_f},\sqrt{L_{g_1}/ε_g} \} \right)$. Here, a unit operation can be a function evaluation, gradient evaluation, or the invocation of the proximal mapping, $L_{f_1}$ and $L_{g_1}$ are the Lipschitz constants of the gradients of the upper- and lower-level objectives' smooth components, and ${\tilde {\mathcal{O}}}$ hides logarithmic terms. Our approach achieves a near-optimal rate, matching the optimal rate in unconstrained smooth or composite convex optimization when disregarding logarithmic terms. Numerical experiments demonstrate the effectiveness of our method.

Submitted 4 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted to AISTATS2024

arXiv:2402.02155 [pdf, ps, other]

Penalty-based Methods for Simple Bilevel Optimization under Hölderian Error Bounds

Authors: Pengyu Chen, Xu Shi, Rujun Jiang, Jiulin Wang

Abstract: This paper investigates simple bilevel optimization problems where the upper-level objective minimizes a composite convex function over the optimal solutions of a composite convex lower-level problem. Existing methods for such problems either only guarantee asymptotic convergence, have slow sublinear rates, or require strong assumptions. To address these challenges, we develop a novel penalty-based approach that employs the accelerated proximal gradient (APG) method. Under an $α$-Hölderian error bound condition on the lower-level objective, our algorithm attains an $(ε,l_F^{-β}ε^β)$-optimal solution for any $β>0$ within $\mathcal{O}\left(\sqrt{\frac{L_{f_1}}{ε}}\right)+\mathcal{O}\left(\sqrt{\frac{l_F^{\max\{α,β\}}L_{g_1}}{ε^{\max\{α,β\}}}}\right)$ iterations, where $l_F$, $L_{f_1}$ and $L_{g_1}$ denote the Lipschitz constants of the upper-level objective, the gradients of the smooth parts of the upper- and lower-level objectives, respectively. If the smooth part of the upper-level objective is strongly convex, the result improves further. We also establish the complexity results when both upper- and lower-level objectives are general convex nonsmooth functions. Numerical experiments demonstrate the effectiveness of our algorithms.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.10402 [pdf, other]

Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder

Authors: Yongchen Zhou, Richard Jiang

Abstract: In the domain of computer vision, the restoration of missing information in video frames is a critical challenge, particularly in applications such as autonomous driving and surveillance systems. This paper introduces the Siamese Masked Conditional Variational Autoencoder (SiamMCVAE), leveraging a siamese architecture with twin encoders based on vision transformers. This innovative design enhances the model's ability to comprehend lost content by capturing intrinsic similarities between paired frames. SiamMCVAE proficiently reconstructs missing elements in masked frames, effectively addressing issues arising from camera malfunctions through variational inferences. Experimental results robustly demonstrate the model's effectiveness in restoring missing information, thus enhancing the resilience of computer vision systems. The incorporation of Siamese Vision Transformer (SiamViT) encoders in SiamMCVAE exemplifies promising potential for addressing real-world challenges in computer vision, reinforcing the adaptability of autonomous systems in dynamic environments.

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.09475 [pdf, other]

Triamese-ViT: A 3D-Aware Method for Robust Brain Age Estimation from MRIs

Authors: Zhaonian Zhang, Richard Jiang

Abstract: The integration of machine learning in medicine has significantly improved diagnostic precision, particularly in the interpretation of complex structures like the human brain. Diagnosing challenging conditions such as Alzheimer's disease has prompted the development of brain age estimation techniques. These methods often leverage three-dimensional Magnetic Resonance Imaging (MRI) scans, with recent studies emphasizing the efficacy of 3D convolutional neural networks (CNNs) like 3D ResNet. However, the untapped potential of Vision Transformers (ViTs), known for their accuracy and interpretability, persists in this domain due to limitations in their 3D versions. This paper introduces Triamese-ViT, an innovative adaptation of the ViT model for brain age estimation. Our model uniquely combines ViTs from three different orientations to capture 3D information, significantly enhancing accuracy and interpretability. Tested on a dataset of 1351 MRI scans, Triamese-ViT achieves a Mean Absolute Error (MAE) of 3.84, a 0.9 Spearman correlation coefficient with chronological age, and a -0.29 Spearman correlation coefficient between the brain age gap (BAG) and chronological age, significantly better than previous methods for brain age estimation. A key innovation of Triamese-ViT is its capacity to generate a comprehensive 3D-like attention map, synthesized from 2D attention maps of each orientation-specific ViT. This feature is particularly beneficial for in-depth brain age analysis and disease diagnosis, offering deeper insights into brain health and the mechanisms of age-related neural changes.

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.04570 [pdf, other]

An Automatic Cascaded Model for Hemorrhagic Stroke Segmentation and Hemorrhagic Volume Estimation

Authors: Wei** Xu, Zhuang Sha, Huihua Yang, Rongcai Jiang, Zhanying Li, Wentao Liu, Ruisheng Su

Abstract: Hemorrhagic Stroke (HS) has a rapid onset and is a serious condition that poses a great health threat. Promptly and accurately delineating the bleeding region and estimating the volume of bleeding in Computer Tomography (CT) images can assist clinicians in treatment planning, leading to improved treatment outcomes for patients. In this paper, a cascaded 3D model is constructed based on UNet to perform a two-stage segmentation of the hemorrhage area in CT images from rough to fine, and the hemorrhage volume is automatically calculated from the segmented area. On a dataset with 341 cases of hemorrhagic stroke CT scans, the proposed model provides high-quality segmentation outcome with higher accuracy (DSC 85.66%) and better computation efficiency (6.2 seconds per sample) when compared to the traditional Tada formula with respect to hemorrhage volume estimation.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by SWITCH2023: Stroke Workshop on Imaging and Treatment CHallenges, a workshop at MICCAI 2023

Showing 1–50 of 419 results for author: Jiang, R