-
Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization
Authors:
Qi Zhang,
Yi Zhou,
Ashley Prater-Bennette,
Lixin Shen,
Shaofeng Zou
Abstract:
Distributionally robust optimization (DRO) is a powerful framework for training robust models against data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions, and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus is suitable for large-scale applications. We focus on the uncertainty set defined by the general Cressie-Read family of divergences, which includes $χ^2$-divergences as a special case. We prove that our algorithm finds an $ε$-stationary point with a computational complexity of $\mathcal O(ε^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence. The numerical results indicate that our method outperforms existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
Submitted 1 April, 2024;
originally announced April 2024.
-
Factorized Tensor Networks for Multi-Task and Multi-Domain Learning
Authors:
Yash Garg,
Nebiyou Yismaw,
Rakib Hyder,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The key challenge and opportunity is to exploit shared information across tasks and domains to improve the efficiency of the unified network. The efficiency can be in terms of accuracy, storage cost, computation, or sample complexity. In this paper, we propose a factorized tensor network (FTN) that can achieve accuracy comparable to independent single-task/domain networks with a small number of additional parameters. FTN uses a frozen backbone network from a source model and incrementally adds task/domain-specific low-rank tensor factors to the shared frozen network. This approach can adapt to a large number of target domains and tasks without catastrophic forgetting. Furthermore, FTN requires a significantly smaller number of task-specific parameters compared to existing methods. We performed experiments on widely used multi-domain and multi-task datasets. We report experiments on convolution-based architectures with different backbones and on transformer-based architectures. We observed that FTN achieves accuracy similar to single-task/domain methods while using only a fraction of additional parameters per task.
Submitted 9 October, 2023;
originally announced October 2023.
-
Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation
Authors:
Md Kaykobad Reza,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Multimodal learning seeks to utilize data from multiple sources to improve the overall performance of downstream tasks. It is desirable for redundancies in the data to make multimodal systems robust to missing or corrupted observations in some correlated modalities. However, we observe that the performance of several existing multimodal networks significantly deteriorates if one or multiple modalities are absent at test time. To enable robustness to missing modalities, we propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks. In particular, we exploit modulation of intermediate features to compensate for the missing modalities. We demonstrate that such adaptation can partially bridge the performance drop due to missing modalities and outperform independent, dedicated networks trained for the available modality combinations in some cases. The proposed adaptation requires an extremely small number of parameters (e.g., fewer than 0.7% of the total parameters) and is applicable to a wide range of modality combinations and tasks. We conduct a series of experiments to highlight the missing modality robustness of our proposed method on 5 different datasets for multimodal semantic segmentation, multimodal material segmentation, and multimodal sentiment analysis tasks. Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
Submitted 26 February, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
MMSFormer: Multimodal Transformer for Material and Semantic Segmentation
Authors:
Md Kaykobad Reza,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Leveraging information across diverse modalities is known to enhance performance on multimodal segmentation tasks. However, effectively fusing information from different modalities remains challenging due to the unique characteristics of each modality. In this paper, we propose a novel fusion strategy that can effectively fuse information from different modality combinations. We also propose a new model named Multi-Modal Segmentation TransFormer (MMSFormer) that incorporates the proposed fusion strategy to perform multimodal material and semantic segmentation tasks. MMSFormer outperforms current state-of-the-art models on three different datasets. Starting from a single input modality, performance improves progressively as additional modalities are incorporated, showcasing the effectiveness of the fusion block in combining useful information from diverse input modalities. Ablation studies show that different modules in the fusion block are crucial for overall model performance. Furthermore, our ablation studies also highlight the capacity of different input modalities to improve performance in the identification of different types of materials. The code and pretrained models will be made available at https://github.com/csiplab/MMSFormer.
Submitted 7 April, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Model-Free Robust Average-Reward Reinforcement Learning
Authors:
Yue Wang,
Alvaro Velasquez,
George Atia,
Ashley Prater-Bennette,
Shaofeng Zou
Abstract:
Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on the robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence and Wasserstein distance.
Submitted 17 May, 2023;
originally announced May 2023.
-
Robust Average-Reward Markov Decision Processes
Authors:
Yue Wang,
Alvaro Velasquez,
George Atia,
Ashley Prater-Bennette,
Shaofeng Zou
Abstract:
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $γ$ goes to $1$, and moreover, when $γ$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
Submitted 1 March, 2023; v1 submitted 2 January, 2023;
originally announced January 2023.
-
Incremental Task Learning with Incremental Rank Updates
Authors:
Rakib Hyder,
Ken Shao,
Boyu Hou,
Panos Markopoulos,
Ashley Prater-Bennette,
M. Salman Asif
Abstract:
Incremental task learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where training data for each task is only available during the training of that task. Neural networks tend to forget older tasks when they are trained for the newer tasks; this property is often known as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights for each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add that to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for the previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting. Our method also offers better memory efficiency compared to episodic memory- and mask-based approaches. Our code will be available at https://github.com/CSIPlab/task-increment-rank-update.git
Submitted 19 July, 2022;
originally announced July 2022.
-
Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements
Authors:
Tian Tong,
Cong Ma,
Ashley Prater-Bennette,
Erin Tripp,
Yuejie Chi
Abstract:
Tensors, which provide a powerful and flexible model for representing multi-attribute data and multi-way interactions, play an indispensable role in modern data science across various fields in science and engineering. A fundamental task is to faithfully recover the tensor from highly incomplete measurements in a statistically and computationally efficient manner. Harnessing the low-rank structure of tensors in the Tucker decomposition, this paper develops a scaled gradient descent (ScaledGD) algorithm to directly recover the tensor factors with tailored spectral initializations, and shows that it provably converges at a linear rate independent of the condition number of the ground truth tensor for two canonical problems -- tensor completion and tensor regression -- as soon as the sample size is above the order of $n^{3/2}$ ignoring other parameter dependencies, where $n$ is the dimension of the tensor. This leads to an extremely scalable approach to low-rank tensor estimation compared with prior art, which suffers from at least one of the following drawbacks: extreme sensitivity to ill-conditioning, high per-iteration costs in terms of memory and computation, or poor sample complexity guarantees. To the best of our knowledge, ScaledGD is the first algorithm that achieves near-optimal statistical and computational complexities simultaneously for low-rank tensor completion with the Tucker decomposition. Our algorithm highlights the power of appropriate preconditioning in accelerating nonconvex statistical estimation, where the iteration-varying preconditioners promote desirable invariance properties of the trajectory with respect to the underlying symmetry in low-rank tensor factorization.
Submitted 21 June, 2022; v1 submitted 29 April, 2021;
originally announced April 2021.
-
L1-norm Tucker Tensor Decomposition
Authors:
Dimitris G. Chachlakis,
Ashley Prater-Bennette,
Panos P. Markopoulos
Abstract:
Tucker decomposition is a common method for the analysis of multi-way/tensor data. Standard Tucker has been shown to be sensitive to heavy corruptions, due to its L2-norm-based formulation, which places squared emphasis on peripheral entries. In this work, we explore L1-Tucker, an L1-norm-based reformulation of standard Tucker decomposition. After formulating the problem, we present two algorithms for its solution, namely L1-norm Higher-Order Singular Value Decomposition (L1-HOSVD) and L1-norm Higher-Order Orthogonal Iterations (L1-HOOI). The presented algorithms are accompanied by complexity and convergence analysis. Our numerical studies on tensor reconstruction and classification corroborate that L1-Tucker, implemented by means of the proposed methods, attains similar performance to standard Tucker when the processed data are corruption-free, while it exhibits sturdy resistance against heavily corrupted entries.
Submitted 12 April, 2019;
originally announced April 2019.
-
Sparse tensor recovery via N-mode FISTA with support augmentation
Authors:
Ashley Prater-Bennette,
Lixin Shen
Abstract:
A common approach for performing sparse tensor recovery is to use an N-mode FISTA method. However, this approach may fail in some cases by missing some values in the true support of the tensor and compensating by erroneously assigning nearby values to the support. This work proposes a four-stage method for performing sparse tensor reconstruction that addresses a case where N-mode FISTA may fail by augmenting the support set. Moreover, the proposed method preserves a Tucker-like structure throughout computations for computational efficiency. Numerical results on synthetic data demonstrate that the proposed method produces results with similar or higher accuracy than N-mode FISTA, and is often faster.
Submitted 9 July, 2018;
originally announced July 2018.
-
Randomness and isometries in echo state networks and compressed sensing
Authors:
Ashley Prater-Bennette
Abstract:
Although largely different concepts, echo state networks and compressed sensing models both rely on collections of random weights: as the reservoir dynamics for echo state networks, and as the sensing coefficients in compressed sensing. Several methods for generating the random matrices and metrics to indicate desirable performance are well-studied in compressed sensing, but less so for echo state networks. This work explores the overlap in these compressed sensing methods and metrics for application to echo state networks. Several methods for generating the random reservoir weights are considered, and a new metric, inspired by the restricted isometry property for compressed sensing, is proposed for echo state networks. The methods and metrics are investigated theoretically and experimentally, with results suggesting that the same types of random matrices work well for both echo state network and compressed sensing scenarios, and that echo state network classification accuracy is improved when the proposed restricted isometry-like constants are close to 1.
Submitted 5 February, 2018;
originally announced February 2018.