-
Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective
Authors:
Haoran Sun,
Wenqiang Pu,
Xiao Fu,
Tsung-Hui Chang,
Mingyi Hong
Abstract:
There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based, methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational effort, fewer resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment.
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an "episodically dynamic" setting where the environment statistics change in "episodes", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes without forgetting knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but, importantly, it also maintains high performance over the previously encountered scenarios.
Submitted 3 May, 2021;
originally announced May 2021.
-
Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models
Authors:
Sagar Shrestha,
Xiao Fu,
Mingyi Hong
Abstract:
The spectrum cartography (SC) technique constructs multi-domain (e.g., frequency, space, and time) radio frequency (RF) maps from limited measurements, which can be viewed as an ill-posed tensor completion problem. Model-based cartography techniques often rely on handcrafted priors (e.g., sparsity, smoothness and low-rank structures) for the completion task. Such priors may be inadequate to capture the essence of complex wireless environments -- especially when severe shadowing happens. To circumvent such challenges, offline-trained deep neural models of radio maps were considered for SC, as deep neural networks (DNNs) are able to "learn" intricate underlying structures from data. However, such deep learning (DL)-based SC approaches encounter serious challenges in both offline model learning (training) and completion (generalization), possibly because the latent state space for generating the radio maps is prohibitively large. In this work, an emitter radio map disaggregation-based approach is proposed, under which only individual emitters' radio maps are modeled by DNNs. This way, the learning and generalization challenges can both be substantially alleviated. Using the learned DNNs, a fast nonnegative matrix factorization-based two-stage SC method and a performance-enhanced iterative optimization algorithm are proposed. Theoretical aspects -- such as recoverability of the radio tensor, sample complexity, and noise robustness -- under the proposed framework are characterized; such theoretical properties have been elusive in the context of DL-based radio tensor completion. Experiments using synthetic and real data from indoor and heavily shadowed environments are employed to showcase the effectiveness of the proposed methods.
Submitted 21 January, 2022; v1 submitted 1 May, 2021;
originally announced May 2021.
-
Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses
Authors:
Wenqiang Pu,
Shahana Ibrahim,
Xiao Fu,
Mingyi Hong
Abstract:
This work considers low-rank canonical polyadic decomposition (CPD) under a class of non-Euclidean loss functions that frequently arise in statistical machine learning and signal processing. These loss functions are often used for certain types of tensor data, e.g., count and binary tensors, where the least squares loss is considered unnatural. Compared to the least squares loss, the non-Euclidean losses are generally more challenging to handle. Non-Euclidean CPD has attracted considerable interest and a number of prior works exist. However, pressing computational and theoretical challenges, such as scalability and convergence issues, still remain. This work offers a unified stochastic algorithmic framework for large-scale CPD under a variety of non-Euclidean loss functions. Our key contribution lies in a flexible stochastic mirror descent framework based on a tensor fiber sampling strategy. Leveraging the sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm ensures global convergence to a stationary point under reasonable conditions. Numerical results show that our framework attains promising non-Euclidean CPD performance. The proposed framework also exhibits substantial computational savings compared to state-of-the-art methods.
Submitted 29 April, 2021;
originally announced April 2021.
-
Network Space Search for Pareto-Efficient Spaces
Authors:
Min-Fong Hong,
Hao-Yun Chen,
Min-Hung Chen,
Yu-Syuan Xu,
Hsien-Kai Kuo,
Yi-Min Tsai,
Hung-Jen Chen,
Kevin Jou
Abstract:
Network spaces are known to be a critical factor both in handcrafted network designs and in defining search spaces for Neural Architecture Search (NAS). However, an effective space involves tremendous prior knowledge and/or manual effort, and additional constraints are required to discover efficiency-aware architectures. In this paper, we define a new problem, Network Space Search (NSS), as searching for favorable network spaces instead of a single architecture. We propose an NSS method to directly search for efficiency-aware network spaces automatically, reducing the manual effort and immense cost in discovering satisfactory ones. The resultant network spaces, named Elite Spaces, are discovered from the Expanded Search Space with minimal human expertise imposed. The Pareto-efficient Elite Spaces are aligned with the Pareto front under various complexity constraints and can further serve as NAS search spaces, benefiting differentiable NAS approaches (e.g., on CIFAR-100, an error rate that is on average 2.3% lower and 3.7% closer to the target constraint than the baseline, with around 90% fewer samples required to find satisfactory networks). Moreover, our NSS approach is capable of searching for superior spaces in future unexplored spaces, revealing great potential in searching for network spaces automatically. Website: https://minhungchen.netlify.app/publication/nss.
Submitted 19 June, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics
Authors:
Tae Min Hong,
Benjamin Carlson,
Brandon Eubanks,
Stephen Racz,
Stephen Roche,
Joerg Stelzer,
Daniel Stumpp
Abstract:
We present a novel implementation of classification using the machine learning / artificial intelligence method called boosted decision trees (BDT) on field programmable gate arrays (FPGA). The firmware implementation of binary classification requiring 100 training trees with a maximum depth of 4 using four input variables gives a latency value of about 10 ns, independent of the clock speed from 100 to 320 MHz in our setup. The low timing values are achieved by restructuring the BDT layout and reconfiguring its parameters. The FPGA resource utilization is also kept low at a range from 0.01% to 0.2% in our setup. A software package called fwXmachina achieves this implementation. Our intended user is an expert of custom electronics-based trigger systems in high energy physics experiments or anyone that needs decisions at the lowest latency values for real-time event classification. Two problems from high energy physics are considered, in the separation of electrons vs. photons and in the selection of vector boson fusion-produced Higgs bosons vs. the rejection of the multijet processes.
Submitted 17 August, 2021; v1 submitted 7 April, 2021;
originally announced April 2021.
-
Observation of the Modification of Quantum Statistics of Plasmonic Systems
Authors:
Chenglong You,
Mingyuan Hong,
Narayan Bhusal,
**nan Chen,
Mario A. Quiroz-Juárez,
Fatemeh Mostafavi,
Junpeng Guo,
Israel De Leon,
Roberto de J. León-Montiel,
Omar S. Magaña-Loaiza
Abstract:
For almost two decades, it has been believed that the quantum statistical properties of bosons are preserved in plasmonic systems. This idea has been stimulated by experimental work reporting the possibility of preserving nonclassical correlations in light-matter interactions mediated by scattering among photons and plasmons. Furthermore, it has been assumed that similar dynamics underlies the conservation of the quantum fluctuations that define the nature of light sources. Here, we demonstrate that quantum statistics are not always preserved in plasmonic systems and report the first observation of their modification. Moreover, we show that multiparticle scattering effects induced by confined optical near fields can lead to the modification of the excitation mode of plasmonic systems. These observations are validated through the quantum theory of optical coherence for single- and multi-mode plasmonic systems. Our findings constitute a new paradigm in the understanding of the quantum properties of plasmonic systems and unveil new paths to perform exquisite control of quantum multiparticle systems.
Submitted 5 April, 2021;
originally announced April 2021.
-
Enormous Berry-Curvature-Driven Anomalous Hall Effect in Topological Insulator (Bi,Sb)2Te3 on Ferrimagnetic Europium Iron Garnet beyond 400 K
Authors:
Wei-Jhih Zou,
Meng-Xin Guo,
Jyun-Fong Wong,
Zih-** Huang,
Jui-Min Chia,
Wei-Nien Chen,
Sheng-Xin Wang,
Keng-Yung Lin,
Lawrence Boyu Young,
Yen-Hsun Glen Lin,
Mohammad Yahyavi,
Chien-Ting Wu,
Horng-Tay Jeng,
Shang-Fan Lee,
Tay-Rong Chang,
Minghwei Hong,
Jueinai Kwo
Abstract:
To realize the quantum anomalous Hall effect (QAHE) at elevated temperatures, the approach of magnetic proximity effect (MPE) was adopted to break the time-reversal symmetry in the topological insulator (Bi0.3Sb0.7)2Te3 (BST) based heterostructures with a ferrimagnetic insulator europium iron garnet (EuIG) of perpendicular magnetic anisotropy. Here we demonstrate phenomenally large anomalous Hall resistance (RAHE) exceeding 8 Ω (ρAHE of 3.2 μΩ·cm) at 300 K and sustaining to 400 K in 35 BST/EuIG samples, surpassing the past record of 0.28 Ω (ρAHE of 0.14 μΩ·cm) at 300 K. The remarkably large RAHE is attributed to an atomically abrupt, Fe-rich interface between BST and EuIG. Importantly, the gate dependence of the AHE loops shows no sign change with varying chemical potential. This observation is supported by our first-principles calculations via applying a gradient Zeeman field plus a contact potential on BST. Our calculations further demonstrate that the AHE in this heterostructure is attributed to the intrinsic Berry curvature. Furthermore, for gate-biased 4 nm BST on EuIG, a pronounced topological Hall effect (THE) coexisting with AHE is observed at the negative top-gate voltage up to 15 K. Interface tuning with theoretical calculations has opened up new opportunities to realize topologically distinct phenomena in tailored magnetic TI-based heterostructures.
Submitted 30 September, 2021; v1 submitted 30 March, 2021;
originally announced March 2021.
-
On Instabilities of Conventional Multi-Coil MRI Reconstruction to Small Adverserial Perturbations
Authors:
Chi Zhang,
**ghan Jia,
Burhaneddin Yaman,
Steen Moeller,
Sijia Liu,
Mingyi Hong,
Mehmet Akçakaya
Abstract:
Although deep learning (DL) has received much attention in accelerated MRI, recent studies suggest small perturbations may lead to instabilities in DL-based reconstructions, leading to concern for their clinical application. However, these works focus on single-coil acquisitions, which is not practical. We investigate instabilities caused by small adversarial attacks for multi-coil acquisitions. Our results suggest that parallel imaging and multi-coil CS exhibit considerable instabilities against small adversarial perturbations.
Submitted 25 February, 2021;
originally announced February 2021.
-
A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum
Authors:
Prashant Khanduri,
Siliang Zeng,
Mingyi Hong,
Hoi-To Wai,
Zhaoran Wang,
Zhuoran Yang
Abstract:
This paper proposes a new algorithm, the Single-timescale Double-momentum Stochastic Approximation (SUSTAIN), for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on two-timescale or double loop techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that SUSTAIN requires $\mathcal{O}(ε^{-3/2})$ iterations (each using $\mathcal{O}(1)$ samples) to find an $ε$-stationary solution. The $ε$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $ε$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.
Submitted 15 June, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
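To make the setting in the SUSTAIN abstract above concrete, the block below writes out a generic stochastic bilevel problem and the $ε$-stationarity criterion the abstract describes. The symbols $f$, $g$, $\ell$, $x$, $y$, $\xi$, $\zeta$ and the dimensions $d_1$, $d_2$ are notation assumed here for illustration and may differ from the paper's; in a stochastic analysis the criterion is usually required in expectation, which is also an assumption here rather than something the abstract states.

```latex
% Assumed notation: f is the upper-level (outer) objective, g the lower-level objective.
\begin{align*}
  \min_{x \in \mathbb{R}^{d_1}} \quad
    & \ell(x) := f\bigl(x, y^*(x)\bigr),
    && f(x, y) := \mathbb{E}_{\xi}\bigl[ f(x, y; \xi) \bigr], \\
  \text{s.t.} \quad
    & y^*(x) = \arg\min_{y \in \mathbb{R}^{d_2}} g(x, y),
    && g(x, y) := \mathbb{E}_{\zeta}\bigl[ g(x, y; \zeta) \bigr],
\end{align*}
% g(x, .) is strongly convex for every x, so y^*(x) is unique; f is smooth.
% A point \bar{x} is \epsilon-stationary when
\[
  \bigl\| \nabla \ell(\bar{x}) \bigr\|^2 \le \epsilon .
\]
```

Strong convexity of the lower level is what makes $y^*(x)$ single-valued, so that $\ell$ and its gradient are well defined.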
-
Decentralized Riemannian Gradient Descent on the Stiefel Manifold
Authors:
Shixiang Chen,
Alfredo Garcia,
Mingyi Hong,
Shahin Shahrampour
Abstract:
We consider a distributed non-convex optimization problem where a network of agents aims at minimizing a global function over the Stiefel manifold. The global function is represented as a finite sum of smooth local functions, where each local function is associated with one agent and agents communicate with each other over an undirected connected graph. The problem is non-convex as local functions are possibly non-convex (but smooth) and the Stiefel manifold is a non-convex set. We present a decentralized Riemannian stochastic gradient method (DRSGD) with the convergence rate of $\mathcal{O}(1/\sqrt{K})$ to a stationary point. To have exact convergence with constant stepsize, we also propose a decentralized Riemannian gradient tracking algorithm (DRGTA) with the convergence rate of $\mathcal{O}(1/K)$ to a stationary point. We use multi-step consensus to preserve the iteration in the local (consensus) region. DRGTA is the first decentralized algorithm with exact convergence for distributed optimization on the Stiefel manifold.
Submitted 14 February, 2021;
originally announced February 2021.
-
On the Local Linear Rate of Consensus on the Stiefel Manifold
Authors:
Shixiang Chen,
Alfredo Garcia,
Mingyi Hong,
Shahin Shahrampour
Abstract:
We study the convergence properties of the Riemannian gradient method for solving the consensus problem (for an undirected connected graph) over the Stiefel manifold. The Stiefel manifold is a non-convex set and the standard notion of averaging in the Euclidean space does not work for this problem. We propose Distributed Riemannian Consensus on Stiefel Manifold (DRCS) and prove that it enjoys a local linear convergence rate to global consensus. More importantly, this local rate asymptotically scales with the second largest singular value of the communication matrix, which is on par with the well-known rate in the Euclidean space. To the best of our knowledge, this is the first work showing the equality of the two rates. The main technical challenges include (i) developing a Riemannian restricted secant inequality for convergence analysis, and (ii) identifying the conditions (e.g., suitable step-size and initialization) under which the algorithm always stays in the local region.
Submitted 22 January, 2021;
originally announced January 2021.
-
Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup
Authors:
Han Shen,
Kaiqing Zhang,
Mingyi Hong,
Tianyi Chen
Abstract:
Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL. Among many asynchronous RL algorithms, arguably the most popular and effective one is the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is becoming the workhorse of RL, its theoretical properties are still not well-understood, including its non-asymptotic analysis and the performance gain of parallelism (a.k.a. linear speedup). This paper revisits the A3C algorithm and establishes its non-asymptotic convergence guarantees. Under both i.i.d. and Markovian sampling, we establish the local convergence guarantee for A3C in the general policy approximation case and the global convergence guarantee in softmax policy parameterization. Under i.i.d. sampling, A3C obtains sample complexity of $\mathcal{O}(ε^{-2.5}/N)$ per worker to achieve $ε$ accuracy, where $N$ is the number of workers. Compared to the best-known sample complexity of $\mathcal{O}(ε^{-2.5})$ for two-timescale AC, A3C achieves linear speedup, which justifies the advantage of parallelism and asynchrony in AC algorithms theoretically for the first time. Numerical tests on a synthetic environment, OpenAI Gym environments and Atari games have been provided to verify our theoretical analysis.
Submitted 16 March, 2022; v1 submitted 31 December, 2020;
originally announced December 2020.
-
Hybrid Federated Learning: Algorithms and Implementation
Authors:
Xinwei Zhang,
Wotao Yin,
Mingyi Hong,
Tianyi Chen
Abstract:
Federated learning (FL) is a recently proposed distributed machine learning paradigm dealing with distributed and private data sets. Based on the data partition pattern, FL is often categorized into horizontal, vertical, and hybrid settings. Although many works have been developed for the first two settings, the hybrid FL setting (which deals with partially overlapped feature space and sample space) remains less explored, though this setting is extremely important in practice. In this paper, we first set up a new model-matching-based problem formulation for hybrid FL, then propose an efficient algorithm that can collaboratively train the global and local models to deal with full and partial featured data. We conduct numerical experiments on the multi-view ModelNet40 data set to validate the performance of the proposed algorithm. To the best of our knowledge, this is the first formulation and algorithm developed for hybrid FL.
Submitted 17 February, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment
Authors:
Haoran Sun,
Wenqiang Pu,
Minghe Zhu,
Xiao Fu,
Tsung-Hui Chang,
Mingyi Hong
Abstract:
There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based, methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational effort, less channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment where parameters such as CSIs keep changing.
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment. Specifically, we consider an "episodically dynamic" setting where the environment changes in "episodes", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into the modeling process of learning wireless systems, so that the learning model can incrementally adapt to the new episodes without forgetting knowledge learned from the previous episodes. Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples. We demonstrate the effectiveness of the CL approach by customizing it to two popular DNN based models (one for power control and one for beamforming), and testing using both synthetic and real data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but, importantly, it also maintains high performance over the previously encountered scenarios.
Submitted 16 November, 2020;
originally announced November 2020.
-
Learning to Beamform in Heterogeneous Massive MIMO Networks
Authors:
Minghe Zhu,
Tsung-Hui Chang,
Mingyi Hong
Abstract:
It is well-known that the problem of finding the optimal beamformers in massive multiple-input multiple-output (MIMO) networks is challenging because of its non-convexity, and conventional optimization based algorithms suffer from high computational costs. While computationally efficient deep learning based methods have been proposed, their complexity heavily relies upon system parameters such as the number of transmit antennas, and therefore these methods typically do not generalize well when deployed in heterogeneous scenarios where the base stations (BSs) are equipped with different numbers of transmit antennas and have different inter-BS distances. This paper proposes a novel deep learning based beamforming algorithm to address the above challenges. Specifically, we consider the weighted sum rate (WSR) maximization problem in multi-input and single-output (MISO) interference channels, and propose a deep neural network architecture by unfolding a parallel gradient projection algorithm. Somewhat surprisingly, by leveraging the low-dimensional structures of the optimal beamforming solution, our constructed neural network can be made independent of the numbers of transmit antennas and BSs. Moreover, such a design can be further extended to a cooperative multicell network. Numerical results based on both synthetic and ray-tracing channel models show that the proposed neural network can achieve high WSRs with significantly reduced runtime, while exhibiting favorable generalization capability with respect to the antenna number, BS number and the inter-BS distance.
Submitted 8 November, 2020;
originally announced November 2020.
-
Scalable multiphoton quantum metrology with neither pre- nor post-selected measurements
Authors:
Chenglong You,
Mingyuan Hong,
Peter Bierhorst,
Adriana E. Lita,
Scott Glancy,
Steve Kolthammer,
Emanuel Knill,
Sae Woo Nam,
Richard P. Mirin,
Omar S. Magana-Loaiza,
Thomas Gerrits
Abstract:
The quantum statistical fluctuations of the electromagnetic field establish a limit, known as the shot-noise limit, on the sensitivity of optical measurements performed with classical technologies. However, quantum technologies are not constrained by this shot-noise limit. In this regard, the possibility of using every photon produced by quantum sources of light to estimate small physical parameters, beyond the shot-noise limit, constitutes one of the main goals of quantum optics. Here we experimentally demonstrate a scalable protocol for quantum-enhanced optical phase estimation across a broad range of phases, with neither pre- nor post-selected measurements. This is achieved through the efficient design of a source of spontaneous parametric down-conversion in combination with photon-number-resolving detection. The robustness of two-mode squeezed vacuum states against loss allows us to outperform schemes based on N00N states, in which the loss of a single photon is enough to remove all phase information from a quantum state. In contrast to other schemes that rely on N00N states or conditional measurements, the sensitivity of our technique could be improved through the generation and detection of high-order photon pairs. This unique feature of our protocol makes it scalable. Our work is important for quantum technologies that rely on multiphoton interference such as quantum imaging, boson sampling and quantum networks.
Submitted 24 June, 2021; v1 submitted 4 November, 2020;
originally announced November 2020.
-
Towards a Universal Gating Network for Mixtures of Experts
Authors:
Chen Wen Kang,
Chua Meng Hong,
Tomas Maul
Abstract:
The combination and aggregation of knowledge from multiple neural networks can be commonly seen in the form of mixtures of experts. However, such combinations are usually done using networks trained on the same tasks, with little mention of the combination of heterogeneous pre-trained networks, especially in the data-free regime. This paper proposes multiple data-free methods for the combination of heterogeneous neural networks, ranging from the utilization of simple output logit statistics, to training specialized gating networks. The gating networks decide whether specific inputs belong to specific networks based on the nature of the expert activations generated. The experiments revealed that the gating networks, including the universal gating approach, constituted the most accurate approach, and therefore represent a pragmatic step towards applications with heterogeneous mixtures of experts in a data-free regime. The code for this project is hosted on github at https://github.com/cwkang1998/network-merging.
Submitted 3 November, 2020;
originally announced November 2020.
-
Derivatives of local times for some Gaussian fields II
Authors:
Minhao Hong,
Fangjun Xu
Abstract:
Given a $(2,d)$-Gaussian field \[ Z=\big\{ Z(t,s)= X^{H_1}_t -\tilde{X}^{H_2}_s, s,t \ge 0\big\}, \] where $X^{H_1}$ and $\tilde{X}^{H_2}$ are independent $d$-dimensional centered Gaussian processes satisfying certain properties, we will give the necessary condition for existence of derivatives of the local time of $Z$.
Submitted 22 October, 2020;
originally announced October 2020.
-
ASMFS: Adaptive-Similarity-based Multi-modality Feature Selection for Classification of Alzheimer's Disease
Authors:
Yuang Shi,
Chen Zu,
Mei Hong,
Luping Zhou,
Lei Wang,
Xi Wu,
Jiliu Zhou,
Daoqiang Zhang,
Yan Wang
Abstract:
With the increasing amounts of high-dimensional heterogeneous data to be processed, multi-modality feature selection has become an important research direction in medical image analysis. Traditional methods usually depict the data structure using a fixed, predefined similarity matrix for each modality separately, without considering the potential relationship structure across different modalities. In this paper, we propose a novel multi-modality feature selection method, which performs feature selection and local similarity learning simultaneously. Specifically, a similarity matrix is learned by jointly considering different imaging modalities. At the same time, feature selection is conducted by imposing a sparse $\ell_{2,1}$-norm constraint. The effectiveness of the proposed joint learning method is demonstrated by experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, where it outperforms existing state-of-the-art multi-modality approaches.
Submitted 16 October, 2020;
originally announced October 2020.
-
First-Order Algorithms Without Lipschitz Gradient: A Sequential Local Optimization Approach
Authors:
Junyu Zhang,
Mingyi Hong
Abstract:
First-order algorithms have been popular for solving convex and non-convex optimization problems. A key assumption for the majority of these algorithms is that the gradient of the objective function is globally Lipschitz continuous, but many contemporary problems such as tensor decomposition fail to satisfy such an assumption. This paper develops a sequential local optimization (SLO) framework of first-order algorithms that can effectively optimize problems without Lipschitz gradient. Operating on the assumption that the gradients are {\it locally} Lipschitz continuous over any compact set, the proposed framework carefully restricts the distance between two successive iterates. We show that the proposed framework can easily adapt to existing first-order methods such as gradient descent (GD), normalized gradient descent (NGD), accelerated gradient descent (AGD), as well as GD with Armijo line search. Remarkably, the latter algorithm is totally parameter-free and does not even require knowledge of the local Lipschitz constants.
We show that for the proposed algorithms to achieve a gradient error bound of $\|\nabla f(x)\|^2\le ε$, they require at most $\mathcal{O}(\frac{1}{ε}\times \mathcal{L}(Y))$ total accesses to the gradient oracle, where $\mathcal{L}(Y)$ characterizes how the local Lipschitz constants grow with the size of a given set $Y$. Moreover, we show that the variant of AGD improves the dependency on both $ε$ and the growth function $\mathcal{L}(Y)$. The proposed algorithms complement the existing Bregman Proximal Gradient (BPG) algorithm, because they do not require global information about the problem structure to construct and solve Bregman proximal mappings.
Submitted 5 February, 2024; v1 submitted 7 October, 2020;
originally announced October 2020.
-
The 1st Tiny Object Detection Challenge: Methods and Results
Authors:
Xuehui Yu,
Zhenjun Han,
Yuqi Gong,
Nan Jiang,
Jian Zhao,
Qixiang Ye,
Jie Chen,
Yuan Feng,
Bin Zhang,
Xiaodi Wang,
Ying Xin,
Jingwei Liu,
Mingyuan Mao,
Sheng Xu,
Baochang Zhang,
Shumin Han,
Cheng Gao,
Wei Tang,
Lizuo Jin,
Mingbo Hong,
Yuchao Yang,
Shuiwang Li,
Huan Luo,
Qijun Zhao,
Humphrey Shi
Abstract:
The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection. The TinyPerson dataset was used for the TOD Challenge and is publicly released. It has 1610 images and 72651 box-level annotations. Around 36 participating teams from around the globe competed in the 1st TOD Challenge. In this paper, we provide a brief summary of the 1st TOD Challenge, including brief introductions to the top three methods. The submission leaderboard will be reopened for researchers who are interested in the TOD challenge. The benchmark dataset and other information can be found at: https://github.com/ucas-vg/TinyBenchmark.
Submitted 6 October, 2020; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Imitation Privacy
Authors:
Xun Xian,
Xinran Wang,
Mingyi Hong,
Jie Ding,
Reza Ghanadan
Abstract:
In recent years, there have been many cloud-based machine learning services, where well-trained models are provided to users on a pay-per-query scheme through a prediction API. The emergence of these services motivates this work, where we will develop a general notion of model privacy named imitation privacy. We show the broad applicability of imitation privacy in classical query-response MLaaS scenarios and new multi-organizational learning scenarios. We also exemplify the fundamental difference between imitation privacy and the usual data-level privacy.
Submitted 30 August, 2020;
originally announced September 2020.
-
Joint Channel Assignment and Power Allocation for Multi-UAV Communication
Authors:
Lingyun Zhou,
Xihan Chen,
Mingyi Hong,
Shi Jin,
Qingjiang Shi
Abstract:
The unmanned aerial vehicle (UAV) swarm has emerged as a promising novel paradigm to achieve better coverage and higher capacity for future wireless networks by exploiting the more favorable line-of-sight (LoS) propagation. To reap the potential gains of a UAV swarm, the remote control signal sent by the ground control unit (GCU) is essential, yet in practice the control signal quality is susceptible to adjacent channel interference (ACI) and to external interference (EI) from radiation sources distributed across the region. To tackle these challenges, this paper considers priority-aware resource coordination in a multi-UAV communication system, where multiple UAVs are controlled by a GCU to perform certain tasks with a pre-defined trajectory. Specifically, we maximize the minimum signal-to-interference-plus-noise ratio (SINR) among all the UAVs by jointly optimizing the channel assignment and power allocation strategy under stringent resource availability constraints. According to the intensity of ACI, we consider the corresponding problem in two scenarios, i.e., Null-ACI and ACI systems. By virtue of the particular problem structure in the Null-ACI case, we first recast the formulation into an equivalent yet more tractable form and obtain the globally optimal solution via the Hungarian algorithm. For general ACI systems, we develop an efficient iterative algorithm based on smooth approximation and alternating optimization methods. Extensive simulation results demonstrate that the proposed algorithms can significantly enhance the minimum SINR among all the UAVs and adapt the allocation of communication resources to diverse mission priorities.
Submitted 18 August, 2020;
originally announced August 2020.
-
A new representation for the Landau-de Gennes energy of nematic liquid crystals
Authors:
Zhewen Feng,
Min-Chun Hong
Abstract:
In the Landau-de Gennes theory on nematic liquid crystals, the well-known Landau-de Gennes energy depends on four elastic constants: $L_1$, $L_2$, $L_3$, $L_4$. For the general case of $L_4\neq 0$, Ball-Majumdar \cite{BM} found an example showing that the Landau-de Gennes energy functional from the physics literature \cite{MN} does not satisfy a coercivity condition, which makes it mathematically problematic to establish the existence of energy minimizers. In order to solve this problem, we observe that the original third order term on $L_4$, proposed by Schiele and Trimper \cite{ST} in physics, is a linear combination of a fourth order term and a second order term. Therefore, we can propose a new Landau-de Gennes energy, which is equal to the original for uniaxial nematic $Q$-tensors. The new Landau-de Gennes energy with general elastic constants satisfies the coercivity condition for all $Q$-tensors, which establishes a new link between mathematical and physical theory. Similarly to the work of Majumdar-Zarnescu \cite{MZ}, we prove existence and convergence of minimizers of the new Landau-de Gennes energy. Moreover, we find a new way to study the limiting problem of the Landau-de Gennes system since the cross product method \cite{Chen} on the Ginzburg-Landau equation does not work for the Landau-de Gennes system.
Submitted 6 January, 2021; v1 submitted 21 July, 2020;
originally announced July 2020.
-
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
Authors:
Mingyi Hong,
Hoi-To Wai,
Zhaoran Wang,
Zhuoran Yang
Abstract:
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem is strongly convex (resp.~weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp.~$\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total iteration number. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a global optimal policy.
Submitted 8 June, 2022; v1 submitted 10 July, 2020;
originally announced July 2020.
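The two-timescale idea in the abstract above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the inner objective g(x, y) = 0.5 (y - x)^2, the outer objective f(x, y) = 0.5 (y - 0.5)^2 + 0.5 x^2, the constraint set [-1, 1], the Gaussian noise model, and the step-size schedules are all assumptions chosen for illustration.

```python
import random


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ttsa_toy(num_iters=5000, noise_std=0.1, seed=0):
    """Two-timescale stochastic updates on a hypothetical toy bilevel problem.

    Inner problem (unconstrained, strongly convex): min_y 0.5 * (y - x)^2,
    so y*(x) = x. Outer problem: min over x in [-1, 1] of
    f(x, y*(x)) with f(x, y) = 0.5 * (y - 0.5)^2 + 0.5 * x^2.
    """
    rng = random.Random(seed)
    x, y = 1.0, 0.0
    for k in range(num_iters):
        beta = 1.0 / (k + 1) ** (2.0 / 3.0)  # larger step size, inner variable
        alpha = 0.5 / (k + 1)                # smaller step size, outer variable
        # Inner update: noisy gradient of g with respect to y.
        y = y - beta * ((y - x) + rng.gauss(0.0, noise_std))
        # Outer update: noisy surrogate gradient, using y in place of y*(x).
        # Here dy*/dx = 1, so the surrogate is x + (y - 0.5).
        grad_x = x + (y - 0.5) + rng.gauss(0.0, noise_std)
        x = project(x - alpha * grad_x)
    return x, y


if __name__ == "__main__":
    print(ttsa_toy())
```

For this toy problem the outer objective reduces to 0.5 (x - 0.5)^2 + 0.5 x^2, which is minimized at x = 0.25, so the iterates should settle near that value with y tracking x.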
-
Understanding Gradient Clipping in Private SGD: A Geometric Perspective
Authors:
Xiangyi Chen,
Zhiwei Steven Wu,
Mingyi Hong
Abstract:
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information. To provide formal and rigorous privacy guarantees, many learning systems now incorporate differential privacy by training their models with (differentially) private SGD. A key step in each private SGD update is gradient clipping, which shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold. We first demonstrate how gradient clipping can prevent SGD from converging to a stationary point. We then provide a theoretical analysis that fully quantifies the clipping bias on convergence with a disparity measure between the gradient distribution and a geometrically symmetric distribution. Our empirical evaluation further suggests that the gradient distributions along the trajectory of private SGD indeed exhibit symmetric structure that favors convergence. Together, our results provide an explanation of why private SGD with gradient clipping remains effective in practice despite its potential clipping bias. Finally, we develop a new perturbation-based technique that can provably correct the clipping bias even for instances with highly asymmetric gradient distributions.
Submitted 17 March, 2021; v1 submitted 27 June, 2020;
originally announced June 2020.
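The per-example clipping step described in the abstract above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, gradients are plain Python lists, and the privacy noise that private SGD adds after clipping is deliberately omitted.

```python
import math


def clip_gradient(grad, threshold):
    """Shrink grad so that its L2 norm is at most threshold.

    Gradients whose norm is already within the threshold are returned unchanged.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def clipped_mean_gradient(per_example_grads, threshold):
    """Clip each example's gradient individually, then average the results.

    Noise addition (needed for an actual privacy guarantee) is not included.
    """
    clipped = [clip_gradient(g, threshold) for g in per_example_grads]
    n = len(clipped)
    dim = len(clipped[0])
    return [sum(c[j] for c in clipped) / n for j in range(dim)]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.1, 0.0]]
    print(clipped_mean_gradient(grads, threshold=1.0))
```

Because clipping rescales only the large gradients, the clipped average generally differs from the true average gradient; this difference is the clipping bias that the abstract refers to.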
-
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds
Authors:
Yingxue Zhou,
Xiangyi Chen,
Mingyi Hong,
Zhiwei Steven Wu,
Arindam Banerjee
Abstract:
We study differentially private (DP) algorithms for stochastic non-convex optimization. In this problem, the goal is to minimize the population loss over a $p$-dimensional space given $n$ i.i.d. samples drawn from a distribution. We improve upon the population gradient bound of ${\sqrt{p}}/{\sqrt{n}}$ from prior work and obtain a sharper rate of $\sqrt[4]{p}/\sqrt{n}$. We obtain this rate by providing the first analyses on a collection of private gradient-based methods, including adaptive algorithms DP RMSProp and DP Adam. Our proof technique leverages the connection between differential privacy and adaptive data analysis to bound gradient estimation error at every iterate, which circumvents the worse generalization bound from the standard uniform convergence argument. Finally, we evaluate the proposed algorithms on two popular deep learning tasks and demonstrate the empirical advantages of DP adaptive gradient methods over standard DP SGD.
Submitted 10 August, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
On the Divergence of Decentralized Non-Convex Optimization
Authors:
Mingyi Hong,
Siliang Zeng,
Junyu Zhang,
Haoran Sun
Abstract:
We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and machine learning applications, and many efficient algorithms have been proposed. However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradient $\nabla f_i$'s are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has Lipschitz gradient. This observation raises an important open question: How to design decentralized algorithms when the LLC, or even the GLC, is not satisfied?
To address the above question, we design a first-order algorithm called Multi-stage gradient tracking algorithm (MAGENTA), which is capable of computing stationary solutions with neither the LLC nor the GLC. In particular, we show that the proposed algorithm converges sublinearly to an $ε$-stationary solution, where the precise rate depends on various algorithmic and problem parameters. For example, if the local functions $f_i$ are $Q$th order polynomials, then the rate becomes $\mathcal{O}(1/ε^{Q-1})$. Such a rate is tight for the special case of $Q=2$ where each $f_i$ satisfies the LLC. To our knowledge, this is the first attempt to study decentralized non-convex optimization problems with neither the LLC nor the GLC.
Submitted 20 June, 2020;
originally announced June 2020.
-
Non-convex Min-Max Optimization: Applications, Challenges, and Recent Theoretical Advances
Authors:
Meisam Razaviyayn,
Tianjian Huang,
Songtao Lu,
Maher Nouiehed,
Maziar Sanjabi,
Mingyi Hong
Abstract:
The min-max optimization problem, also known as the saddle point problem, is a classical optimization problem which is also studied in the context of zero-sum games. Given a class of objective functions, the goal is to find a value for the argument which leads to a small objective value even for the worst case function in the given class. Min-max optimization problems have recently become very popular in a wide range of signal and data processing applications such as fair beamforming, training generative adversarial networks (GANs), and robust machine learning, to name just a few. The overarching goal of this article is to provide a survey of recent advances for an important subclass of min-max problems, where the minimization and maximization problems can be non-convex and/or non-concave. In particular, we will first present a number of applications to showcase the importance of such min-max problems; then we discuss key theoretical challenges, and provide a selective review of some exciting recent theoretical and algorithmic advances in tackling non-convex min-max problems. Finally, we will point out open questions and future research directions.
Submitted 18 August, 2020; v1 submitted 15 June, 2020;
originally announced June 2020.
-
Spatial Mode Correction of Single Photons using Machine Learning
Authors:
Narayan Bhusal,
Sanjaya Lohani,
Chenglong You,
Mingyuan Hong,
Joshua Fabre,
Pengcheng Zhao,
Erin M. Knutson,
Ryan T. Glasser,
Omar S. Magana-Loaiza
Abstract:
Spatial modes of light constitute valuable resources for a variety of quantum technologies ranging from quantum communication and quantum imaging to remote sensing. Nevertheless, their vulnerabilities to phase distortions, induced by random media, impose significant limitations on the realistic implementation of numerous quantum-photonic technologies. Unfortunately, this problem is exacerbated at the single-photon level. Over the last two decades, this challenging problem has been tackled through conventional schemes that utilize optical nonlinearities, quantum correlations, and adaptive optics. In this article, we exploit the self-learning and self-evolving features of artificial neural networks to correct the complex spatial profile of distorted Laguerre-Gaussian modes at the single-photon level. Furthermore, we demonstrate the possibility of boosting the performance of an optical communication protocol through the spatial mode correction of single photons using machine learning. Our results have important implications for real-time turbulence correction of structured photons and single-photon images.
Submitted 4 September, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
On Chen's biharmonic conjecture for hypersurfaces in $\mathbb R^5$
Authors:
Yu Fu,
Min-Chun Hong,
Xin Zhan
Abstract:
A longstanding conjecture on biharmonic submanifolds, proposed by Chen in 1991, is that {\it any biharmonic submanifold in a Euclidean space is minimal}. In the case of a hypersurface $M^n$ in $\mathbb R^{n+1}$, Chen's conjecture was settled in the case of $n=2$ by Chen and Jiang around 1987 independently. Hasanis and Vlachos in 1995 settled Chen's conjecture for a hypersurface with $n=3$. However, the general Chen's conjecture on a hypersurface $M^n$ remains open for $n> 3$. In this paper, we settle Chen's conjecture for hypersurfaces in $\mathbb R^{5}$ for $n=4$.
Submitted 22 July, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
DSU-net: Dense SegU-net for automatic head-and-neck tumor segmentation in MR images
Authors:
Pin Tang,
Chen Zu,
Mei Hong,
Rui Yan,
Xingchen Peng,
Jianghong Xiao,
Xi Wu,
Jiliu Zhou,
Luping Zhou,
Yan Wang
Abstract:
Precise and accurate segmentation of the most common head-and-neck tumor, nasopharyngeal carcinoma (NPC), in MRI informs treatment and regulatory decision making. However, the large variations in the lesion size and shape of NPC, boundary ambiguity, and the limited number of available annotated samples together make NPC segmentation in MRI a challenging task. In this paper, we propose a Dense SegU-net (DSU-net) framework for automatic NPC segmentation in MRI. Our contribution is threefold. First, unlike the traditional decoder in U-net, which uses upconvolution for upsampling, we argue that the restoration from low resolution features to high resolution output should be capable of preserving information significant for precise boundary localization. Hence, we use unpooling to upsample and propose SegU-net. Second, to combat the potential vanishing-gradient problem, we introduce dense blocks, which can facilitate feature propagation and reuse. Third, using only cross entropy (CE) as the loss function may cause problems such as misprediction; therefore, we propose to use a loss function comprising both CE loss and Dice loss to train the network. Quantitative and qualitative comparisons are carried out extensively on in-house datasets. The experimental results show that our proposed architecture outperforms the existing state-of-the-art segmentation networks.
Submitted 19 December, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
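A loss that combines cross entropy with Dice loss, as mentioned in the abstract above, might look like the sketch below. The abstract does not give the exact formulation, so the binary (foreground/background) setting, the soft Dice definition, the `dice_weight` parameter, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import math


def ce_dice_loss(probs, targets, dice_weight=1.0, eps=1e-7):
    """Binary cross entropy plus soft Dice loss over a flat list of pixels.

    probs   : predicted foreground probabilities in [0, 1]
    targets : ground-truth labels, each 0 or 1
    dice_weight and eps are hypothetical illustration parameters.
    """
    # Clamp probabilities so that log() is always finite.
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    ce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clamped, targets)
    ) / len(probs)
    # Soft Dice coefficient: overlap between prediction and ground truth.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    return ce + dice_weight * (1.0 - dice)


if __name__ == "__main__":
    print(ce_dice_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]))
```

The Dice term measures region overlap rather than per-pixel error, which is one common motivation for pairing it with cross entropy when the foreground occupies few pixels.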
-
Generalization Bounds for Stochastic Saddle Point Problems
Authors:
Junyu Zhang,
Mingyi Hong,
Mengdi Wang,
Shuzhong Zhang
Abstract:
This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.
Submitted 3 June, 2020;
originally announced June 2020.
-
FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data
Authors:
Xinwei Zhang,
Mingyi Hong,
Sairaj Dhople,
Wotao Yin,
Yang Liu
Abstract:
Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model, in which multiple local updates are performed using local data, before sending the local models to the cloud for aggregation.
However, these schemes typically require strong assumptions, such as that the local data are independent and identically distributed (i.i.d.), or that the size of the local gradients is bounded. In this paper, we first explicitly characterize the behavior of the FedAvg algorithm, and show that without strong and unrealistic assumptions on the problem structure, the algorithm can behave erratically for non-convex problems (e.g., diverge to infinity). Aiming at designing FL algorithms that are provably fast and require as few assumptions as possible, we propose a new algorithm design strategy from the primal-dual optimization perspective. Our strategy yields a family of algorithms that follow the same CTA model as existing algorithms, but can deal with non-convex objectives and achieve the best possible optimization and communication complexity, while being able to deal with both the full batch and mini-batch local computation models. Most importantly, the proposed algorithms are {\it communication efficient}, in the sense that the communication pattern can be adaptive to the level of heterogeneity among the local data. To the best of our knowledge, this is the first algorithmic framework for FL that achieves all the above properties.
Submitted 7 July, 2020; v1 submitted 22 May, 2020;
originally announced May 2020.
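The "computation then aggregation" pattern that the abstract above attributes to FedAvg can be sketched in a few lines. This is a simplified illustration, not the proposed FedPD algorithm: the model is a single scalar, each client uses full (not stochastic) gradients, and the quadratic client losses in the usage example are hypothetical.

```python
def local_updates(w, grad_fn, lr, steps):
    """Run several gradient steps on one client's local objective."""
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w


def fedavg(w0, client_grad_fns, lr=0.1, local_steps=5, rounds=50):
    """Computation-then-aggregation loop in the style of FedAvg.

    Each round, every client starts from the shared model, performs
    local_steps local updates, and the server averages the local models.
    """
    w = w0
    for _ in range(rounds):
        local_models = [
            local_updates(w, grad_fn, lr, local_steps)
            for grad_fn in client_grad_fns
        ]
        w = sum(local_models) / len(local_models)
    return w


if __name__ == "__main__":
    # Hypothetical clients with losses 0.5 * (w - c)^2 for c = 0 and c = 2.
    centers = [0.0, 2.0]
    grad_fns = [lambda w, c=c: w - c for c in centers]
    print(fedavg(10.0, grad_fns))
```

For these identical-curvature quadratics the averaged model converges to the mean of the client centers; the abstract's point is that such benign behavior is not guaranteed for general non-convex problems with heterogeneous data.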
-
A Modal-Space Method for Online Power System Steady-State Stability Monitoring
Authors:
Bin Wang,
Le Xie,
Slava Maslennikov,
Xiaochuan Luo,
Qiang Zhang,
Mingguo Hong
Abstract:
This paper proposes a novel approach to estimate the steady-state angle stability limit (SSASL) by using the nonlinear power system dynamic model in the modal space. Through two linear changes of coordinates and a simplification introduced by the steady-state condition, the nonlinear power system dynamic model is transformed into a number of single-machine-like power systems whose power-angle curves can be derived and used for estimating the SSASL. The proposed approach estimates the SSASL of angles at all machines and all buses without the need for manually specifying the scenario, i.e. setting sink and source areas, and also without the need for solving multiple nonlinear power flows. Case studies on 9-bus and 39-bus power systems demonstrate that the proposed approach is always able to capture the aperiodic instability in an online environment, showing promising performance in the online monitoring of the steady-state angle stability over the traditional power flow-based analysis.
Submitted 22 May, 2020;
originally announced May 2020.
-
Online Proximal-ADMM For Time-varying Constrained Convex Optimization
Authors:
Yijian Zhang,
Emiliano Dall'Anese,
Mingyi Hong
Abstract:
This paper considers a convex optimization problem with cost and constraints that evolve over time. The function to be minimized is strongly convex and possibly non-differentiable, and variables are coupled through linear constraints. In this setting, the paper proposes an online algorithm based on the alternating direction method of multipliers (ADMM), to track the optimal solution trajectory of the time-varying problem; in particular, the proposed algorithm consists of a primal proximal gradient descent step and an appropriately perturbed dual ascent step. The paper derives tracking results, asymptotic bounds, and linear convergence results. The proposed algorithm is then specialized to a multi-area power grid optimization problem, and our numerical results verify the desired properties.
Submitted 12 January, 2021; v1 submitted 7 May, 2020;
originally announced May 2020.
-
NTIRE 2020 Challenge on Image Demoireing: Methods and Results
Authors:
Shanxin Yuan,
Radu Timofte,
Ales Leonardis,
Gregory Slabaugh,
Xiaotong Luo,
Jiangtao Zhang,
Yanyun Qu,
Ming Hong,
Yuan Xie,
Cuihua Li,
Dejia Xu,
Yihao Chu,
Qingyan Sun,
Shuai Liu,
Ziyao Zong,
Nan Nan,
Chenghua Li,
Sangmin Kim,
Hyungjoon Nam,
Jisu Kim,
Jechang Jeong,
Manri Cheon,
Sung-Jun Yoon,
Byungyeon Kang,
Junwoo Lee
, et al. (21 additional authors not shown)
Abstract:
This paper reviews the Challenge on Image Demoireing that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2020. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. The challenge was divided into two tracks. Track 1 targeted the single image demoireing problem, which seeks to remove moire patterns from a single image. Track 2 focused on the burst demoireing problem, where a set of degraded moire images of the same scene were provided as input, with the goal of producing a single demoired image as output. The methods were ranked in terms of their fidelity, measured using the peak signal-to-noise ratio (PSNR) between the ground truth clean images and the restored images produced by the participants' methods. The tracks had 142 and 99 registered participants, respectively, with a total of 14 and 6 submissions in the final testing stage. The entries span the current state-of-the-art in image and burst image demoireing problems.
Submitted 6 May, 2020;
originally announced May 2020.
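The fidelity metric named in the NTIRE 2020 abstract above, peak signal-to-noise ratio (PSNR), follows the standard definition 10 * log10(MAX^2 / MSE). A minimal sketch is given here under stated assumptions: images are flattened to sequences of pixel values, and the peak value `max_value` defaults to 255 (8-bit images). The challenge's exact evaluation protocol is not given in the abstract, so the function name `psnr` and these choices are illustrative only.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened images."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([100.0, 100.0], [90.0, 110.0]))
```

Higher PSNR means the restored image is closer to the ground truth; identical images give infinity.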
-
Accurate Tumor Tissue Region Detection with Accelerated Deep Convolutional Neural Networks
Authors:
Gabriel Tjio,
Xulei Yang,
Jia Mei Hong,
Sum Thai Wong,
Vanessa Ding,
Andre Choo,
Yi Su
Abstract:
Manual annotation of pathology slides for cancer diagnosis is laborious and repetitive. Therefore, much effort has been devoted to developing computer vision solutions. Our approach (FLASH) is based on a Deep Convolutional Neural Network (DCNN) architecture. It reduces computational costs and is faster than typical deep learning approaches by two orders of magnitude, making high throughput processing a possibility. In computer vision approaches using deep learning methods, the input image is subdivided into patches which are separately passed through the neural network. Features extracted from these patches are used by the classifier to annotate the corresponding region. Our approach aggregates all the extracted features into a single matrix before passing them to the classifier. Previously, the features were extracted from overlapping patches. Aggregating the features eliminates the need for processing overlapping patches, which reduces the computations required. DCNN and FLASH demonstrate high sensitivity (~0.96), good precision (~0.78) and high F1 scores (~0.84). The average time taken to process each sample for FLASH and DCNN is 96.6 seconds and 9489.20 seconds, respectively. Our approach was approximately 100 times faster than the original DCNN approach while simultaneously preserving high accuracy and precision.
Submitted 18 April, 2020;
originally announced April 2020.
-
Online Social Deception and Its Countermeasures for Trustworthy Cyberspace: A Survey
Authors:
Zhen Guo,
Jin-Hee Cho,
Ing-Ray Chen,
Srijan Sengupta,
Michin Hong,
Tanushree Mitra
Abstract:
We are living in an era when online communication over social network services (SNSs) has become an indispensable part of people's everyday lives. As a consequence, online social deception (OSD) in SNSs has emerged as a serious threat in cyberspace, particularly for users vulnerable to such cyberattacks. Cyber attackers have exploited the sophisticated features of SNSs to carry out harmful OSD activities, such as financial fraud, privacy threats, or sexual/labor exploitation. Therefore, it is critical to understand OSD and develop effective countermeasures against OSD for building trustworthy SNSs. In this paper, we conducted an extensive survey, covering (i) the multidisciplinary concepts of social deception; (ii) types of OSD attacks and their unique characteristics compared to other social network attacks and cybercrimes; (iii) comprehensive defense mechanisms embracing prevention, detection, and response (or mitigation) against OSD attacks along with their pros and cons; (iv) datasets/metrics used for validation and verification; and (v) legal and ethical concerns related to OSD research. Based on this survey, we provide insights into the effectiveness of countermeasures and the lessons from existing literature. We conclude this survey paper with an in-depth discussion of the limitations of the state-of-the-art and recommend future research directions in this area.
Submitted 16 April, 2020;
originally announced April 2020.
-
Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond
Authors:
Tsung-Hui Chang,
Mingyi Hong,
Hoi-To Wai,
Xinwei Zhang,
Songtao Lu
Abstract:
Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence --- problems, data, communication and computation. Our aim is to provide a fresh and unique perspective about how these elements should work together in an effective and coherent manner. In particular, we provide a selective review about the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over the networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off between computation and communication costs. Practical issues and future research directions will also be discussed.
Submitted 14 January, 2020;
originally announced January 2020.
-
A Communication Efficient Collaborative Learning Framework for Distributed Features
Authors:
Yang Liu,
Yan Kang,
Xinwei Zhang,
Liping Li,
Yong Cheng,
Tianjian Chen,
Mingyi Hong,
Qiang Yang
Abstract:
We introduce a collaborative learning framework allowing multiple parties having different sets of attributes about the same user to jointly build models without exposing their raw data or model parameters. In particular, we propose a Federated Stochastic Block Coordinate Descent (FedBCD) algorithm, in which each party conducts multiple local updates before each communication to effectively reduce the number of communication rounds among parties, a principal bottleneck for collaborative learning problems. We analyze theoretically the impact of the number of local updates and show that when the batch size, sample size, and the local iterations are selected appropriately, within $T$ iterations, the algorithm performs $\mathcal{O}(\sqrt{T})$ communication rounds and achieves some $\mathcal{O}(1/\sqrt{T})$ accuracy (measured by the average of the gradient norm squared). The approach is supported by our empirical evaluations on a variety of tasks and datasets, demonstrating advantages over stochastic gradient descent (SGD) approaches.
Submitted 31 July, 2020; v1 submitted 23 December, 2019;
originally announced December 2019.
-
On Lower Iteration Complexity Bounds for the Saddle Point Problems
Authors:
Junyu Zhang,
Mingyi Hong,
Shuzhong Zhang
Abstract:
In this paper, we study the lower iteration complexity bounds for finding the saddle point of a strongly convex and strongly concave saddle point problem: $\min_x\max_yF(x,y)$. We restrict the classes of algorithms in our investigation to be either pure first-order methods or methods using proximal mappings. The existing lower bound result for this type of problem is obtained via the framework of strongly monotone variational inequality problems, which corresponds to the case where the gradient Lipschitz constants ($L_x, L_y$ and $L_{xy}$) and strong convexity/concavity constants ($\mu_x$ and $\mu_y$) are uniform with respect to variables $x$ and $y$. However, specific to the min-max saddle point problem these parameters are naturally different. Therefore, one is led to finding the best possible lower iteration complexity bounds, specific to the min-max saddle point models. In this paper we present the following results. For the class of pure first-order algorithms, our lower iteration complexity bound is $\Omega\left(\sqrt{\frac{L_x}{\mu_x}+\frac{L_{xy}^2}{\mu_x\mu_y}+\frac{L_y}{\mu_y}}\cdot\ln\left(\frac{1}{\epsilon}\right)\right)$, where the term $\frac{L_{xy}^2}{\mu_x\mu_y}$ explains how the coupling influences the iteration complexity. Under several special parameter regimes, this lower bound has been achieved by corresponding optimal algorithms. However, whether or not the bound under the general parameter regime is optimal remains open. Additionally, for the special case of bilinear coupling problems, given the availability of certain proximal operators, a lower bound of $\Omega\left(\sqrt{\frac{L_{xy}^2}{\mu_x\mu_y}+1}\cdot\ln\left(\frac{1}{\epsilon}\right)\right)$ is established in this paper, and optimal algorithms have already been developed in the literature.
Submitted 20 June, 2021; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Dense Recurrent Neural Networks for Accelerated MRI: History-Cognizant Unrolling of Optimization Algorithms
Authors:
Seyed Amir Hossein Hosseini,
Burhaneddin Yaman,
Steen Moeller,
Mingyi Hong,
Mehmet Akçakaya
Abstract:
Inverse problems for accelerated MRI typically incorporate domain-specific knowledge about the forward encoding operator in a regularized reconstruction framework. Recently, physics-driven deep learning (DL) methods have been proposed to use neural networks for data-driven regularization. These methods unroll iterative optimization algorithms to solve the inverse problem objective function, by alternating between domain-specific data consistency and data-driven regularization via neural networks. The whole unrolled network is then trained end-to-end to learn the parameters of the network. Due to the simplicity of data consistency updates with gradient descent steps, proximal gradient descent (PGD) is a common approach to unroll physics-driven DL reconstruction methods. However, PGD methods have slow convergence rates, necessitating a higher number of unrolled iterations, leading to memory issues in training and slower reconstruction times in testing. Inspired by efficient variants of PGD methods that use a history of the previous iterates, we propose a history-cognizant unrolling of the optimization algorithm with dense connections across iterations for improved performance. In our approach, the gradient descent steps are calculated at a trainable combination of the outputs of all the previous regularization units. We also apply this idea to unrolling variable splitting methods with quadratic relaxation. Our results in reconstruction of the fastMRI knee dataset show that the proposed history-cognizant approach reduces residual aliasing artifacts compared to its conventional unrolled counterpart without requiring extra computational power or increasing reconstruction time.
Submitted 8 July, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Spectrum Cartography via Coupled Block-Term Tensor Decomposition
Authors:
Guoyong Zhang,
Xiao Fu,
Jun Wang,
Xi-Le Zhao,
Mingyi Hong
Abstract:
Spectrum cartography aims at estimating power propagation patterns over a geographical region across multiple frequency bands (i.e., a radio map) from limited samples taken sparsely over the region. Classic cartography methods are mostly concerned with recovering the aggregate radio frequency (RF) information while ignoring the constituents of the radio map, but fine-grained emitter-level RF information is of great interest. In addition, many existing cartography methods explicitly or implicitly assume random spatial sampling schemes that may be difficult to implement, due to legal/privacy/security issues. The theoretical aspects (e.g., identifiability of the radio map) of many existing methods are also unclear. In this work, we propose a joint radio map recovery and disaggregation method that is based on coupled block-term tensor decomposition. Our method guarantees identifiability of the individual radio map of \textit{each emitter} (thereby that of the aggregate radio map as well), under realistic conditions. The identifiability result holds under a large variety of geographical sampling patterns, including a number of pragmatic systematic sampling strategies. We also propose effective optimization algorithms to carry out the formulated radio map disaggregation problems. Extensive simulations are employed to showcase the effectiveness of the proposed approach.
Submitted 11 May, 2020; v1 submitted 27 November, 2019;
originally announced November 2019.
-
AIM 2019 Challenge on Image Demoireing: Methods and Results
Authors:
Shanxin Yuan,
Radu Timofte,
Gregory Slabaugh,
Ales Leonardis,
Bolun Zheng,
Xin Ye,
Xiang Tian,
Yaowu Chen,
Xi Cheng,
Zhenyong Fu,
Jian Yang,
Ming Hong,
Wenying Lin,
Wen** Yang,
Yanyun Qu,
Hong-Kyu Shin,
Joon-Yeon Kim,
Sung-Jea Ko,
Hang Dong,
Yu Guo,
Jie Wang,
Xuan Ding,
Zongyan Han,
Sourya Dipta Das,
Kuldeep Purohit
, et al. (3 additional authors not shown)
Abstract:
This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire, was created for this challenge, and consists of 10,200 synthetically generated image pairs (moire and clean ground truth). The challenge was divided into 2 tracks. Track 1 targeted fidelity, measuring the ability of demoire methods to obtain a moire-free image compared with the ground truth, while Track 2 examined the perceptual quality of demoire methods. The tracks had 60 and 39 registered participants, respectively. A total of eight teams competed in the final testing phase. The entries span the current state-of-the-art in the image demoireing problem.
Submitted 8 November, 2019;
originally announced November 2019.
-
ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization
Authors:
Xiangyi Chen,
Sijia Liu,
Kaidi Xu,
Xingguo Li,
Xue Lin,
Mingyi Hong,
David Cox
Abstract:
The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is the problem size. In particular, we provide a deep understanding of why Mahalanobis distance matters in the convergence of ZO-AdaMM and other AdaMM-type methods. As a byproduct, our analysis makes the first step toward understanding adaptive learning rate methods for nonconvex constrained optimization. Furthermore, we demonstrate two applications, designing per-image and universal adversarial attacks from black-box neural networks, respectively. We perform extensive experiments on ImageNet and empirically show that ZO-AdaMM converges much faster to a solution of high accuracy compared with $6$ state-of-the-art ZO optimization methods.
Submitted 15 October, 2019; v1 submitted 14 October, 2019;
originally announced October 2019.
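The gradient-free regime mentioned in the ZO-AdaMM abstract above relies on estimating gradients from function values only. The abstract does not specify which estimator the paper uses, so the following is a sketch of one common choice, assumed here for illustration: a forward-difference estimator averaged over random Gaussian directions. The function name `zo_gradient` and the parameters `mu`, `num_dirs`, and `seed` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def zo_gradient(f: Callable[[Sequence[float]], float], x: Sequence[float],
                mu: float = 1e-3, num_dirs: int = 20,
                seed: Optional[int] = None) -> List[float]:
    """Estimate the gradient of f at x using only function evaluations."""
    rng = random.Random(seed)
    d = len(x)
    fx = f(x)  # one query at the base point, reused for every direction
    grad = [0.0] * d
    for _ in range(num_dirs):
        # Random Gaussian direction (an assumed choice of smoothing distribution).
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        # Finite-difference slope of f along direction u.
        scale = (f(shifted) - fx) / mu
        for i in range(d):
            grad[i] += scale * u[i] / num_dirs
    return grad


if __name__ == "__main__":
    quadratic = lambda v: sum(t * t for t in v)
    print(zo_gradient(quadratic, [1.0, -2.0, 0.5], num_dirs=20000, seed=0))
```

The estimate is noisy, and its variance grows with the dimension, which is consistent with the dimension-dependent slowdown the abstract reports for zeroth-order methods.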
-
Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: A Joint Gradient Estimation and Tracking Approach
Authors:
Haoran Sun,
Songtao Lu,
Mingyi Hong
Abstract:
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms, in applications involving highly non-convex problems, such as training deep neural networks.
In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are $m$ nodes in the system, and each node has a large number of samples (denoted as $n$). Different from the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is on reducing the total communication rounds among the nodes while accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve a certain $\epsilon$-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an $\mathcal{O}(mn^{1/2}\epsilon^{-1})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $\mathcal{O}(mn\epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $\mathcal{O}(m \epsilon^{-3/2})$ sample complexity and an $\mathcal{O}(\epsilon^{-1})$ communication complexity, while the best existing bounds are $\mathcal{O}(m\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-2})$, respectively.
Submitted 13 October, 2019;
originally announced October 2019.
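The gradient tracking component mentioned in the D-GET abstract above can be sketched in its plain deterministic form. This is not D-GET itself: the sample-efficient gradient estimation step is omitted, and the scalar quadratic local losses, the mixing matrix, the step size, and the function name `gradient_tracking` are all assumptions made for illustration. Each node mixes its iterate with its neighbors' iterates and maintains a tracker `y` of the network-average gradient.

```python
from typing import List


def gradient_tracking(targets: List[float], mixing: List[List[float]],
                      step: float = 0.1, iters: int = 300) -> List[float]:
    """Decentralized gradient tracking on local losses 0.5 * (x - targets[i])**2."""
    m = len(targets)

    def grad(i: int, x: float) -> float:
        return x - targets[i]  # gradient of node i's (assumed) quadratic loss

    x = [0.0] * m
    y = [grad(i, x[i]) for i in range(m)]  # trackers start at the local gradients
    for _ in range(iters):
        # Mix iterates with neighbors, then step along the tracked gradient.
        x_new = [sum(mixing[i][j] * x[j] for j in range(m)) - step * y[i]
                 for i in range(m)]
        # Mix trackers and add the change in the local gradient.
        y = [sum(mixing[i][j] * y[j] for j in range(m))
             + grad(i, x_new[i]) - grad(i, x[i])
             for i in range(m)]
        x = x_new
    return x


if __name__ == "__main__":
    W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
    print(gradient_tracking([1.0, 2.0, 6.0], W))
```

With a doubly stochastic mixing matrix, the average of the trackers always equals the average of the local gradients, so in this toy example all nodes agree on the minimizer of the summed loss (the mean of the targets).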
-
Min-Max Optimization without Gradients: Convergence and Applications to Adversarial ML
Authors:
Sijia Liu,
Songtao Lu,
Xiangyi Chen,
Yao Feng,
Kaidi Xu,
Abdullah Al-Dujaili,
Minyi Hong,
Una-May O'Reilly
Abstract:
In this paper, we study the problem of constrained robust (min-max) optimization in a black-box setting, where the desired optimizer cannot access the gradients of the objective function but may query its values. We present a principled optimization framework, integrating a zeroth-order (ZO) gradient estimator with an alternating projected stochastic gradient descent-ascent method, where the former only requires a small number of function queries and the latter needs just a one-step descent/ascent update. We show that the proposed framework, referred to as ZO-Min-Max, has a sub-linear convergence rate under mild conditions and scales gracefully with problem size. From an application side, we explore a promising connection between black-box min-max optimization and black-box evasion and poisoning attacks in adversarial machine learning (ML). Our empirical evaluations on these use cases demonstrate the effectiveness of our approach and its scalability to dimensions that prohibit using recent black-box solvers.
Submitted 16 June, 2020; v1 submitted 30 September, 2019;
originally announced September 2019.
-
Dynamic high efficiency 3D meta-holography in visible range with large frames number and high frame rate based on space division multiplexing design
Authors:
Hui Gao,
Yuxi Wang,
Xuhao Fan,
Fenzhang Jiao,
Jinsong Xia,
Wei Xiong,
Minghui Hong
Abstract:
Holography is an ideal method for naked-eye three-dimensional (3D) display, and computer-generated holography (CGH) makes it possible to reconstruct virtual objects. However, the large pixel size of common CGH devices leads to shortcomings in holographic applications, such as a narrow field of view, twin images, and multiple diffraction orders. Meanwhile, metasurfaces consisting of subwavelength structures show great potential in controlling light, which makes them suitable for hologram design. Many inspiring works have pursued dynamic meta-holograms, as we summarize in this paper. It can be concluded that there is still no research work on meta-holography with high efficiency and good display quality in the visible range that can show smooth holographic videos with a large number of frames and a high frame rate. In the current work, we demonstrate a new design of meta-holography in the visible range based on a space-division-multiplexing metasurface, which can achieve 228 different holographic frames and a very high frame rate (maximum frame rate of 9523 FPS). Also, the metasurface consists of silicon nitride (SiNx) nanopillars with high efficiency (more than 70%). This method can not only satisfy the needs of holographic display but also be suitable for many different research fields, such as laser fabrication, photolithography, 3D forming by two-photon polymerization, and optical information processing.
Submitted 12 September, 2019;
originally announced September 2019.
-
On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Authors:
Zhuoran Yang,
Yongxin Chen,
Mingyi Hong,
Zhaoran Wang
Abstract:
Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
Submitted 14 July, 2019;
originally announced July 2019.