-
Nanosecond anomaly detection with decision trees and real-time application to exotic Higgs decays
Authors:
Stephen Roche,
Quincy Bayer,
Benjamin Carlson,
William Ouligian,
Pavel Serhiayenka,
Joerg Stelzer,
Tae Min Hong
Abstract:
We present an interpretable implementation of the autoencoding algorithm, used as an anomaly detector, built with a forest of deep decision trees on field programmable gate arrays (FPGAs). Scenarios at the Large Hadron Collider at CERN are considered, for which the autoencoder is trained using known physical processes of the Standard Model. The design is then deployed in real-time trigger systems for anomaly detection of unknown physical processes, such as the detection of rare exotic decays of the Higgs boson. Inference is performed with a latency of 30 ns at percent-level resource usage on the Xilinx Virtex UltraScale+ VU9P FPGA. Our method offers low-latency anomaly detection for edge AI users with resource constraints.
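As a rough illustration of how a decision-tree autoencoder can score anomalies, the sketch below assumes that each leaf of a trained tree stores its axis-aligned box. It "decodes" an input to the midpoint of the box the input falls in and uses the L1 reconstruction distance, averaged over trees, as the anomaly score. The hand-written boxes, the midpoint decoder, and the threshold are hypothetical choices for illustration, not the authors' design. The intuition is that deep trees trained on Standard Model data produce small boxes where that data is dense, so unfamiliar events tend to land in large boxes and reconstruct poorly.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Leaf:
    lo: tuple[float, ...]  # lower edge of the leaf's box, per input variable
    hi: tuple[float, ...]  # upper edge (exclusive)

    def contains(self, x):
        return all(l <= v < h for l, v, h in zip(self.lo, x, self.hi))

    def midpoint(self):
        return tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))

def reconstruct(leaves, x):
    """Hypothetical decoder: map x to the midpoint of the leaf box containing it."""
    for leaf in leaves:
        if leaf.contains(x):
            return leaf.midpoint()
    raise ValueError("x lies outside every leaf box")

def anomaly_score(forest, x):
    """Average L1 reconstruction distance over all trees in the forest."""
    total = 0.0
    for leaves in forest:
        x_hat = reconstruct(leaves, x)
        total += sum(abs(a - b) for a, b in zip(x, x_hat))
    return total / len(forest)

# Toy tree on [0, 1)^2: fine boxes where background data is dense, coarse boxes elsewhere.
tree = [
    Leaf((0.0, 0.0), (0.25, 0.25)), Leaf((0.25, 0.0), (0.5, 0.25)),
    Leaf((0.0, 0.25), (0.25, 0.5)), Leaf((0.25, 0.25), (0.5, 0.5)),
    Leaf((0.5, 0.0), (1.0, 1.0)),   Leaf((0.0, 0.5), (0.5, 1.0)),
]
THRESHOLD = 0.1  # hypothetical cut, in practice tuned on background to fix the trigger rate

def is_anomalous(forest, x):
    return anomaly_score(forest, x) > THRESHOLD
```

A background-like event at (0.1, 0.1) reconstructs to (0.125, 0.125) with a small score, while (0.9, 0.9) falls in a coarse box and is flagged.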
Submitted 15 April, 2024; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
Authors:
Hao-Wei Chen,
Yu-Syuan Xu,
Min-Fong Hong,
Yi-Min Tsai,
Hsien-Kai Kuo,
Chun-Yi Lee
Abstract:
Implicit neural representation has recently shown promise for representing images at arbitrary resolutions. In this paper, we present a Local Implicit Transformer (LIT), which integrates an attention mechanism and a frequency encoding technique into a local implicit image function. We design a cross-scale local attention block to effectively aggregate local features. To further improve representative power, we propose a Cascaded LIT (CLIT) that exploits multi-scale features, along with a cumulative training strategy that gradually increases the upsampling scales during training. We have conducted extensive experiments to validate the effectiveness of these components and analyze various training strategies. The qualitative and quantitative results demonstrate that LIT and CLIT achieve favorable results and outperform prior work on arbitrary-scale super-resolution tasks.
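Frequency (positional) encoding is commonly implemented by passing each relative coordinate through sines and cosines at geometrically spaced frequencies. The sketch below shows that generic form. The number of bands and the exact encoding used inside LIT are assumptions here, not details taken from the paper.

```python
import math

def frequency_encode(coords, num_bands=4):
    """Map each coordinate c to [sin(2^k*pi*c), cos(2^k*pi*c)] for k = 0..num_bands-1."""
    features = []
    for c in coords:
        for k in range(num_bands):
            angle = (2 ** k) * math.pi * c
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features

# Relative offset (dy, dx) of a query pixel from a nearby latent-grid cell center.
enc = frequency_encode((0.25, -0.5), num_bands=4)
```

Each coordinate yields 2 x num_bands features, so the example produces 16 values. The higher bands let a small network represent fine, high-frequency detail as a function of sub-pixel position.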
Submitted 29 March, 2023;
originally announced March 2023.
-
GLASU: A Communication-Efficient Algorithm for Federated Learning with Vertically Distributed Graph Data
Authors:
Xinwei Zhang,
Mingyi Hong,
Jie Chen
Abstract:
Vertical federated learning (VFL) is a distributed learning paradigm in which computing clients collectively train a model, each holding partial features of the same set of samples. Current research on VFL focuses on the case where samples are independent and rarely addresses the emerging scenario in which samples are interrelated through a graph. For graph-structured data, graph neural networks (GNNs) are competitive machine learning models, but a naive implementation in the VFL setting incurs significant communication overhead. Moreover, analyzing the training is complicated by biased stochastic gradients. In this paper, we propose a model splitting method that splits a backbone GNN across the clients and the server, together with a communication-efficient algorithm, GLASU, to train such a model. GLASU adopts lazy aggregation and stale updates to skip aggregation when evaluating the model and to skip feature exchanges during training, greatly reducing communication. We offer a theoretical analysis and conduct extensive numerical experiments on real-world datasets, showing that the proposed algorithm effectively trains a GNN model whose performance matches that of the backbone GNN trained in a centralized manner.
Submitted 16 March, 2023;
originally announced March 2023.
-
What Is Missing in IRM Training and Evaluation? Challenges and Solutions
Authors:
Yihua Zhang,
Pranay Sharma,
Parikshit Ram,
Mingyi Hong,
Kush Varshney,
Sijia Liu
Abstract:
Invariant risk minimization (IRM) has received increasing attention as a way to acquire environment-agnostic data representations and predictions, and as a principled solution for preventing spurious correlations from being learned and for improving models' out-of-distribution generalization. Yet, recent works have found that the optimality of the originally proposed IRM optimization (IRM) may be compromised in practice or may be impossible to achieve in some scenarios. Therefore, a series of advanced IRM algorithms has been developed that show practical improvement over IRM. In this work, we revisit these recent IRM advancements and identify and resolve three practical limitations in IRM training and evaluation. First, we find that the effect of batch size during training has been chronically overlooked in previous studies, leaving room for further improvement. We propose small-batch training and highlight its improvements over a set of large-batch optimization techniques. Second, we find that improper selection of evaluation environments could give a false sense of invariance for IRM. To alleviate this effect, we leverage diversified test-time environments to precisely characterize the invariance of IRM when applied in practice. Third, we revisit the proposal of Ahuja et al. (2020) to convert IRM into an ensemble game and identify a limitation when a single invariant predictor is desired instead of an ensemble of individual predictors. We propose a new IRM variant to address this limitation based on a novel viewpoint of ensemble IRM games as consensus-constrained bi-level optimization. Lastly, we conduct extensive experiments (covering 7 existing IRM variants and 7 datasets) to justify the practical significance of revisiting IRM training and evaluation in a principled manner.
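For readers unfamiliar with IRM training, the widely used IRMv1 relaxation (Arjovsky et al., 2019) adds a penalty for each environment. The penalty is the squared gradient of that environment's risk with respect to a dummy scalar multiplier w, evaluated at w = 1.0. The sketch below computes this penalty in closed form for a squared loss. The squared loss, the toy data, and the penalty weight `lam` are illustrative assumptions. The IRM variants studied in the paper go beyond this baseline.

```python
def irmv1_objective(envs, lam=1.0):
    """Sum over environments of risk + lam * (dR_e/dw at w = 1)^2, using a squared loss.

    envs: list of (predictions, targets) pairs, one pair per environment.
    """
    total = 0.0
    for preds, targets in envs:
        n = len(preds)
        risk = sum((f - y) ** 2 for f, y in zip(preds, targets)) / n
        # d/dw of mean((w * f - y)^2), evaluated at w = 1.0
        grad_w = sum(2 * (f - y) * f for f, y in zip(preds, targets)) / n
        total += risk + lam * grad_w ** 2
    return total

envs = [
    ([1.0, 2.0], [1.0, 2.0]),  # environment where the predictor is exact: no penalty
    ([1.0, 1.0], [0.0, 0.0]),  # environment where rescaling the output would help
]
```

The penalty is zero only when no rescaling of the shared predictor would lower an environment's risk. This is the invariance that the penalty term encourages.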
Submitted 4 March, 2023;
originally announced March 2023.
-
The Game of Life on the Robinson Triangle Penrose Tiling: Still Life
Authors:
Seung Hyeon Mandy Hong,
May Mei
Abstract:
We investigate Conway's Game of Life played on the Robinson triangle Penrose tiling. In this paper, we classify all four-cell still lifes.
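A still life is a pattern that one update step leaves unchanged. The sketch below checks this on an arbitrary cell-adjacency graph, so the same code applies to a square grid or to a tiling once its neighbor relation is supplied. It assumes the standard B3/S23 rule: a dead cell becomes live with exactly three live neighbors, and a live cell survives with two or three. The paper's neighborhood definition for the Robinson triangle tiling is not reproduced here. The square-grid example is only a sanity check.

```python
def step(live, neighbors):
    """One B3/S23 Game of Life update on a graph given as {cell: set_of_neighbor_cells}."""
    # Only live cells and their neighbors can be live after the step.
    candidates = set(live)
    for cell in live:
        candidates |= neighbors[cell]
    new_live = set()
    for cell in candidates:
        n = sum(1 for nb in neighbors[cell] if nb in live)
        if (cell in live and n in (2, 3)) or (cell not in live and n == 3):
            new_live.add(cell)
    return new_live

def is_still_life(live, neighbors):
    return step(live, neighbors) == set(live)

def moore_grid(size):
    """Square grid with 8-cell (Moore) neighborhoods and no wraparound."""
    nbrs = {}
    for r in range(size):
        for c in range(size):
            nbrs[(r, c)] = {
                (r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and 0 <= r + dr < size and 0 <= c + dc < size
            }
    return nbrs

grid = moore_grid(6)
block = {(2, 2), (2, 3), (3, 2), (3, 3)}  # 2x2 block: a four-cell still life on the square grid
line = {(2, 1), (2, 2), (2, 3)}           # blinker: period-2 oscillator, not still
```

Classifying four-cell still lifes on a tiling then amounts to enumerating connected four-cell patches under the tiling's adjacency and keeping those for which `is_still_life` holds.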
Submitted 10 April, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning
Authors:
Siliang Zeng,
Chenliang Li,
Alfredo Garcia,
Mingyi Hong
Abstract:
Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving. However, the structure of an expert's preferences implicit in observed actions is closely linked to the expert's model of the environment dynamics (i.e., the "world" model). Thus, inaccurate models of the world obtained from finite data with limited coverage could compound inaccuracy in estimated rewards. To address this issue, we propose a bi-level optimization formulation of the estimation task wherein the upper level is likelihood maximization based upon a conservative model of the expert's policy (lower level). The policy model is conservative in that it maximizes reward subject to a penalty that increases with the uncertainty of the estimated model of the world. We propose a new algorithmic framework to solve the bi-level optimization problem and provide statistical and computational guarantees of performance for the associated optimal reward estimator. Finally, we demonstrate that the proposed algorithm outperforms state-of-the-art offline IRL and imitation learning benchmarks by a large margin on the continuous control tasks in MuJoCo and different datasets in the D4RL benchmark.
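One common way to make a policy model conservative in offline model-based RL is to subtract from the reward a penalty proportional to the disagreement among an ensemble of learned dynamics models. The sketch below assumes that choice. It measures uncertainty as the maximum pairwise distance between the ensemble's next-state predictions and scales it by a hypothetical weight `lam`. The paper's actual uncertainty measure may differ.

```python
import math

def ensemble_uncertainty(predictions):
    """Largest Euclidean distance between any two ensemble next-state predictions."""
    worst = 0.0
    for i in range(len(predictions)):
        for j in range(i + 1, len(predictions)):
            worst = max(worst, math.dist(predictions[i], predictions[j]))
    return worst

def conservative_reward(reward, predictions, lam=1.0):
    """Penalized reward r(s, a) - lam * u(s, a); the penalty grows with model uncertainty."""
    return reward - lam * ensemble_uncertainty(predictions)
```

Where the ensemble agrees, as in well-covered regions of the data, the reward is nearly unchanged. Where the models disagree, the policy is discouraged from relying on the world model.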
Submitted 28 February, 2024; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Electrically Sign-Reversible Topological Hall Effect in a Top-Gated Topological Insulator (Bi,Sb)2Te3 on a Ferrimagnetic Insulator Europium Iron Garnet
Authors:
Jyun-Fong Wong,
Ko-Hsuan Mandy Chen,
Jui-Min Chia,
Zih-** Huang,
Sheng-Xin Wang,
Pei-Tze Chen,
Lawrence Boyu Young,
Yen-Hsun Glen Lin,
Shang-Fan Lee,
Chung-Yu Mou,
Minghwei Hong,
Jueinai Kwo
Abstract:
Topological Hall effect (THE), an electrical transport signature of systems with chiral spin textures such as skyrmions, has been observed recently in topological insulator (TI)-based magnetic heterostructures. However, the intriguing interplay between the topological surface state and THE is yet to be fully understood. In this work, we report a large THE of ~10 Ω (~4 μΩ·cm) at 2 K with an electrically reversible sign in a top-gated 4 nm TI (Bi0.3Sb0.7)2Te3 (BST) grown on a ferrimagnetic insulator (FI) europium iron garnet (EuIG). The temperature, external magnetic field angle, and top-gate bias dependences of the magnetotransport properties were investigated and are consistent with a skyrmion-driven THE. Most importantly, a sign change in the THE was discovered as the Fermi level was tuned from the upper to the lower part of the gapped Dirac cone and vice versa. This discovery is anticipated to impact technological applications in ultralow-power skyrmion-based spintronics.
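The two quoted magnitudes are related through the 4 nm film thickness $t$. Assuming the usual conversion from Hall resistance to Hall resistivity, $\rho_{xy} = R_{xy}\,t$, the arithmetic is:

```latex
\rho_{xy}^{\mathrm{THE}} = R_{xy}^{\mathrm{THE}}\, t
  \approx 10\,\Omega \times 4\,\mathrm{nm}
  = 10\,\Omega \times 4\times10^{-7}\,\mathrm{cm}
  = 4\times10^{-6}\,\Omega\,\mathrm{cm}
  = 4\,\mu\Omega\,\mathrm{cm}.
```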
Submitted 13 April, 2023; v1 submitted 31 December, 2022;
originally announced January 2023.
-
Alleviating neighbor bias: augmenting graph self-supervise learning with structural equivalent positive samples
Authors:
Jiawei Zhu,
Mei Hong,
Ronghua Du,
Haifeng Li
Abstract:
In recent years, using a self-supervised learning framework to learn the general characteristics of graphs has been considered a promising paradigm for graph representation learning. The core of self-supervised learning strategies for graph neural networks (GNNs) lies in constructing suitable positive sample selection strategies. However, existing GNNs typically aggregate information from neighboring nodes to update node representations. This leads to an over-reliance on neighboring positive samples, i.e., homophilous samples, and to the neglect of long-range positive samples, i.e., samples that are far apart on the graph but structurally equivalent. We call this problem "neighbor bias." Neighbor bias can reduce the generalization performance of GNNs. In this paper, we argue that the generalization properties of GNNs should be determined by combining homophilous samples and structurally equivalent samples, which we call the "GC combination hypothesis." We therefore propose a topological signal-driven self-supervised method that uses a topological information-guided structural equivalence sampling strategy. First, we extract multiscale topological features using persistent homology. Then we compute the structural equivalence of node pairs based on their topological features. In particular, we design a topological loss function that pulls together non-neighboring node pairs with high structural equivalence in the representation space to alleviate neighbor bias. Finally, we use a joint training mechanism to adjust the effect of structural equivalence on the model so that it fits datasets with different characteristics. We conducted experiments on the node classification task across seven graph datasets. The results show that model performance can be effectively improved using a topological signal enhancement strategy.
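A loss of the kind described, which pulls structurally equivalent but non-adjacent node pairs together in representation space, can be sketched as a weighted squared distance over selected pairs. The similarity threshold, the weighting by structural similarity, and the squared Euclidean distance are assumptions. The paper derives its similarities from persistent-homology features, and this sketch takes those similarities as given.

```python
def topological_pull_loss(embeddings, similarity, edges, threshold=0.8):
    """Mean of s_ij * ||z_i - z_j||^2 over non-adjacent pairs with s_ij >= threshold.

    embeddings: list of vectors z_i; similarity: matrix s_ij in [0, 1];
    edges: set of (i, j) tuples for graph neighbors.
    """
    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in edges or (j, i) in edges:
                continue  # only non-neighboring pairs are pulled together
            s = similarity[i][j]
            if s >= threshold:
                d2 = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
                total += s * d2
                count += 1
    return total / count if count else 0.0

embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
similarity = [[1.0, 0.5, 0.9], [0.5, 1.0, 0.1], [0.9, 0.1, 1.0]]
edges = {(0, 1)}
```

In this toy graph, only the pair (0, 2) is both non-adjacent and structurally similar, so it alone contributes to the loss.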
Submitted 8 December, 2022;
originally announced December 2022.
-
Engineering Photon Statistics of Spatial Light Modes
Authors:
Mingyuan Hong,
Ashe Miller,
Roberto de J. León-Montiel,
Chenglong You,
Omar S. Magaña-Loaiza
Abstract:
The nature of light sources is defined by the statistical fluctuations of the electromagnetic field. As such, the photon statistics of light sources are typically associated with distinct emitters. Here, we demonstrate the possibility of producing light beams with various photon statistics through the spatial modulation of coherent light. This is achieved by the sequential encoding of controllable Kolmogorov phase screens in a digital micromirror device. Interestingly, the flexibility of our scheme allows for the arbitrary shaping of spatial light modes with engineered photon statistics at different spatial positions. The performance of our scheme is assessed through the photon-number-resolving characterization of different families of spatial light modes with engineered photon statistics. We believe that the possibility of controlling the photon fluctuations of the light field at arbitrary spatial locations has important implications for quantum spectroscopy, sensing, and imaging.
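Kolmogorov phase screens are commonly generated by filtering complex Gaussian white noise with the Kolmogorov phase power spectrum, which scales as $f^{-11/3}$, and then inverse Fourier transforming the result. The sketch below follows that standard FFT recipe with two simplifications. It normalizes the screen to a chosen RMS phase instead of deriving the physical amplitude from a Fried parameter, and it omits the usual low-frequency subharmonic correction. How the screens are sequenced and encoded on the digital micromirror device in the paper is not modeled.

```python
import numpy as np

def kolmogorov_phase_screen(n, rms_phase=1.0, seed=None):
    """Return an n x n phase screen (radians) whose spectrum follows f^(-11/3)."""
    rng = np.random.default_rng(seed)
    fx = np.fft.fftfreq(n)                       # spatial frequencies in cycles/pixel
    fxx, fyy = np.meshgrid(fx, fx)
    f = np.hypot(fxx, fyy)
    psd = np.zeros_like(f)
    nonzero = f > 0
    psd[nonzero] = f[nonzero] ** (-11.0 / 3.0)   # drop the undefined f = 0 (piston) term
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    screen = np.real(np.fft.ifft2(noise * np.sqrt(psd)))
    screen -= screen.mean()
    return screen * (rms_phase / screen.std())   # rescale to the requested RMS phase

screen = kolmogorov_phase_screen(64, rms_phase=2.0, seed=0)
```

Drawing a fresh screen with a new seed for each frame gives the kind of sequence of random phase patterns whose ensemble statistics shape the fluctuations of the modulated beam.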
Submitted 3 December, 2022;
originally announced December 2022.
-
On the Robustness of deep learning-based MRI Reconstruction to image transformations
Authors:
**ghan Jia,
Mingyi Hong,
Yimeng Zhang,
Mehmet Akçakaya,
Sijia Liu
Abstract:
Although deep learning (DL) has received much attention in accelerated magnetic resonance imaging (MRI), recent studies show that tiny input perturbations may lead to instabilities of DL-based MRI reconstruction models. However, approaches for robustifying these models remain underdeveloped. Compared to image classification, it could be much more challenging to achieve a robust MRI image reconstruction network, given its regression-based learning objective, the limited amount of training data, and the lack of efficient robustness metrics. To circumvent these limitations, our work revisits the problem of DL-based image reconstruction through the lens of robust machine learning. We find a new source of instability in MRI image reconstruction, i.e., the lack of reconstruction robustness against spatial transformations of an input, e.g., rotation and cutout. Inspired by this new robustness metric, we develop a robustness-aware image reconstruction method that can defend against both pixel-wise adversarial perturbations and spatial transformations. Extensive experiments are also conducted to demonstrate the effectiveness of our proposed approaches.
Submitted 21 November, 2022; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Limit laws for functionals of self-intersection symmetric alpha-stable processes
Authors:
Minhao Hong,
Qian Yu
Abstract:
In this paper, we prove two limit laws for functionals of self-intersection symmetric $\alpha$-stable processes with $\alpha\in(1,2)$. The results are obtained by the method of moments, employing the sample configuration and the chaining argument introduced in (Nualart and Xu 2013).
Submitted 28 October, 2022;
originally announced October 2022.
-
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work
Authors:
Jiawei Zhang,
Yushun Zhang,
Mingyi Hong,
Ruoyu Sun,
Zhi-Quan Luo
Abstract:
Modern neural networks are often quite wide, which leads to large memory and computation costs. It is thus of great interest to train a narrower network. However, training narrow neural nets remains a challenging task. We ask two theoretical questions: Can narrow networks have as strong expressivity as wide ones? If so, does the loss function exhibit a benign optimization landscape? In this work, we provide partially affirmative answers to both questions for 1-hidden-layer networks with fewer than $n$ (sample size) neurons when the activation is smooth. First, we prove that as long as the width $m \geq 2n/d$ (where $d$ is the input dimension), its expressivity is strong, i.e., there exists at least one global minimizer with zero training loss. Second, we identify a nice local region with no local minima or saddle points. Nevertheless, it is not clear whether gradient descent can stay in this nice region. Third, we consider a constrained optimization formulation where the feasible region is the nice local region, and prove that every KKT point is a nearly global minimizer. It is expected that projected gradient methods converge to KKT points under mild technical conditions, but we leave the rigorous convergence analysis to future work. Thorough numerical results show that projected gradient methods on this constrained formulation significantly outperform SGD for training narrow neural nets.
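To make the width condition concrete, take an illustrative (hypothetical) setting of $n = 10{,}000$ samples in $d = 100$ input dimensions:

```latex
m \;\ge\; \frac{2n}{d} = \frac{2 \times 10{,}000}{100} = 200 \;\ll\; n = 10{,}000 .
```

In this setting, the expressivity guarantee already applies to a network with a few hundred hidden neurons, fifty times fewer than the number of samples.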
Submitted 21 October, 2022;
originally announced October 2022.
-
LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Authors:
Yan Jia,
Mi Hong,
**gyu Hou,
Kailong Ren,
Sifan Ma,
** Wang,
Fangzhen Peng,
Yinglin Ji,
Lin Yang,
Junjie Wang
Abstract:
This paper describes the LeVoice automatic speech recognition systems for Track 2 of the Intelligent Cockpit Speech Recognition Challenge 2022. Track 2 is a speech recognition task with no limit on model size. Our main contributions include deep learning-based speech enhancement, text-to-speech-based speech generation, training data augmentation via various techniques, and speech recognition model fusion. We compared and fused the hybrid architecture and two kinds of end-to-end architecture. For end-to-end modeling, we used models based on a connectionist temporal classification/attention-based encoder-decoder architecture and a recurrent neural network transducer/attention-based encoder-decoder architecture. These models are evaluated with an additional language model to improve word error rates. As a result, our system achieved a 10.2% character error rate on the challenge test set and ranked third among the submitted systems in the challenge.
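Character error rate, the metric reported above, is conventionally defined as the character-level edit distance between hypothesis and reference (counting substitutions, deletions, and insertions), divided by the reference length. Below is a minimal sketch of that standard definition. Challenge scoring scripts may also normalize the text, for example by stripping punctuation, in ways not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if the characters match)
            )
        prev = curr
    return prev[-1]

def character_error_rate(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)

# "turn on the air conditioner": the hypothesis has one inserted character.
cer = character_error_rate("打开空调", "打开空气调")
```

For Mandarin, each character counts as one token. This is why CER rather than word error rate is the usual headline metric for this kind of challenge.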
Submitted 16 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Advancing Model Pruning via Bi-level Optimization
Authors:
Yihua Zhang,
Yuguang Yao,
Parikshit Ram,
Pu Zhao,
Tianlong Chen,
Mingyi Hong,
Yanzhi Wang,
Sijia Liu
Abstract:
The deployment constraints in practical applications necessitate the pruning of large-scale deep learning models, i.e., promoting their weight sparsity. As illustrated by the Lottery Ticket Hypothesis (LTH), pruning also has the potential to improve their generalization ability. At the core of LTH, iterative magnitude pruning (IMP) is the predominant pruning method for successfully finding 'winning tickets'. Yet, the computation cost of IMP grows prohibitively as the targeted pruning ratio increases. To reduce the computation overhead, various efficient 'one-shot' pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as those found by IMP. This raises the question of how to close the gap between pruning accuracy and pruning efficiency. To tackle it, we pursue the algorithmic advancement of model pruning. Specifically, we formulate the pruning problem from a fresh viewpoint: bi-level optimization (BLO). We show that the BLO interpretation provides a technically grounded optimization basis for an efficient implementation of the pruning-retraining learning paradigm used in IMP. We also show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure. By leveraging such bi-linearity, we theoretically show that BiP can be solved as easily as first-order optimization, thus inheriting its computational efficiency. Through extensive experiments on both structured and unstructured pruning with 5 model architectures and 4 data sets, we demonstrate that BiP can find better winning tickets than IMP in most cases and is computationally as efficient as the one-shot pruning schemes, achieving a 2-7 times speedup over IMP for the same level of model accuracy and sparsity.
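To see why IMP's cost grows with the target sparsity, note that each IMP round removes a fixed fraction of the remaining weights and is followed by retraining. The number of prune-retrain rounds therefore grows with the logarithm of the target density. The sketch below counts those rounds and, for contrast, builds a one-shot global magnitude mask. The 20% per-round rate is a common IMP setting assumed here, not a value taken from the paper.

```python
def imp_rounds(target_density, prune_rate=0.2):
    """Number of IMP rounds (each followed by retraining) needed to reach target_density."""
    density, rounds = 1.0, 0
    while density > target_density:
        density *= 1.0 - prune_rate
        rounds += 1
    return rounds

def magnitude_mask(weights, density):
    """One-shot global magnitude pruning: keep the fraction `density` of largest |w|."""
    k = max(1, round(density * len(weights)))
    kept = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)[:k])
    return [1 if i in kept else 0 for i in range(len(weights))]
```

With the assumed 20% rate, reaching 20% density takes 8 rounds, 5% density takes 14, and 1% density takes 21, each round requiring a full retraining. A one-shot mask is computed once.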
Submitted 21 April, 2023; v1 submitted 8 October, 2022;
originally announced October 2022.
-
Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees
Authors:
Siliang Zeng,
Chenliang Li,
Alfredo Garcia,
Mingyi Hong
Abstract:
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fit observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherently nested structure: the inner loop finds the optimal policy given parametrized rewards, while the outer loop updates the estimates toward optimizing a measure of fit. In high-dimensional environments, such a nested-loop structure entails a significant computational burden. To reduce this burden, novel methods such as SQIL [1] and IQ-Learn [2] emphasize policy estimation at the expense of reward estimation accuracy. However, without accurately estimated rewards, it is not possible to perform counterfactual analysis such as predicting the optimal policy under different environment dynamics and/or learning new tasks. In this paper we develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm provably converges to a stationary solution with a finite-time guarantee. If the reward is parameterized linearly, we show that the identified solution corresponds to the solution of the maximum entropy IRL problem. Finally, using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks.
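The single-loop structure, one policy-improvement step followed by one gradient step on the reward parameters, can be sketched on a small tabular MDP. This illustration rests on several assumptions:

- known dynamics;
- a linear reward $r_\theta(s,a) = \theta^\top\phi(s,a)$ with one-hot features;
- a single soft Bellman backup as the policy-improvement step;
- exact, rather than stochastic, discounted occupancy measures in the maximum-entropy likelihood gradient $\mathbb{E}_{\text{expert}}[\phi] - \mathbb{E}_{\pi}[\phi]$.

It is not the authors' implementation.

```python
import numpy as np

def occupancy(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy d(s, a) of policy pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    n = P.shape[0]
    d_s = (1 - gamma) * np.linalg.solve((np.eye(n) - gamma * P_pi).T, mu0)
    return d_s[:, None] * pi

def single_loop_irl(P, expert_pi, mu0, gamma=0.9, lr=1.0, iters=50):
    n_s, n_a, _ = P.shape
    theta = np.zeros((n_s, n_a))  # one-hot features, so r(s, a) = theta[s, a]
    Q = np.zeros((n_s, n_a))
    d_expert = occupancy(P, expert_pi, mu0, gamma)
    for _ in range(iters):
        # Policy improvement: one soft Bellman backup under the current reward.
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # stable log-sum-exp
        Q = theta + gamma * (P @ V)
        pi = np.exp(Q - Q.max(axis=1, keepdims=True))
        pi /= pi.sum(axis=1, keepdims=True)
        # Reward update: max-ent log-likelihood gradient for a linear reward.
        theta += lr * (d_expert - occupancy(P, pi, mu0, gamma))
    return theta, pi

# Toy problem: 3 states, 2 actions, action-independent uniform transitions.
n_s, n_a = 3, 2
P = np.full((n_s, n_a, n_s), 1.0 / n_s)
expert_pi = np.tile([1.0, 0.0], (n_s, 1))  # the expert always chooses action 0
mu0 = np.full(n_s, 1.0 / n_s)
theta, pi = single_loop_irl(P, expert_pi, mu0)
```

The policy is never solved to optimality inside the loop. Each reward update uses whatever policy the single backup has produced so far, which is what removes the nested loop.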
Submitted 31 October, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees
Authors:
Siliang Zeng,
Mingyi Hong,
Alfredo Garcia
Abstract:
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reduced reward estimation accuracy. In this paper we propose a single-loop estimation algorithm with finite-time guarantees that is equipped to deal with high-dimensional state spaces without compromising reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm converges to a stationary solution with a finite-time guarantee. Further, if the reward is parameterized linearly, we show that the algorithm approximates the maximum likelihood estimator sublinearly. Finally, using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks.
Submitted 1 March, 2024; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis
Authors:
Matt-Heun Hong,
Lauren A. Marsh,
Jessica L. Feuston,
Janet Ruppert,
Jed R. Brubaker,
Danielle Albers Szafir
Abstract:
Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to addressing concerns around machine-assisted interpretive research to build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata which constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can help scholars strategically sample documents further toward insights. Scholastic demonstrates how human-centered algorithm design and visualizations employing familiar metaphors can support inductive and interpretive research methodologies through interactive topic modeling and document clustering.
Submitted 12 August, 2022;
originally announced August 2022.
-
Nanosecond machine learning regression with deep boosted decision trees in FPGA for high energy physics
Authors:
Benjamin Carlson,
Quincy Bayer,
Tae Min Hong,
Stephen Roche
Abstract:
We present a novel application of the machine learning / artificial intelligence method called boosted decision trees to estimate physical quantities on field programmable gate arrays (FPGA). The software package fwXmachina features a new architecture called parallel decision paths that allows for deep decision trees with arbitrary number of input variables. It also features a new optimization scheme to use different numbers of bits for each input variable, which produces optimal physics results and ultraefficient FPGA resource utilization. Problems in high energy physics of proton collisions at the Large Hadron Collider (LHC) are considered. Estimation of missing transverse momentum (ETmiss) at the first level trigger system at the High Luminosity LHC (HL-LHC) experiments, with a simplified detector modeled by Delphes, is used to benchmark and characterize the firmware performance. The firmware implementation with a maximum depth of up to 10 using eight input variables of 16-bit precision gives a latency value of O(10) ns, independent of the clock speed, and O(0.1)% of the available FPGA resources without using digital signal processors.
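Two ideas in this abstract lend themselves to a small illustration. The first is quantizing each input variable to its own bit width. The second is evaluating every root-to-leaf path of a tree in parallel as a conjunction of integer comparisons, so that exactly one path fires. The sketch below is a software model under those assumptions. The variable ranges, bit widths, thresholds, and leaf scores are hypothetical, and this is not fwXmachina's firmware or API.

```python
def quantize(x, lo, hi, bits):
    """Map x in [lo, hi] to an unsigned integer of the given bit width, clipping out-of-range values."""
    levels = (1 << bits) - 1
    q = round((x - lo) / (hi - lo) * levels)
    return min(max(q, 0), levels)

# Each decision path: per-variable integer bounds [low, high) plus the leaf's output.
# In firmware every path would be checked simultaneously; here we simply loop.
PATHS = [
    ({0: (0, 128)}, -1.0),
    ({0: (128, 256), 1: (0, 8)}, 0.5),
    ({0: (128, 256), 1: (8, 16)}, 2.0),
]

def evaluate_paths(qx, paths):
    fired = [all(lo <= qx[v] < hi for v, (lo, hi) in cuts.items()) for cuts, _ in paths]
    if sum(fired) != 1:
        raise ValueError("decision paths must partition the quantized input space")
    return next(score for (_, score), f in zip(paths, fired) if f)

# Variable 0 gets 8 bits, variable 1 only 4 bits (different precision per input).
qx = {0: quantize(80.0, 0.0, 100.0, 8), 1: quantize(0.8, 0.0, 1.0, 4)}
score = evaluate_paths(qx, PATHS)
```

Giving a variable fewer bits narrows its comparators, which is one way per-variable precision can save FPGA resources when a coarse cut is sufficient for the physics.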
Submitted 12 July, 2022;
originally announced July 2022.
-
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds
Authors:
Wu Zheng,
Mingxuan Hong,
Li Jiang,
Chi-Wing Fu
Abstract:
This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to simulate features and responses that follow a multi-modality (LiDAR-image) detector. The approach needs LiDAR-image data only when training the single-modality detector, and once well-trained, it only needs LiDAR data at inference. We design a novel framework to realize the approach: response distillation to focus on the crucial response samples and avoid the background samples; sparse-voxel distillation to learn voxel semantics and relations from the estimated crucial voxels; a fine-grained voxel-to-point distillation to better attend to features of small and distant objects; and instance distillation to further enhance the deep-feature consistency. Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors and even surpasses the baseline LiDAR-image detector on the key NDS metric, closing 72% of the mAP gap between the single- and multi-modality detectors.
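Of the distillation terms listed, the simplest to illustrate is feature imitation restricted to selected voxels. The following is a generic masked feature-distillation loss, not the paper's formulation. A voxel is kept when a hypothetical teacher response score reaches a threshold, and the student is penalized by its squared feature distance to the teacher at that voxel. All names and the selection rule are assumptions.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
Features = Dict[Voxel, List[float]]


def masked_feature_distillation(student: Features, teacher: Features,
                                teacher_score: Dict[Voxel, float],
                                threshold: float) -> float:
    """Mean squared feature distance over voxels the teacher marks as important.

    Voxels whose (hypothetical) teacher response score is below `threshold`
    are treated as background and ignored.
    """
    selected = [v for v, s in teacher_score.items() if s >= threshold]
    if not selected:
        return 0.0
    total = 0.0
    for v in selected:
        total += sum((a - b) ** 2 for a, b in zip(student[v], teacher[v]))
    return total / len(selected)


student = {(0, 0, 0): [1.0, 0.0], (1, 0, 0): [0.0, 0.0]}
teacher = {(0, 0, 0): [0.0, 0.0], (1, 0, 0): [5.0, 5.0]}
score = {(0, 0, 0): 0.9, (1, 0, 0): 0.1}
loss = masked_feature_distillation(student, teacher, score, threshold=0.5)
```

In this example the low-score voxel is treated as background. Its large feature mismatch therefore contributes nothing to the loss.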
Submitted 29 June, 2022;
originally announced June 2022.
-
A Framework for Understanding Model Extraction Attack and Defense
Authors:
Xun Xian,
Mingyi Hong,
Jie Ding
Abstract:
The privacy of machine learning models has become a significant concern in many emerging Machine-Learning-as-a-Service applications, where prediction services based on well-trained models are offered to users via pay-per-query. The lack of a defense mechanism can impose a high risk on the privacy of the server's model since an adversary could efficiently steal the model by querying only a few "good" data points. The interplay between a server's defense and an adversary's attack inevitably leads to an arms race dilemma, as commonly seen in Adversarial Machine Learning. To study the fundamental tradeoffs between model utility from a benign user's view and privacy from an adversary's view, we develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies. The developed concepts and theory match the empirical findings on the "equilibrium" between privacy and utility. In terms of optimization, the key ingredient that enables our results is a unified representation of the attack-defense problem as a min-max bi-level problem. The developed results will be demonstrated by examples and experiments.
Submitted 23 June, 2022;
originally announced June 2022.
-
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale
Authors:
Gaoyuan Zhang,
Songtao Lu,
Yihua Zhang,
Xiangyi Chen,
Pin-Yu Chen,
Quanfu Fan,
Lee Martie,
Lior Horesh,
Mingyi Hong,
Sijia Liu
Abstract:
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Code is available at https://github.com/dat-2022/dat.
Submitted 7 September, 2022; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Optimal Solutions for Joint Beamforming and Antenna Selection: From Branch and Bound to Graph Neural Imitation Learning
Authors:
Sagar Shrestha,
Xiao Fu,
Mingyi Hong
Abstract:
This work revisits the joint beamforming (BF) and antenna selection (AS) problem, as well as its robust beamforming (RBF) version under imperfect channel state information (CSI). Such problems arise due to various reasons, e.g., the costly nature of the radio frequency (RF) chains and energy/resource-saving considerations. The joint (R)BF&AS problem is a mixed integer and nonlinear program, and thus finding optimal solutions is often costly, if not outright impossible. The vast majority of the prior works tackled these problems using techniques such as continuous approximations, greedy methods, and supervised machine learning; yet these approaches do not ensure optimality or even feasibility of the solutions. The main contribution of this work is threefold. First, an effective branch and bound (B&B) framework for solving the problems of interest is proposed. Leveraging existing BF and RBF solvers, it is shown that the B&B framework guarantees global optimality of the considered problems. Second, to expedite the potentially costly B&B algorithm, a machine learning (ML)-based scheme is proposed to help skip intermediate states of the B&B search tree. The learning model features a graph neural network (GNN)-based design that is resilient to a commonly encountered challenge in wireless communications, namely, the change of problem size (e.g., the number of users) across the training and test stages. Third, comprehensive performance characterizations are presented, showing that the GNN-based method retains the global optimality of B&B with provably reduced complexity, under reasonable conditions. Numerical simulations also show that the ML-based acceleration can often achieve an order-of-magnitude speedup relative to B&B.
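The branch-and-bound idea can be shown on a toy problem. The following is a generic depth-first B&B over binary selection variables, with a 0/1 knapsack standing in for antenna selection:

- Each hypothetical antenna has a utility and a cost.
- A budget limits the total cost.
- The upper bound at each node comes from the fractional (LP) relaxation.

In the paper's setting, the bounding role is played by the BF/RBF solvers it mentions, which are not reproduced here. Costs are assumed positive.

```python
from typing import List, Sequence, Tuple


def fractional_bound(values: Sequence[float], costs: Sequence[float], budget: float,
                     value: float, cost: float, start: int, order: List[int]) -> float:
    """Upper bound on any completion: fill the remaining budget greedily
    (best utility/cost ratio first), allowing a fraction of the last item."""
    bound, room = value, budget - cost
    for i in order[start:]:
        if costs[i] <= room:
            room -= costs[i]
            bound += values[i]
        else:
            bound += values[i] * room / costs[i]
            break
    return bound


def branch_and_bound(values: Sequence[float], costs: Sequence[float],
                     budget: float) -> Tuple[float, List[int]]:
    """Exact 0/1 selection maximizing total value subject to total cost <= budget."""
    order = sorted(range(len(values)), key=lambda i: values[i] / costs[i], reverse=True)
    best_value, best_set = 0.0, []
    stack = [(0, [], 0.0, 0.0)]  # (depth in `order`, chosen items, value, cost); all feasible
    while stack:
        depth, chosen, value, cost = stack.pop()
        if value > best_value:  # every node is feasible, so it is a valid incumbent
            best_value, best_set = value, chosen
        if depth == len(order):
            continue
        if fractional_bound(values, costs, budget, value, cost, depth, order) <= best_value:
            continue  # prune: this subtree cannot beat the incumbent
        i = order[depth]
        stack.append((depth + 1, chosen, value, cost))  # branch: leave item i out
        if cost + costs[i] <= budget:  # branch: take item i (explored first)
            stack.append((depth + 1, chosen + [i], value + values[i], cost + costs[i]))
    return best_value, sorted(best_set)


# Hypothetical per-antenna utilities and costs.
result = branch_and_bound([60, 100, 120], [10, 20, 30], budget=50)  # (220.0, [1, 2])
```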
Submitted 30 January, 2023; v1 submitted 11 June, 2022;
originally announced June 2022.
-
Zeroth-Order SciML: Non-intrusive Integration of Scientific Software with Deep Learning
Authors:
Ioannis Tsaknakis,
Bhavya Kailkhura,
Sijia Liu,
Donald Loveland,
James Diffenderfer,
Anna Maria Hiszpanski,
Mingyi Hong
Abstract:
Using deep learning (DL) to accelerate and/or improve scientific workflows can yield discoveries that are otherwise impossible. Unfortunately, DL models have yielded limited success in complex scientific domains due to large data requirements. In this work, we propose to overcome this issue by integrating the abundance of scientific knowledge sources (SKS) with the DL training process. Existing knowledge integration approaches are limited to using differentiable knowledge sources to be compatible with the first-order DL training paradigm. In contrast, our proposed approach treats the knowledge source as a black box, which in turn allows integrating virtually any knowledge source. To enable end-to-end training of SKS-coupled-DL, we propose to use zeroth-order optimization (ZOO) based gradient-free training schemes, which are non-intrusive, i.e., they do not require making any changes to the SKS. We evaluate the performance of our ZOO training scheme on two real-world materials science applications. We show that the proposed scheme is able to effectively integrate scientific knowledge with DL training and to outperform a purely data-driven model for data-limited scientific applications. We also discuss some limitations of the proposed method and mention potentially worthwhile future directions.
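Zeroth-order optimization replaces gradients with estimates built from function evaluations alone. This is what lets a non-differentiable scientific code sit inside a training loop. The following is a minimal sketch under two assumptions:

- a two-point Gaussian random-direction estimator, which is one common ZO estimator (the abstract does not say which one the paper uses);
- a toy quadratic standing in for the black-box knowledge source.

`zo_gradient`, `zo_descent`, and `black_box` are hypothetical names.

```python
import random
from typing import Callable, List

Vector = List[float]


def zo_gradient(f: Callable[[Vector], float], x: Vector, rng: random.Random,
                mu: float = 1e-3, num_dirs: int = 20) -> Vector:
    """Two-point random-direction estimate of grad f(x) from function values only.

    For Gaussian directions u, E[u u^T] = I, so averaging
    (f(x + mu u) - f(x - mu u)) / (2 mu) * u approximates grad f(x).
    """
    d = len(x)
    g = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        scale = (f_plus - f_minus) / (2.0 * mu)
        g = [gi + scale * ui / num_dirs for gi, ui in zip(g, u)]
    return g


def zo_descent(f: Callable[[Vector], float], x0: Vector, lr: float,
               iters: int, seed: int = 0) -> Vector:
    rng = random.Random(seed)  # one generator for the whole run: fresh directions each step
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(f, x, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


# Stand-in "black box": it is only called, never differentiated.
def black_box(x: Vector) -> float:
    return sum((xi - 1.0) ** 2 for xi in x)


x_opt = zo_descent(black_box, [0.0, 0.0, 0.0], lr=0.1, iters=200)
```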
Submitted 4 June, 2022;
originally announced June 2022.
-
Global Classical Solutions Near Vacuum to the Initial-Boundary Value Problem of Isentropic Supersonic Flows through Divergent Ducts
Authors:
Ying-Chieh Lin,
Jay Chu,
John M. Hong,
Hsin-Yi Lee
Abstract:
In this paper, we study the global existence and asymptotic behavior of classical solutions near vacuum for the initial-boundary value problem modeling isentropic supersonic flows through divergent ducts. The governing equations are the compressible Euler equations with a small parameter, which can be written as a hyperbolic system in terms of the Riemann invariants with a non-dissipative source. We provide a new result for the global existence of classical solutions to initial-boundary value problems of non-dissipative hyperbolic balance laws without the assumption of small data. The work is based on the local existence, the maximum principle and the uniform a priori estimates obtained by the generalized Lax transformations. The asymptotic behavior of classical solutions is also shown by studying the behavior of Riemann invariants along each characteristic curve and vertical line. The results can be applied to the spherically symmetric solutions to N-dimensional compressible Euler equations. Numerical simulations are provided to support our theoretical results.
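As a reminder of what "Riemann invariants with a non-dissipative source" refers to, here is the textbook quasi-one-dimensional isentropic model. It assumes a duct of cross-sectional area a(x) and polytropic pressure p = κρ^γ with γ > 1. The small parameter and scaling used in the paper are not reproduced.

In a divergent duct a'(x) > 0. Vacuum corresponds to ρ = 0, i.e. c = 0, where w = z = u. The characteristic speeds can be written in terms of the invariants as u + c = ((γ+1)w + (3-γ)z)/4 and u - c = ((3-γ)w + (γ+1)z)/4.

```latex
\begin{aligned}
&\rho_t + (\rho u)_x = -\frac{a'(x)}{a(x)}\,\rho u, \qquad
 u_t + u\,u_x + \frac{p_x}{\rho} = 0, \qquad p = \kappa\rho^{\gamma},\\
&c = \sqrt{\gamma\kappa}\,\rho^{(\gamma-1)/2}, \qquad
 w = u + \frac{2c}{\gamma-1}, \qquad z = u - \frac{2c}{\gamma-1},\\
&w_t + (u+c)\,w_x = -\frac{a'(x)}{a(x)}\,u\,c, \qquad
 z_t + (u-c)\,z_x = \frac{a'(x)}{a(x)}\,u\,c .
\end{aligned}
```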
Submitted 24 May, 2022;
originally announced May 2022.
-
Unsupervised Homography Estimation with Coplanarity-Aware GAN
Authors:
Mingbo Hong,
Yuhang Lu,
Nian** Ye,
Chunyu Lin,
Qijun Zhao,
Shuaicheng Liu
Abstract:
Estimating homography from an image pair is a fundamental problem in image alignment. Unsupervised learning methods have received increasing attention in this field due to their promising performance and label-free training. However, existing methods do not explicitly consider the problem of plane-induced parallax, which compromises the predicted homography when the scene contains multiple planes. In this work, we propose a novel method, HomoGAN, that guides unsupervised homography estimation to focus on the dominant plane. First, a multi-scale transformer network is designed to predict homography from the feature pyramids of input images in a coarse-to-fine fashion. Moreover, we propose an unsupervised GAN to impose a coplanarity constraint on the predicted homography, which is realized by using a generator to predict a mask of aligned regions and a discriminator to check whether two masked feature maps are induced by a single homography. To validate the effectiveness of HomoGAN and its components, we conduct extensive experiments on a large-scale dataset, and the results show that our matching error is 22% lower than that of the previous SOTA method. Code is available at https://github.com/megvii-research/HomoGAN.
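A homography acts on pixel coordinates through homogeneous coordinates. One common way to score an estimate is the average displacement of the image corners. This sketch assumes that corner-error metric for illustration only; the abstract does not define its "matching error". `apply_homography` and `mean_corner_error` are hypothetical names, and nothing here reproduces HomoGAN.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, pts: List[Point]) -> List[Point]:
    """Map 2-D points through a 3x3 homography using homogeneous coordinates."""
    out = []
    for x, y in pts:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity")
        out.append((u / w, v / w))  # divide out the projective scale
    return out


def mean_corner_error(H_pred: Matrix3, H_ref: Matrix3, width: float, height: float) -> float:
    """Average pixel distance between image corners warped by H_pred and by H_ref."""
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    pred = apply_homography(H_pred, corners)
    ref = apply_homography(H_ref, corners)
    return sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(corners)


H_shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = apply_homography(H_shift, [(0.0, 0.0)])  # [(5.0, -3.0)]
```

A homography is defined only up to scale. Multiplying every entry of H by the same nonzero constant leaves the mapped points unchanged, because the division by w cancels the factor.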
Submitted 8 May, 2022;
originally announced May 2022.
-
Understanding A Class of Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective
Authors:
Xinwei Zhang,
Mingyi Hong,
Nicola Elia
Abstract:
Distributed algorithms have been playing an increasingly important role in many applications such as machine learning, signal processing, and control. Significant research efforts have been devoted to developing and analyzing new algorithms for various applications. In this work, we provide a fresh perspective to understand, analyze, and design distributed optimization algorithms. Through the lens of multi-rate feedback control, we show that a wide class of distributed algorithms, including popular decentralized/federated schemes such as decentralized gradient descent, gradient tracking, and federated averaging, can be viewed as discretizing a certain continuous-time feedback control system, possibly with multiple sampling rates. This key observation not only allows us to develop a generic framework to analyze the convergence of the entire algorithm class but, more importantly, also leads to an interesting way of designing new distributed algorithms. We develop the theory behind our framework and provide examples to highlight how the framework can be used in practice.
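Decentralized gradient descent (DGD), one of the algorithms named above, makes the discretization view concrete. Its update is x^{k+1} = W x^k - α∇f(x^k), which equals x^k + [-(I - W)x^k - α∇f(x^k)]. That is one forward-Euler step, of size 1, of the continuous-time system dx/dt = -(I - W)x - α∇f(x), where ∇f stacks the local gradients.

The following is a minimal scalar sketch with hypothetical quadratic local objectives and a hypothetical doubly stochastic mixing matrix W. The paper's multi-rate feedback-control framework itself is not reproduced.

```python
from typing import Callable, List


def dgd(W: List[List[float]], grads: List[Callable[[float], float]],
        x0: List[float], alpha: float, iters: int) -> List[float]:
    """Decentralized gradient descent with scalar local variables.

    Node i mixes its neighbours' iterates with weights W[i][j] (W doubly
    stochastic), then takes a local gradient step on its own f_i.
    """
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [mixed[i] - alpha * grads[i](x[i]) for i in range(n)]
    return x


# Local objectives f_i(x) = (x - b_i)^2 / 2, so grad f_i(x) = x - b_i.
b = [0.0, 3.0, 6.0]
grads = [lambda x, bi=bi: x - bi for bi in b]
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
x_final = dgd(W, grads, [0.0, 0.0, 0.0], alpha=0.05, iters=1000)
```

With a constant step size, the node average settles at the global minimizer mean(b) = 3. The individual iterates, however, settle at the fixed point of (I - W + αI)x = αb, which here is [2.8125, 3.0, 3.1875], rather than at exact consensus.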
Submitted 1 November, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Vision System of Curling Robots: Thrower and Skip
Authors:
Seongwook Yoon,
Gayoung Kim,
Myungpyo Hong,
Sanghoon Sull
Abstract:
We built a vision system for curling robots that can be expected to play with human curling players. We built two types of vision systems, one each for the thrower and skip robots. First, the thrower robot drives toward a given point on the curling sheet to release a stone. The vision system in the thrower robot initializes the robot's 3-DoF pose on the two-dimensional curling sheet and updates the pose to decide when to release the stone. Second, the skip robot stands on the opposite side from the thrower robot and monitors the state of the game to make strategic decisions. The vision system in the skip robot precisely recognizes every stone on the curling sheet. Because the viewpoint is strongly perspective, many stones are occluded by one another, which makes it challenging to estimate the accurate position of each stone. Thus, we recognize the elliptical outlines of the stone handles to find the exact midpoint of each stone using a perspective Hough transform. Furthermore, we track a thrown stone to produce a trajectory for ice condition analysis. Finally, we implemented our vision systems on two mobile robots and successfully performed a single turn and even careful gameplay. Specifically, our vision system includes three cameras with different viewpoints for their respective purposes.
Submitted 20 April, 2022;
originally announced April 2022.
-
How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective
Authors:
Yimeng Zhang,
Yuguang Yao,
**ghan Jia,
**feng Yi,
Mingyi Hong,
Shiyu Chang,
Sijia Liu
Abstract:
The lack of adversarial robustness has been recognized as an important issue for state-of-the-art machine learning (ML) models, e.g., deep neural networks (DNNs). Thus, robustifying ML models against adversarial attacks is now a major focus of research. However, nearly all existing defense methods, particularly for robust training, make the white-box assumption that the defender has access to the details of an ML model (or its surrogate alternatives if available), e.g., its architecture and parameters. Beyond existing works, in this paper we aim to address the problem of black-box defense: how to robustify a black-box model using just input queries and output feedback? Such a problem arises in practical scenarios where the owner of the predictive model is reluctant to share model information in order to preserve privacy. To this end, we propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS), a first-order (FO) certified defense technique. To allow a design that uses only model queries, we further integrate DS with zeroth-order (gradient-free) optimization. However, a direct implementation of zeroth-order (ZO) optimization suffers from high variance of gradient estimates and thus leads to an ineffective defense. To tackle this problem, we next propose to prepend an autoencoder (AE) to a given (black-box) model so that DS can be trained using variance-reduced ZO optimization. We term the eventual defense ZO-AE-DS. In practice, we empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines. The effectiveness of our approach is demonstrated on both image classification and image reconstruction tasks. Code is available at https://github.com/damon-demon/Black-Box-Defense.
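Denoised smoothing builds on randomized smoothing. The smoothed classifier predicts the class that the base model returns most often under Gaussian input noise, and this requires only queries to the base model. The following is a minimal Monte Carlo sketch assuming a toy black-box classifier and an identity denoiser. It has three limitations:

- It omits the confidence bounds and abstention used for certification.
- It does not train the denoiser or autoencoder.
- It does not use ZO optimization.

`smoothed_predict` and `toy_classifier` are hypothetical names.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List

Vector = List[float]


def smoothed_predict(classify: Callable[[Vector], Hashable], x: Vector, sigma: float,
                     n: int, rng: random.Random,
                     denoise: Callable[[Vector], Vector] = lambda z: z) -> Hashable:
    """Monte Carlo estimate of argmax_c P[classify(denoise(x + delta)) = c].

    delta ~ N(0, sigma^2 I). `classify` is only queried (black-box access);
    `denoise` stands in for the denoiser that denoised smoothing prepends.
    """
    votes: Counter = Counter()
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classify(denoise(noisy))] += 1
    return votes.most_common(1)[0][0]


def toy_classifier(z: Vector) -> int:
    # Hypothetical black-box model: class 1 if the coordinates sum to a positive value.
    return 1 if sum(z) > 0 else 0
```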
Submitted 26 March, 2022;
originally announced March 2022.
-
Sign-tunable anisotropic magnetoresistance and electrically detectable dual magnetic phases in a helical antiferromagnet
Authors:
Jong Hyuk Kim,
Hyun Jun Shin,
Mi Kyung Kim,
Jae Min Hong,
Ki Won Jeong,
** Seok Kim,
Kyungsun Moon,
Nara Lee,
Young Jai Choi
Abstract:
The helimagnetic order describes a non-collinear spin texture of antiferromagnets, arising from competing exchange interactions. Although collinear antiferromagnets are elemental building blocks of antiferromagnetic (AFM) spintronics, the potential of implementing spintronic functionality in non-collinear antiferromagnets has not been clarified thus far. Here, we propose the AFM helimagnet EuCo2As2 as a novel single-phase spintronic material that exhibits a remarkable sign reversal of anisotropic magnetoresistance (AMR). The contrast in the AMR arises from two electrically distinct magnetic phases, with a spin reorientation driven by a magnetic field lying in the easy plane that switches the sign of the AMR from positive to negative. Further, various AFM memory states associated with the evolution of the spin structure under magnetic fields were identified theoretically, based on an easy-plane anisotropic spin model. These results reveal that non-collinear antiferromagnets hold potential for developing spintronic devices.
Submitted 21 March, 2022;
originally announced March 2022.
-
Large anomalous Hall effect and anisotropic magnetoresistance in intrinsic nanoscale spin-valve-type structure of an antiferromagnet
Authors:
Dong Gun Oh,
Jong Hyuk Kim,
Mi Kyung Kim,
Ki Won Jeong,
Hyun Jun Shin,
Jae Min Hong,
** Seok Kim,
Kyungsun Moon,
Nara Lee,
Young Jai Choi
Abstract:
A spin valve is a prototype of spin-based electronic devices built on ferromagnets, in which an antiferromagnet plays a supporting role. Recent findings in antiferromagnetic spintronics show that an antiferromagnetic order in single-phase materials solely governs dynamic transport, and antiferromagnets are considered promising candidates for spintronic technology. In this work, we demonstrated antiferromagnet-based spintronic functionality in the itinerant Ising antiferromagnet Ca0.9Sr0.1Co2As2 by integrating a nanoscale spin-valve-type structure and investigating anisotropic magnetic properties driven by spin-flips. Multiple stacks of 1-nm-thick spin-valve-like units are intrinsically embedded in the antiferromagnetic spin structure. In the presence of a rotating magnetic field, a new type of spin-valve-like operation was observed for large anomalous Hall conductivity and anisotropic magnetoresistance, whose effects are maximized above the spin-flip transition. In addition, a joint experimental and theoretical study provides an efficient tool to read out various spin states, a scheme that can be useful for implementing extensive spintronic applications.
Submitted 21 March, 2022;
originally announced March 2022.
-
Attacks and Faults Injection in Self-Driving Agents on the Carla Simulator -- Experience Report
Authors:
Niccolò Piazzesi,
Massimo Hong,
Andrea Ceccarelli
Abstract:
Machine Learning applications are acknowledged as the foundation of autonomous driving, because they are the enabling technology for most driving tasks. However, the inclusion of trained agents in automotive systems exposes the vehicle to novel attacks and faults that can result in safety threats to the driving tasks. In this paper we report our experimental campaign on the injection of adversarial attacks and software faults in a self-driving agent running in a driving simulator. We show that adversarial attacks and faults injected in the trained agent can lead to erroneous decisions and severely jeopardize safety. The paper shows a feasible and easily reproducible approach based on an open-source simulator and tools, and the results clearly motivate the need for both protective measures and extensive testing campaigns.
Submitted 25 February, 2022;
originally announced February 2022.
-
Single-crystal epitaxial europium iron garnet films with strain-induced perpendicular magnetic anisotropy: structural, strain, magnetic, and spin transport properties
Authors:
M. X. Guo,
C. K. Cheng,
Y. C. Liu,
C. N. Wu,
W. N. Chen,
T. Y. Chen,
C. T. Wu,
C. H. Hsu,
S. Q. Zhou,
C. F. Chang,
L. H. Tjeng,
S. F. Lee,
C. F. Pai,
M. Hong,
J. Kwo
Abstract:
Single-crystal europium iron garnet (EuIG) thin films epitaxially grown under strain on gadolinium gallium garnet (GGG)(100) substrates using off-axis sputtering have strain-induced perpendicular magnetic anisotropy (PMA). By varying the sputtering conditions, we have tuned the europium/iron (Eu/Fe) composition ratios in the films to tailor the film strains. The films exhibited an extremely smooth, particle-free surface with roughness as low as 0.1 nm as observed using atomic force microscopy. High-resolution x-ray diffraction analysis and reciprocal space maps showed in-plane epitaxial film growth, a very smooth film/substrate interface, excellent film crystallinity with a small full width at half maximum of 0.012$^{\circ}$ in the rocking curve scans, and an in-plane compressive strain without relaxation. In addition, spherical aberration-corrected scanning transmission electron microscopy showed an atomically abrupt interface between the EuIG film and GGG. The squarish out-of-plane magnetization-field hysteresis loops measured by vibrating sample magnetometry, in conjunction with measurements from angle-dependent x-ray magnetic dichroism, demonstrated the PMA in the films. We have tailored the magnetic properties of the EuIG thin films, including saturation magnetization ranging from 71.91 to 124.51 emu/cm$^3$ (increasing with the Eu/Fe ratio), coercive field from 27 to 157.64 Oe, and the strength of the PMA field ($H_\bot$) increasing from 4.21 to 18.87 kOe with the in-plane compressive strain from -0.774 to -1.044%. We have also investigated spin transport in a Pt/EuIG bilayer structure and evaluated the real part of the spin mixing conductance to be $3.48\times10^{14}\,\Omega^{-1}\mathrm{m}^{-2}$. We demonstrated current-induced magnetization switching with a low critical switching current density of $3.5\times10^6\,\mathrm{A/cm^2}$, showing excellent potential for low-dissipation spintronic devices.
Submitted 11 January, 2022;
originally announced January 2022.
-
To Supervise or Not: How to Effectively Learn Wireless Interference Management Models?
Authors:
Bingqing Song,
Haoran Sun,
Wenqiang Pu,
Sijia Liu,
Mingyi Hong
Abstract:
Machine learning has become successful in solving wireless interference management problems. Different kinds of deep neural networks (DNNs) have been trained to accomplish key tasks such as power control, beamforming, and admission control. There are two popular training paradigms for such DNN-based interference management models: supervised learning (i.e., fitting labels generated by an optimization algorithm) and unsupervised learning (i.e., directly optimizing some system performance measure). Although both of these paradigms have been extensively applied in practice, due to the lack of any theoretical understanding of these methods, it is not clear how to systematically understand and compare their performance.
In this work, we conduct theoretical studies to provide an in-depth understanding of these two training paradigms. First, we show a somewhat surprising result: for a special power control problem, unsupervised learning can perform much worse than its supervised counterpart, because it is more likely to get stuck at low-quality local solutions. We then provide a series of theoretical results to further understand the properties of the two approaches. Generally speaking, we show that when high-quality labels are available, supervised learning is less likely to get stuck at a solution than its unsupervised counterpart. Additionally, we develop a semi-supervised learning approach that properly integrates these two training paradigms and can effectively utilize a limited number of labels to find high-quality solutions. To our knowledge, these are the first theoretical results that try to understand different training approaches in learning-based wireless communication system design.
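The two training paradigms differ mainly in the loss. The following is a minimal sketch under two assumptions. The "system performance measure" is taken to be the standard interference-channel sum rate; the abstract does not name one. The supervised case uses mean squared error against algorithm-generated labels. All function names are hypothetical.

```python
import math
from typing import List


def sum_rate(G: List[List[float]], p: List[float], noise: float) -> float:
    """Sum of log2(1 + SINR_k); G[k][j] is the power gain from transmitter j to receiver k."""
    total = 0.0
    for k in range(len(p)):
        interference = sum(G[k][j] * p[j] for j in range(len(p)) if j != k)
        sinr = G[k][k] * p[k] / (noise + interference)
        total += math.log2(1.0 + sinr)
    return total


def unsupervised_loss(G: List[List[float]], p: List[float], noise: float) -> float:
    # Optimize the system measure directly: maximize sum rate = minimize its negative.
    return -sum_rate(G, p, noise)


def supervised_loss(p: List[float], p_label: List[float]) -> float:
    # Fit labels produced by an optimization algorithm (mean squared error).
    return sum((a - b) ** 2 for a, b in zip(p, p_label)) / len(p)
```

Even with two users, the sum rate is not concave in the powers. Take G = [[1, 2], [2, 1]] with unit noise. The allocations p = [1, 0] and p = [0, 1] each give 1 bit/s/Hz, while their midpoint [0.5, 0.5] gives about 0.64. This hints at why a purely unsupervised loss can have poor local solutions.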
Submitted 28 December, 2021;
originally announced December 2021.
-
Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization
Authors:
Yihua Zhang,
Guanhua Zhang,
Prashant Khanduri,
Mingyi Hong,
Shiyu Chang,
Sijia Liu
Abstract:
Adversarial training (AT) is a widely recognized defense mechanism to improve the robustness of deep neural networks against adversarial attacks. It is built on min-max optimization (MMO), where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the conventional MMO method makes AT hard to scale. Thus, Fast-AT (Wong et al., 2020) and other recent algorithms attempt to simplify MMO by replacing its maximization step with a single gradient sign-based attack generation step. Although easy to implement, Fast-AT lacks theoretical guarantees, and its empirical performance is unsatisfactory due to the issue of robust catastrophic overfitting when training with strong adversaries. In this paper, we advance Fast-AT from the fresh perspective of bi-level optimization (BLO). We first show that the commonly used Fast-AT is equivalent to using a stochastic gradient algorithm to solve a linearized BLO problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm's performance. Inspired by BLO, we design and analyze a new set of robust training algorithms termed Fast Bi-level AT (Fast-BAT), which effectively defends against sign-based projected gradient descent (PGD) attacks without using any gradient sign method or explicit robust regularization. In practice, we show that our method yields substantial robustness improvements over baselines across multiple models and datasets. Code is available at https://github.com/OPTML-Group/Fast-BAT.
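The single gradient-sign step that Fast-AT uses in place of the inner maximization can be sketched for a linear logistic model. The step takes a random start in the l_inf ball, one signed-gradient ascent step, and a projection back into the ball. The step size α = 1.25ε is an assumption (a choice often paired with random initialization). This is the baseline the abstract analyzes, not Fast-BAT. All names are hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def _margin(w: Vector, x: Vector, y: int) -> float:
    return y * sum(wi * xi for wi, xi in zip(w, x))


def _sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def logistic_loss(w: Vector, x: Vector, y: int) -> float:
    # Stable log(1 + exp(-m)) for the margin m = y * w.x.
    m = _margin(w, x, y)
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def grad_x(w: Vector, x: Vector, y: int) -> Vector:
    # d/dx log(1 + exp(-y w.x)) = -y * sigmoid(-m) * w, computed stably.
    m = _margin(w, x, y)
    s = math.exp(-m) / (1.0 + math.exp(-m)) if m >= 0 else 1.0 / (1.0 + math.exp(m))
    return [-y * s * wi for wi in w]


def fast_at_perturbation(w: Vector, x: Vector, y: int, eps: float,
                         rng: random.Random, alpha: Optional[float] = None) -> Vector:
    """Random start in the l_inf eps-ball, one signed-gradient ascent step, projection."""
    if alpha is None:
        alpha = 1.25 * eps  # assumed step size
    delta = [rng.uniform(-eps, eps) for _ in x]
    g = grad_x(w, [xi + di for xi, di in zip(x, delta)], y)
    return [max(-eps, min(eps, di + alpha * _sign(gi))) for di, gi in zip(delta, g)]


w, x, y, eps = [1.0, -2.0], [0.5, 0.5], 1, 0.1
delta = fast_at_perturbation(w, x, y, eps, random.Random(0))
x_adv = [xi + di for xi, di in zip(x, delta)]
```

In AT, the model would then take a gradient step on logistic_loss at x_adv. That outer step is omitted here.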
Submitted 4 October, 2022; v1 submitted 23 December, 2021;
originally announced December 2021.
-
Existence and convergence of the Beris-Edwards system with general Landau-de Gennes energy
Authors:
Zhewen Feng,
Min-Chun Hong,
Yu Mei
Abstract:
In this paper, we investigate the Beris-Edwards system for both biaxial and uniaxial $Q$-tensors with a general Landau-de Gennes energy density depending on four non-zero elastic constants. We prove existence of the strong solution of the Beris-Edwards system for uniaxial $Q$-tensors up to a maximal time. Furthermore, we prove that the strong solutions of the Beris-Edwards system for biaxial $Q$-tensors converge smoothly to the solution of the Beris-Edwards system for uniaxial $Q$-tensors up to its maximal existence time.
Submitted 10 November, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Existence of minimizers and convergence of critical points for a new Landau-de Gennes energy functional in nematic liquid crystals
Authors:
Zhewen Feng,
Min-Chun Hong
Abstract:
The Landau-de Gennes energy in nematic liquid crystals depends on four elastic constants $L_1$, $L_2$, $L_3$, $L_4$. In the case of $L_4\neq 0$, Ball and Majumdar (Mol. Cryst. Liq. Cryst., 2010) found an example showing that the original Landau-de Gennes energy functional in physics does not satisfy a coercivity condition, which makes it mathematically difficult to establish the existence of energy minimizers. First, we introduce a new Landau-de Gennes energy density with $L_4\neq 0$, which is equivalent to the original Landau-de Gennes density for uniaxial tensors and satisfies the coercivity condition for all $Q$-tensors. Second, we prove that solutions of the Landau-de Gennes system can approach a solution of the $Q$-tensor Oseen-Frank system without using energy minimizers. Third, we develop a new approach to generalize the convergence result of Nguyen and Zarnescu (Calc. Var. PDEs, 2013) to the case of non-zero elastic constants $L_2$, $L_3$, $L_4$.
Submitted 28 September, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Dynamic Differential-Privacy Preserving SGD
Authors:
Jian Du,
Song Li,
Xiangyi Chen,
Siheng Chen,
Mingyi Hong
Abstract:
The vanilla Differentially-Private Stochastic Gradient Descent (DP-SGD), including DP-Adam and other variants, ensures the privacy of training data by distributing privacy costs uniformly across training steps. Keeping these per-step privacy costs equal, by using the same gradient clipping threshold and noise power in every step, results in unstable updates and lower model accuracy compared with the non-DP counterpart. In this paper, we propose dynamic DP-SGD (along with dynamic DP-Adam and others) to narrow this performance gap while maintaining privacy, by dynamically adjusting clipping thresholds and noise powers while adhering to a total privacy budget constraint. Extensive experiments on a variety of deep learning tasks, including image classification, natural language processing, and federated learning, demonstrate that the proposed dynamic DP-SGD algorithm stabilizes updates and, as a result, significantly improves model accuracy in the strong privacy protection regime compared with vanilla DP-SGD. We also conduct a theoretical analysis to better understand the privacy-utility trade-off of dynamic DP-SGD and to explain why dynamic DP-SGD can outperform vanilla DP-SGD.
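A minimal sketch of a single DP-SGD step (per-example L2 clipping, Gaussian noise with standard deviation equal to the noise multiplier times the clipping threshold, then averaging), in pure Python. The `schedule` function, which decays both the clipping threshold and the noise multiplier geometrically, is a hypothetical stand-in for the paper's dynamic schedule; no privacy accounting is performed here, so the sketch does not by itself enforce a total privacy budget.

```python
import math
import random

def clip(g, c):
    """Scale the vector g so that its L2 norm is at most c."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [v * scale for v in g]

def dp_sgd_step(params, per_example_grads, lr, clip_c, noise_mult, rng):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average."""
    n = len(per_example_grads)
    summed = [0.0] * len(params)
    for g in per_example_grads:
        for i, v in enumerate(clip(g, clip_c)):
            summed[i] += v
    # Gaussian noise calibrated to the per-example sensitivity clip_c
    noisy = [s + rng.gauss(0.0, noise_mult * clip_c) for s in summed]
    return [p - lr * s / n for p, s in zip(params, noisy)]

def schedule(t, c0=1.0, sigma0=1.2, rho_c=0.99, rho_s=0.99):
    """HYPOTHETICAL per-step schedule: geometric decay of the clipping threshold
    and noise multiplier (the paper's actual schedule is not reproduced here)."""
    return c0 * rho_c ** t, sigma0 * rho_s ** t
```

In a training loop, step `t` would call `c_t, sigma_t = schedule(t)` and pass both values to `dp_sgd_step`; a real implementation must also track the cumulative privacy loss so that the chosen schedule stays within the total budget.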
Submitted 17 January, 2022; v1 submitted 30 October, 2021;
originally announced November 2021.
-
Minimax Problems with Coupled Linear Constraints: Computational Complexity, Duality and Solution Methods
Authors:
Ioannis Tsaknakis,
Mingyi Hong,
Shuzhong Zhang
Abstract:
In this work we study a special minimax problem where there are linear constraints that couple both the minimization and maximization decision variables. The problem is a generalization of the traditional saddle point problem (which does not have the coupling constraint), and it finds applications in wireless communication, game theory, transportation, just to name a few. We show that the considered problem is challenging, in the sense that it violates the classical max-min inequality, and that it is NP-hard even under very strong assumptions (e.g., when the objective is strongly convex-strongly concave). We then develop a duality theory for it, and analyze conditions under which the duality gap becomes zero. Finally, we study a class of stationary solutions defined based on the dual problem, and evaluate their practical performance in an application on adversarial attacks on network flow problems.
Submitted 25 November, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees
Authors:
Siliang Zeng,
Tianyi Chen,
Alfredo Garcia,
Mingyi Hong
Abstract:
Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL are not yet well understood. In this paper, we study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm. Specifically, we propose and analyze a class of coordinated actor-critic algorithms (CAC) in which individually parametrized policies have a {\it shared} part (which is jointly optimized among all agents) and a {\it personalized} part (which is only locally optimized). This kind of {\it partially personalized} policy allows agents to learn to coordinate by leveraging peers' past experience and to adapt to individual tasks. The flexibility in our design allows the proposed MARL-CAC algorithm to be used in a {\it fully decentralized} setting, where the agents can only communicate with their neighbors, as well as in a {\it federated} setting, where the agents occasionally communicate with a server while optimizing their (partially personalized) local models. Theoretically, we show that under some standard regularity assumptions, the proposed MARL-CAC algorithm requires $\mathcal{O}(ε^{-\frac{5}{2}})$ samples to achieve an $ε$-stationary solution (defined as a solution whose squared norm of the gradient of the objective function is less than $ε$). To the best of our knowledge, this work provides the first finite-sample guarantee for a decentralized AC algorithm with partially personalized policies.
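The partially personalized parameterization described above can be sketched as follows, assuming each agent stores a `shared` and a `personal` parameter vector and that neighbors mix their shared parts with a doubly stochastic weight matrix. This is a hypothetical illustration of the parameter-sharing structure in the fully decentralized setting only; the actor and critic updates are abstracted into an assumed `local_grad` callback and are not the CAC algorithm itself.

```python
def mix_shared(shared, weights):
    """One consensus (gossip) step on the shared parameters.
    weights[i][j] > 0 only if agents i and j are neighbors (or i == j)."""
    n, dim = len(shared), len(shared[0])
    return [[sum(weights[i][j] * shared[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]

def decentralized_round(agents, weights, local_grad, lr):
    """agents: list of (shared, personal) parameter lists.
    local_grad(i, shared, personal) -> (g_shared, g_personal) is a HYPOTHETICAL
    callback standing in for each agent's local actor-critic gradient estimate."""
    stepped = []
    for i, (s, p) in enumerate(agents):
        gs, gp = local_grad(i, s, p)
        stepped.append(([a - lr * g for a, g in zip(s, gs)],
                        [a - lr * g for a, g in zip(p, gp)]))
    # only the shared part is communicated and averaged; the personal part stays local
    mixed = mix_shared([s for s, _ in stepped], weights)
    return [(m, p) for m, (_, p) in zip(mixed, stepped)]
```

With zero gradients, repeated mixing drives every agent's shared parameters to the network-wide average while the personal parameters stay untouched, which separates the jointly optimized part from the locally optimized one.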
Submitted 6 December, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Smart Quantum Statistical Imaging beyond the Abbe-Rayleigh Criterion
Authors:
Narayan Bhusal,
Mingyuan Hong,
Nathaniel R. Miller,
Mario A Quiroz-Juarez,
Roberto de J. Leon-Montiel,
Chenglong You,
Omar S. Magana-Loaiza
Abstract:
The manifestation of the wave nature of light through diffraction imposes limits on the resolution of optical imaging. For over a century, the Abbe-Rayleigh criterion has been utilized to assess the spatial resolution limits of optical instruments. Recently, there has been an enormous impetus in overcoming the Abbe-Rayleigh resolution limit by projecting target light beams onto spatial modes. These conventional schemes for superresolution rely on a series of spatial projective measurements to pick up phase information that is used to boost the spatial resolution of optical systems. Unfortunately, these schemes require a priori information regarding the coherence properties of "unknown" light beams. Furthermore, they require stringent alignment and centering conditions that cannot be achieved in realistic scenarios. Here, we introduce a smart quantum camera for superresolving imaging. This camera exploits the self-learning features of artificial intelligence to identify the statistical fluctuations of unknown mixtures of light sources at each pixel. This is achieved through a universal quantum model that enables the design of artificial neural networks for the identification of quantum photon fluctuations. Our camera overcomes the inherent limitations of existing superresolution schemes based on spatial mode projection. Thus, our work provides a new perspective in the field of imaging with important implications for microscopy, remote sensing, and astronomy.
Submitted 11 October, 2021;
originally announced October 2021.
-
MToFNet: Object Anti-Spoofing with Mobile Time-of-Flight Data
Authors:
Yonghyun Jeong,
Doyeon Kim,
Jaehyeon Lee,
Minki Hong,
Solbi Hwang,
Jongwon Choi
Abstract:
In online markets, sellers can maliciously recapture others' images on display screens to use as spoof images, which can be difficult for the human eye to distinguish. To prevent such harm, we propose an anti-spoofing method using the paired RGB images and depth maps provided by a mobile camera with a Time-of-Flight (ToF) sensor. When images are recaptured on display screens, screen-dependent patterns known as moiré patterns can also be captured in the spoof images. These patterns cause the anti-spoofing model to overfit and to fail on spoof images recaptured on unseen media. To avoid this issue, we build a novel representation model composed of two embedding models, which can be trained without considering the recaptured images. We also introduce the mToF dataset, the largest and most diverse object anti-spoofing dataset and the first to utilize ToF data. Experimental results confirm that our model achieves robust generalization even across unseen domains.
Submitted 6 October, 2021;
originally announced October 2021.
-
Biconservative hypersurfaces with constant scalar curvature in space forms
Authors:
Yu Fu,
Min-Chun Hong,
Dan Yang,
Xin Zhan
Abstract:
Biconservative hypersurfaces are hypersurfaces that have a conservative stress-energy tensor with respect to the bienergy; they include all minimal and constant mean curvature hypersurfaces. The purpose of this paper is to study biconservative hypersurfaces $M^n$ with constant scalar curvature in a space form $N^{n+1}(c)$. We prove that every biconservative hypersurface with constant scalar curvature in $N^4(c)$ has constant mean curvature. Moreover, we prove that any biconservative hypersurface with constant scalar curvature in $N^5(c)$ is either an open part of a certain rotational hypersurface or a constant mean curvature hypersurface. These results solve an open problem proposed recently by D. Fetcu and C. Oniciuc for $n\leq4$.
Submitted 7 October, 2021;
originally announced October 2021.
-
Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence
Authors:
Boyi Liu,
Jiayang Li,
Zhuoran Yang,
Hoi-To Wai,
Mingyi Hong,
Yu Marco Nie,
Zhaoran Wang
Abstract:
To regulate a social system composed of self-interested agents, economic incentives are often required to induce a desirable outcome. This incentive design problem naturally possesses a bilevel structure, in which a designer modifies the rewards of the agents with incentives while anticipating the response of the agents, who play a non-cooperative game that converges to an equilibrium. Existing bilevel optimization algorithms raise a dilemma when applied to this problem: anticipating how incentives affect the agents at equilibrium requires solving the equilibrium problem repeatedly, which is computationally inefficient; bypassing the time-consuming step of equilibrium-finding can reduce the computational cost, but may lead the designer to a sub-optimal solution. To address this dilemma, we propose a method that tackles the designer's and agents' problems simultaneously in a single loop. Specifically, at each iteration, both the designer and the agents move only one step. Nevertheless, we allow the designer to gradually learn the overall influence of the incentives on the agents, which guarantees optimality after convergence. The convergence rate of the proposed scheme is also established for a broad class of games.
Submitted 12 October, 2022; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Primal-Dual First-Order Methods for Affinely Constrained Multi-Block Saddle Point Problems
Authors:
Junyu Zhang,
Mengdi Wang,
Mingyi Hong,
Shuzhong Zhang
Abstract:
We consider the convex-concave saddle point problem $\min_{\mathbf{x}}\max_{\mathbf{y}}Φ(\mathbf{x},\mathbf{y})$, where the decision variables $\mathbf{x}$ and/or $\mathbf{y}$ are subject to a multi-block structure and affine coupling constraints, and $Φ(\mathbf{x},\mathbf{y})$ possesses a certain separable structure. Although the minimization counterpart of such problems has been widely studied under the topic of ADMM, this minimax problem is rarely investigated. In this paper, a convenient notion of $ε$-saddle point is proposed, under which the convergence rates of several proposed algorithms are analyzed. When only one of $\mathbf{x}$ and $\mathbf{y}$ has multiple blocks and an affine constraint, several natural extensions of ADMM are proposed to solve the problem. Depending on the number of blocks and the level of smoothness, $\mathcal{O}(1/T)$ or $\mathcal{O}(1/\sqrt{T})$ convergence rates are derived for our algorithms. When both $\mathbf{x}$ and $\mathbf{y}$ have multiple blocks and affine constraints, a new algorithm called ExtraGradient Method of Multipliers (EGMM) is proposed. Under a desirable smoothness condition, an $\mathcal{O}(1/T)$ rate of convergence can be guaranteed regardless of the number of blocks in $\mathbf{x}$ and $\mathbf{y}$. An in-depth comparison between EGMM (a fully primal-dual method) and ADMM (an approximate dual method) is made over multi-block optimization problems to illustrate the advantage of EGMM.
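The extragradient idea that EGMM builds on can be illustrated on the unconstrained bilinear problem $\min_x\max_y xy$: simultaneous gradient descent-ascent spirals away from the saddle point $(0, 0)$, while the extragradient step (evaluate the gradient at a look-ahead point, then update from the original point) contracts toward it. This minimal sketch is the plain extragradient method with no multipliers, blocks, or affine constraints, so it is not EGMM itself; the step size `eta` and the starting point are illustrative assumptions.

```python
import math

def grad(x, y):
    # f(x, y) = x * y, so df/dx = y and df/dy = x
    return y, x

def gda_step(x, y, eta):
    """Simultaneous gradient descent (in x) and ascent (in y)."""
    gx, gy = grad(x, y)
    return x - eta * gx, y + eta * gy

def extragradient_step(x, y, eta):
    """Extragradient: extrapolate, then update using the look-ahead gradient."""
    gx, gy = grad(x, y)
    xh, yh = x - eta * gx, y + eta * gy      # look-ahead point
    gxh, gyh = grad(xh, yh)
    return x - eta * gxh, y + eta * gyh

def run(step, iters=1000, eta=0.1):
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)   # distance from the saddle point (0, 0)
```

For this problem, each descent-ascent step multiplies the squared distance to the saddle point by $1+\eta^2$, while each extragradient step multiplies it by $1-\eta^2+\eta^4$, which is below 1 for $0<\eta<1$.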
Submitted 16 March, 2023; v1 submitted 29 September, 2021;
originally announced September 2021.
-
High-dimensional encryption in optical fibers using machine learning
Authors:
Michelle L. J. Lollie,
Fatemeh Mostafavi,
Narayan Bhusal,
Mingyuan Hong,
Chenglong You,
Roberto de J. León-Montiel,
Omar S. Magaña-Loaiza,
Mario A. Quiroz-Juárez
Abstract:
The ability to engineer the spatial wavefunction of photons has enabled a variety of quantum protocols for communication, sensing, and information processing. These protocols exploit the high dimensionality of structured light, enabling the encoding of multiple bits of information in a single photon, the measurement of small physical parameters, and unprecedented levels of security in cryptographic schemes. Unfortunately, the potential of structured light has been restricted to free-space platforms in which the spatial profile of photons is preserved. Here, we make an important step toward using structured light for fiber optical communication. We introduce a smart high-dimensional encryption protocol in which the propagation of spatial modes in multimode fibers is used as a natural mechanism for encryption. This provides a secure communication channel for data transmission. The information encoded in spatial modes is retrieved using artificial neural networks, which are trained from the intensity distributions of experimentally detected spatial modes. Our on-fiber communication platform allows us to use spatial modes of light for high-dimensional bit-by-bit and byte-by-byte encoding. This protocol enables one to recover messages and images with almost perfect accuracy. Our smart protocol for high-dimensional optical encryption in optical fibers has key implications for quantum technologies relying on structured fields of light, particularly those that are challenged by free-space propagation.
Submitted 13 August, 2021;
originally announced August 2021.
-
The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots
Authors:
Matt-Heun Hong,
Jessica K. Witt,
Danielle Albers Szafir
Abstract:
Scatterplots can encode a third dimension by using additional channels like size or color (e.g. bubble charts). We explore a potential misinterpretation of trivariate scatterplots, which we call the weighted average illusion, where locations of larger and darker points are given more weight toward x- and y-mean estimates. This systematic bias is sensitive to a designer's choice of size or lightness ranges mapped onto the data. In this paper, we quantify this bias against varying size/lightness ranges and data correlations. We discuss possible explanations for its cause by measuring attention given to individual data points using a vision science technique called the centroid method. Our work illustrates how ensemble processing mechanisms and mental shortcuts can significantly distort visual summaries of data, and can lead to misconceptions like the demonstrated weighted average illusion.
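To make the bias concrete, the sketch below compares the true (unweighted) mean position of a bubble chart's points with a size-weighted mean, one simple way to model a viewer who gives larger points more weight. The weighting-by-size model and the sample points are assumptions for illustration; they are not the paper's fitted weights or its centroid method.

```python
def mean_position(points):
    """Unweighted mean (x, y) of points given as (x, y, size) tuples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def size_weighted_mean_position(points):
    """Mean (x, y) with each point weighted by its size (the hypothesized bias)."""
    total = sum(p[2] for p in points)
    return (sum(p[0] * p[2] for p in points) / total,
            sum(p[1] * p[2] for p in points) / total)

# hypothetical bubble-chart data: (x, y, marker size)
points = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0), (5.0, 6.0, 2.0)]
```

Here the true mean is (5.0, 2.0), but weighting by size pulls the x-estimate toward the large point at x = 10, to about 6.67, while the y-estimate is unchanged.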
Submitted 8 August, 2021;
originally announced August 2021.
-
SSPNet: Scale Selection Pyramid Network for Tiny Person Detection from UAV Images
Authors:
Mingbo Hong,
Shuiwang Li,
Yuchao Yang,
Feiyu Zhu,
Qijun Zhao,
Li Lu
Abstract:
With the increasing demand for search and rescue, there is a strong need to detect objects of interest in large-scale images captured by Unmanned Aerial Vehicles (UAVs), which is quite challenging due to the extremely small scales of the objects. Most existing methods employ a Feature Pyramid Network (FPN) to enrich shallow layers' features by combining them with deep layers' contextual features. However, because of the inconsistency in gradient computation across different layers, the shallow layers in FPN are not fully exploited to detect tiny objects. In this paper, we propose a Scale Selection Pyramid Network (SSPNet) for tiny person detection, which consists of three components: Context Attention Module (CAM), Scale Enhancement Module (SEM), and Scale Selection Module (SSM). CAM takes context information into account to produce hierarchical attention heatmaps. SEM highlights features of specific scales at different layers, leading the detector to focus on objects of specific scales instead of vast backgrounds. SSM exploits the relationships between adjacent layers to achieve suitable feature sharing between deep and shallow layers, thereby avoiding the inconsistency in gradient computation across different layers. In addition, we propose a Weighted Negative Sampling (WNS) strategy to guide the detector to select more representative samples. Experiments on the TinyPerson benchmark show that our method outperforms other state-of-the-art (SOTA) detectors.
Submitted 4 July, 2021;
originally announced July 2021.
-
Understanding Clipping for Federated Learning: Convergence and Client-Level Differential Privacy
Authors:
Xinwei Zhang,
Xiangyi Chen,
Mingyi Hong,
Zhiwei Steven Wu,
**feng Yi
Abstract:
Providing privacy protection has been one of the primary motivations of Federated Learning (FL). Recently, there has been a line of work on incorporating the formal privacy notion of differential privacy with FL. To guarantee client-level differential privacy in FL algorithms, the clients' transmitted model updates have to be clipped before privacy noise is added. Such a clipping operation is substantially different from its counterpart, gradient clipping, in centralized differentially private SGD and is not well understood. In this paper, we first empirically demonstrate that clipped FedAvg can perform surprisingly well even with substantial data heterogeneity when training neural networks, which is partly because the clients' updates become similar for several popular deep architectures. Based on this key observation, we provide a convergence analysis of a differentially private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates. To the best of our knowledge, this is the first work that rigorously investigates theoretical and empirical issues regarding the clipping operation in FL algorithms.
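A minimal sketch of server-side aggregation in a client-level DP variant of FedAvg, assuming each client sends a flat update vector: every update is clipped to L2 norm at most `clip_c`, the clipped updates are summed, Gaussian noise with standard deviation `noise_mult * clip_c` is added, and the result is averaged. With the noise turned off, it illustrates the intuition above: when client updates point the same way, clipping only rescales their average, whereas heterogeneous updates also change its direction (a clipping bias). Names and values are illustrative assumptions, not the paper's algorithm or analysis.

```python
import math
import random

def l2(v):
    return math.sqrt(sum(a * a for a in v))

def clip_update(u, c):
    """Scale a client's model update so that its L2 norm is at most c."""
    n = l2(u)
    s = min(1.0, c / n) if n > 0 else 1.0
    return [a * s for a in u]

def dp_fedavg_aggregate(updates, clip_c, noise_mult, rng):
    """Average of clipped client updates plus Gaussian noise
    (std noise_mult * clip_c added to the sum before averaging)."""
    m, dim = len(updates), len(updates[0])
    total = [0.0] * dim
    for u in updates:
        for i, a in enumerate(clip_update(u, clip_c)):
            total[i] += a
    return [(t + rng.gauss(0.0, noise_mult * clip_c)) / m for t in total]

def cosine(u, v):
    """Cosine similarity, used to compare directions before and after clipping."""
    return sum(a * b for a, b in zip(u, v)) / (l2(u) * l2(v))
```

For the similar updates `[[4, 3], [8, 6]]`, the clipped average `[0.8, 0.6]` points exactly along the unclipped average `[6, 4.5]`; for the heterogeneous updates `[[10, 0], [0, 1]]`, the clipped average `[0.5, 0.5]` points in a noticeably different direction from `[5, 0.5]`.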
Submitted 25 June, 2021;
originally announced June 2021.
-
STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning
Authors:
Prashant Khanduri,
Pranay Sharma,
Haibo Yang,
Mingyi Hong,
Jia Liu,
Ketan Rajawat,
Pramod K. Varshney
Abstract:
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WNs' and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(ε^{-3/2})$ samples and $\tilde{\mathcal{O}}(ε^{-1})$ communication rounds to compute an $ε$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such {\it near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provide guidelines for choosing the four important design elements of FL algorithms: the update frequency, directions, and minibatch sizes to achieve the best performance.
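The stochastic momentum estimator mentioned above can be illustrated with a recursive-momentum (STORM-style) update, $d_t = \nabla f(x_t;\xi_t) + (1-a)\,(d_{t-1} - \nabla f(x_{t-1};\xi_t))$, in which the same sample $\xi_t$ is evaluated at the current and the previous iterate. This minimal single-worker sketch assumes a toy objective $f(x;\xi) = \tfrac12 (x-\xi)^2$ with $\xi \sim \mathcal{N}(3, 1)$; it omits local updates, communication, and the server-side direction, so it is not STEM itself. The values of `lr` and `a` are illustrative assumptions.

```python
import random

def stoch_grad(x, xi):
    # gradient of f(x; xi) = 0.5 * (x - xi)**2; E[f] is minimized at x = E[xi]
    return x - xi

def momentum_sgd(x0, steps, lr, a, rng, mu=3.0):
    """SGD driven by a STORM-style recursive momentum estimator d."""
    xi = rng.gauss(mu, 1.0)
    d = stoch_grad(x0, xi)            # start from a plain stochastic gradient
    x_prev, x = x0, x0 - lr * d
    for _ in range(steps):
        xi = rng.gauss(mu, 1.0)
        # same sample xi at both iterates; for this toy objective the difference
        # of the two gradients equals x - x_prev and carries no sample noise
        d = stoch_grad(x, xi) + (1.0 - a) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x
```

Calling `momentum_sgd(0.0, 2000, 0.05, 0.1, random.Random(0))` should return a value close to the minimizer 3.0; the correction term lets the estimator average out sampling noise without a large minibatch.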
Submitted 19 June, 2021;
originally announced June 2021.
-
Deep Learning based Multi-modal Computing with Feature Disentanglement for MRI Image Synthesis
Authors:
Yuchen Fei,
Bo Zhan,
Mei Hong,
Xi Wu,
Jiliu Zhou,
Yan Wang
Abstract:
Purpose: Different magnetic resonance imaging (MRI) modalities of the same anatomical structure are required to present different pathological information from the physical level for diagnostic needs. However, it is often difficult to obtain full-sequence MRI images of patients owing to limitations such as time consumption and high cost. The purpose of this work is to develop an algorithm that predicts target MRI sequences with high accuracy and provides more information for clinical diagnosis. Methods: We propose a deep learning based multi-modal computing model for MRI synthesis with a feature disentanglement strategy. To take full advantage of the complementary information provided by different modalities, multi-modal MRI sequences are utilized as input. Notably, the proposed approach decomposes each input modality into a modality-invariant space with shared information and a modality-specific space with specific information, so that features are extracted separately to effectively process the input data. Subsequently, both of them are fused through the adaptive instance normalization (AdaIN) layer in the decoder. In addition, to address the lack of specific information of the target modality in the test phase, a local adaptive fusion (LAF) module is adopted to generate a modality-like pseudo-target with specific information similar to the ground truth. Results: To evaluate the synthesis performance, we verify our method on the BRATS2015 dataset of 164 subjects. The experimental results demonstrate that our approach significantly outperforms the benchmark method and other state-of-the-art medical image synthesis methods in both quantitative and qualitative measures. Compared with the pix2pixGANs method, the PSNR improves from 23.68 to 24.8. Conclusion: The proposed method could be effective in predicting target MRI sequences and useful for clinical diagnosis and treatment.
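The adaptive instance normalization (AdaIN) layer mentioned above is commonly defined per channel as $\mathrm{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y)$: the content features $x$ are normalized and then rescaled and shifted to the statistics of the style features $y$. A minimal pure-Python sketch, assuming each channel is a flat list of activations; how the paper assigns modality-invariant and modality-specific features to $x$ and $y$ in its decoder is not reproduced here, and the `eps` value is an assumption.

```python
import math

def mean_std(values, eps=1e-5):
    """Per-channel mean and (population) standard deviation, with eps for stability."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var + eps)

def adain(content, style):
    """content, style: lists of channels, each channel a flat list of activations."""
    out = []
    for c, s in zip(content, style):
        mc, sc = mean_std(c)
        ms, ss = mean_std(s)
        # normalize the content channel, then apply the style channel's statistics
        out.append([ss * (v - mc) / sc + ms for v in c])
    return out
```

For a content channel `[1, 2, 3, 4]` and a style channel `[10, 20, 30, 40]`, the output keeps the content's ordering but takes on the style's mean (25) and, up to `eps`, its standard deviation (about 11.18).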
Submitted 6 May, 2021;
originally announced May 2021.