
Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Authors: Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Abstract: Recently, perception tasks based on Bird's-Eye View (BEV) representation have drawn more and more attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources to execute on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on the on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive transformer-based transformation or depth representation. Our Fast-BEV consists of five parts. We novelly propose (1) a lightweight deployment-friendly view transformation which fast transfers 2D image feature to 3D voxel space, (2) a multi-scale image encoder which leverages multi-scale information for better performance, (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, (5) a multi-frame feature fusion mechanism to leverage the temporal information. Through experiments, on 2080Ti platform, our R50 model can run 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model and 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model. Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips. The code is released at: https://github.com/Sense-GVT/Fast-BEV.

Submitted 29 January, 2023; originally announced January 2023.

Comments: submitted to TPAMI. arXiv admin note: substantial text overlap with arXiv:2301.07870

arXiv:2301.07870 [pdf, other]

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

Authors: Bin Huang, Yangguang Li, Enze Xie, Feng Liang, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao

Abstract: Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes expensive Lidar sensors, making it a feasible solution for economical autonomous driving. However, most existing BEV solutions either suffer from modest performance or require considerable resources to execute on-vehicle inference. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing real-time BEV perception on the on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive view transformation or depth representation. Starting from M2BEV baseline, we further introduce (1) a strong data augmentation strategy for both image and BEV space to avoid over-fitting (2) a multi-frame feature fusion mechanism to leverage the temporal information (3) an optimized deployment-friendly view transformation to speed up the inference. Through experiments, we show Fast-BEV model family achieves considerable accuracy and efficiency on edge. In particular, our M1 model (R18@256x704) can run over 50FPS on the Tesla T4 platform, with 47.0% NDS on the nuScenes validation set. Our largest model (R101@900x1600) establishes a new state-of-the-art 53.5% NDS on the nuScenes validation set. The code is released at: https://github.com/Sense-GVT/Fast-BEV.

Submitted 18 January, 2023; originally announced January 2023.

Comments: Accepted by NeurIPS2022_ML4AD on October 22, 2022

Journal ref: NeurIPS2022_ML4AD

arXiv:2301.05450 [pdf, ps, other]

On smoothing estimates for Schrödinger equations on product spaces $\mathbb{T}^m\times \mathbb{R}^n$

Authors: Xianghong Chen, Zihua Guo, Minxing Shen, Lixin Yan

Abstract: Let $Δ_{\mathbb{T}^m\times \mathbb{R}^n}$ denote the Laplace-Beltrami operator on the product spaces $\mathbb{T}^m\times \mathbb{R}^n$. In this article we show that $$ \left\|e^{itΔ_{\mathbb{T}^m\times \mathbb{R}^n}}f\right\|_{L^p(\mathbb{T}^m\times \mathbb{R}^n\times [0,1])} \leq C \|f\|_{W^{α,p}(\mathbb{T}^m\times\mathbb{R}^n)} $$ holds if $p\geq 2(m+n+2)/(m+n)$ and $α> (m+2n)(1/2-1/p)-2/p$. Furthermore, we apply the $\ell^2$-decoupling inequalities to establish local $L^p$-smoothing estimates for the Schrödinger operator $e^{itΔ_{\mathbb{T}^m\times\mathbb{R}^n}}$ in modulation spaces $M_{p,q}^α(\mathbb{T}^m\times\mathbb{R}^n)$: $$ \|e^{itΔ_{\mathbb{T}^m\times\mathbb{R}^n}}f\|_{L^p(\mathbb{T}^m\times\mathbb{R}^n\times [0,1])}\leq C \|f\|_{M_{p,q}^α(\mathbb{T}^m\times\mathbb{R}^n)} $$ for some range of $α$ and $p, q$. The smoothing estimates in $L^p$-Sobolev and modulation spaces are sharp up to the endpoint regularity, in a certain range of $p$ and $q$.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 14 pages
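
As a quick sanity check of the two conditions in the abstract above (a worked instance chosen here for illustration, not a case singled out by the authors), take $m=n=1$, i.e. $\mathbb{T}\times\mathbb{R}$, and evaluate the regularity threshold at the smallest admissible $p$:

$$ p \geq \frac{2(m+n+2)}{m+n} = \frac{2\cdot 4}{2} = 4, \qquad α > (m+2n)\left(\frac{1}{2}-\frac{1}{p}\right)-\frac{2}{p} = 3\cdot\frac{1}{4}-\frac{1}{2} = \frac{1}{4} \quad \text{at } p=4. $$

So on $\mathbb{T}\times\mathbb{R}$ the stated estimate applies for $p\geq 4$, and at $p=4$ it requires strictly more than a quarter of a derivative.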

arXiv:2301.04051 [pdf]

Protein Co-Enrichment Analysis of Extracellular Vesicles

Authors: Molly L. Shen, Zijie Jin, Rosalie Martel, Andreas Wallucks, Lucile Alexandre, Philippe DeCorwin-Martin, Lorenna Oliveira Fernandes de Araujo, Andy Ng, David Juncker

Abstract: Extracellular Vesicles (EVs) carry cell-derived proteins that confer functionality and selective cell uptake. However, whether proteins are packaged stochastically or co-enriched within individual EVs, and whether co-enrichment fluctuates under homeostasis and disease, has not been measured. EV abundance and protein global relative expression have been qualified by bulk analysis. Meanwhile, co-enrichment is not directly accessible via bulk measurement and has not been reported for single EV analysis. Here, we introduce the normalized index of co-enrichment (NICE) to measure protein co-enrichment. NICE was derived by (i) capturing EVs based on the expression of a membrane-bound protein, (ii) probing for the co-expression of a second protein at the population level - EV integrity underwrites the detection of single EV co-expression without the need to resolve single EVs - and (iii) normalizing measured values using two universal normalization probes. Axiomatically, NICE = 1 for stochastic inclusion or no overall co-enrichment, while for positive and negative co-enrichment NICE > 1 or < 1, respectively. We quantified the NICE of tetraspanins, growth factor receptors and integrins in EVs of eight breast cancer cell lines of varying metastatic potential and organotropism, combinatorially mapping up to 104 protein pairs. Our analysis revealed protein enrichment and co-expression patterns consistent with previous findings. For the organotropic cell lines, most protein pairs were co-enriched on EVs, with the majority of NICE values between 0.2 to 11.5, and extending from 0.037 to 80.4. Median NICE were either negative, neutral or positive depending on the cells. NICE analysis is easily multiplexed and is compatible with microarrays, bead-based and single EV assays. Additional studies are needed to deepen our understanding of the potential and significance of NICE for research and clinical uses.

Submitted 17 May, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

arXiv:2301.00918 [pdf, other]

Evaluation of Public Transit Systems under Short Random Service Suspensions: A Bulk-Service Queuing Approach

Authors: Baichuan Mo, Li Jin, Haris N. Koutsopoulos, Zuo-Jun Max Shen, Jinhua Zhao

Abstract: This paper proposes a stochastic framework to evaluate the performance of public transit systems under short random service suspensions. We aim to derive closed-form formulations of the mean and variance of the queue length and waiting time. A bulk-service queue model is adopted to formulate the queuing behavior in the system. The random service suspension is modeled as a two-state (disruption and normal) Markov process. We prove that headway is distributed as the difference between two compound Poisson exponential random variables. The distribution is used to specify the mean and variance of queue length and waiting time at each station with analytical formulations. The closed-form stability condition of the system is also derived, implying that the system is more likely to be unstable with high incident rates and long incident duration. The proposed model is implemented on a bus network. Results show that higher incident rates and higher average incident duration will increase both the mean and variance of queue length and waiting time, which are consistent with the theoretical analysis. Crowding stations are more vulnerable to random service suspensions. The theoretical results are validated with a simulation model, showing consistency between the two outcomes.

Submitted 2 January, 2023; originally announced January 2023.
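
The abstract above combines a bulk-service queue with a two-state (normal/disruption) Markov process. A minimal discrete-time sketch of that combination follows. It is not the authors' model: the time discretization, the uniform integer arrivals, and every name and parameter value (`simulate_queue`, `p_fail`, `p_recover`, `capacity`) are assumptions made here for illustration only.

```python
import random


def simulate_queue(arrivals, capacity, p_fail, p_recover, seed=0):
    """Simulate queue length per time step at one stop.

    arrivals  : list of passenger arrivals per step (hypothetical input)
    capacity  : passengers removed at once when a vehicle serves the stop
    p_fail    : P(normal -> disrupted) per step (assumed value)
    p_recover : P(disrupted -> normal) per step (assumed value)
    """
    rng = random.Random(seed)
    is_disrupted = False
    queue = 0
    history = []
    for arriving in arrivals:
        # Two-state Markov chain for the service status.
        if is_disrupted:
            is_disrupted = rng.random() >= p_recover
        else:
            is_disrupted = rng.random() < p_fail
        queue += arriving
        if not is_disrupted:
            # Bulk service: one departure clears up to `capacity` passengers.
            queue -= min(queue, capacity)
        history.append(queue)
    return history


def mean(values):
    return sum(values) / len(values)


# Same arrival stream for both runs, so only the suspensions differ.
arrival_rng = random.Random(42)
arrivals = [arrival_rng.randint(0, 6) for _ in range(5000)]

baseline = simulate_queue(arrivals, capacity=5, p_fail=0.0, p_recover=1.0)
with_suspensions = simulate_queue(arrivals, capacity=5, p_fail=0.05, p_recover=0.5)

print("mean queue, no suspensions:  ", mean(baseline))
print("mean queue, with suspensions:", mean(with_suspensions))
```

Because both runs share one arrival stream and the baseline serves at every step, the queue with suspensions can never be shorter than the baseline at any step, which matches the direction of the effect reported in the abstract.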

arXiv:2301.00916 [pdf, other]

Individual Path Recommendation Under Public Transit Service Disruptions Considering Behavior Uncertainty

Authors: Baichuan Mo, Haris N. Koutsopoulos, Zuo-Jun Max Shen, Jinhua Zhao

Abstract: This study proposes a mixed-integer programming formulation to model the individual-based path (IPR) recommendation problem during public transit service disruptions with the objective of minimizing system travel time and respecting passengers' path choice preferences. Passengers' behavior uncertainty in path choices given recommendations is also considered. We model the behavior uncertainty based on the passenger's prior preferences and posterior path choice probability distribution with two new concepts: epsilon-feasibility and Gamma-concentration, which control the mean and variance of path flows in the optimization problem. We show that these two concepts can be seen as a way of approximating the recourse function (expected system travel time) in a two-stage stochastic optimization. It is proved that these two concepts help to bound the difference between the approximated recourse function and the exact one. Additional theoretical analysis shows that epsilon-feasibility and Gamma-concentration can be seen as an approximation of expectation and chance constraints in a typical stochastic optimization formulation, respectively. The proposed IPR problem with behavior uncertainty is solved efficiently with Benders decomposition. The model is implemented in the Chicago Transit Authority (CTA) system with a real-world urban rail disruption as the case study. Results show that the proposed IPR model significantly reduces the average travel times compared to the status quo and outperforms the capacity-based benchmark path recommendation strategy.

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2212.10777 [pdf, other]

Hierarchically branched diffusion models leverage dataset structure for class-conditional generation

Authors: Alex M. Tseng, Max Shen, Tommaso Biancalani, Gabriele Scalia

Abstract: Class-labeled datasets, particularly those common in scientific domains, are rife with internal structure, yet current class-conditional diffusion models ignore these relationships and implicitly diffuse on all classes in a flat fashion. To leverage this structure, we propose hierarchically branched diffusion models as a novel framework for class-conditional generation. Branched diffusion models rely on the same diffusion process as traditional models, but learn reverse diffusion separately for each branch of a hierarchy. We highlight several advantages of branched diffusion models over the current state-of-the-art methods for class-conditional diffusion, including extension to novel classes in a continual-learning setting, a more sophisticated form of analogy-based conditional generation (i.e. transmutation), and a novel interpretability into the generation process. We extensively evaluate branched diffusion models on several benchmark and large real-world scientific datasets spanning many data modalities.

Submitted 1 February, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2212.07359 [pdf, other]

Post-hoc Uncertainty Learning using a Dirichlet Meta-Model

Authors: Maohao Shen, Yuheng Bu, Prasanna Sattigeri, Soumya Ghosh, Subhro Das, Gregory Wornell

Abstract: It is known that neural networks have the problem of being over-confident when directly using the output label distribution to generate uncertainty measures. Existing methods mainly resolve this issue by retraining the entire model to impose the uncertainty quantification capability so that the learned model can achieve desired performance in accuracy and uncertainty prediction simultaneously. However, training the model from scratch is computationally expensive and may not be feasible in many situations. In this work, we consider a more practical post-hoc uncertainty learning setting, where a well-trained base model is given, and we focus on the uncertainty quantification task at the second stage of training. We propose a novel Bayesian meta-model to augment pre-trained models with better uncertainty quantification abilities, which is effective and computationally efficient. Our proposed method requires no additional training data and is flexible enough to quantify different uncertainties and easily adapt to different application settings, including out-of-domain data detection, misclassification detection, and trustworthy transfer learning. We demonstrate our proposed meta-model approach's flexibility and superior empirical performance on these applications over multiple representative image classification benchmarks.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI 2023
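
To make concrete how a Dirichlet output can yield uncertainty measures of the kind the abstract above mentions, here is a small sketch. It assumes the meta-model emits Dirichlet concentration parameters `alpha` (one per class) and uses two common Dirichlet-based quantities: the entropy of the expected class distribution and the "vacuity" K / sum(alpha) from the evidential-learning literature. Whether the paper uses exactly these measures is not stated in the abstract, and the function name is hypothetical.

```python
import math


def dirichlet_uncertainty(alpha):
    """Summarize a Dirichlet(alpha) prediction over K classes.

    Returns (expected class probabilities, entropy of that expectation,
    vacuity). Vacuity assumes alpha_k >= 1, so it lies in (0, 1].
    """
    num_classes = len(alpha)
    alpha0 = sum(alpha)  # total concentration ("amount of evidence")
    probs = [a / alpha0 for a in alpha]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    vacuity = num_classes / alpha0
    return probs, entropy, vacuity


# A flat Dirichlet carries no evidence: maximal entropy and vacuity.
flat = dirichlet_uncertainty([1.0, 1.0, 1.0])
# A peaked Dirichlet with much evidence for class 0: low on both measures.
peaked = dirichlet_uncertainty([98.0, 1.0, 1.0])

print(flat)
print(peaked)
```

An out-of-domain detector in this style would threshold one of these scores; the choice of score and threshold is application-specific.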

arXiv:2212.06905 [pdf, other]

Query Time Optimized Deep Learning Based Video Inference System

Authors: Mingren Shen, Shuoxuan Dong, Xiuyuan He

Abstract: This is a project report about how we tune Focus[1], a video inference system that provides low cost and low latency, through two phases. In this report, we will decrease the query time by saving the middle layer output of the neural network. This is a trade-off strategy that involves using more space to save time. We show how this scheme works using prototype systems, and it saves 20% of the time. The code repository URL is here, https://github.com/iphyer/CS744 FocousIngestOpt.

Submitted 7 February, 2024; v1 submitted 13 December, 2022; originally announced December 2022.
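
The space-for-time trade-off described in the abstract above (saving a middle-layer output so queries need not recompute it) can be sketched as a simple cache. This is an illustration only, not the report's code: `FeatureCache`, `toy_backbone`, and `toy_head` are hypothetical stand-ins for the expensive early layers and the cheap query-time layers of a real network.

```python
class FeatureCache:
    """Store intermediate features per frame so queries skip the backbone."""

    def __init__(self, backbone):
        self._backbone = backbone
        self._store = {}          # frame_id -> cached intermediate output
        self.backbone_calls = 0   # counts how often the expensive part ran

    def features(self, frame_id, frame):
        if frame_id not in self._store:
            self._store[frame_id] = self._backbone(frame)
            self.backbone_calls += 1
        return self._store[frame_id]


def toy_backbone(frame):
    # Stand-in for the expensive early layers of a network.
    return [sum(frame), max(frame)]


def toy_head(features, threshold):
    # Stand-in for the cheap layers evaluated at query time.
    return features[0] > threshold


cache = FeatureCache(toy_backbone)
frame = [3, 1, 4, 1, 5]

# Two different queries over the same frame reuse one backbone pass.
first = toy_head(cache.features("frame-0", frame), threshold=10)
second = toy_head(cache.features("frame-0", frame), threshold=20)
print(first, second, cache.backbone_calls)
```

The memory cost grows with the number of cached frames and the size of the saved layer, which is the space side of the trade-off.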

arXiv:2212.06620 [pdf, other]

Improving Accuracy Without Losing Interpretability: A ML Approach for Time Series Forecasting

Authors: Yiqi Sun, Zhengxin Shi, Jianshen Zhang, Yongzhi Qi, Hao Hu, Zuojun Max Shen

Abstract: In time series forecasting, decomposition-based algorithms break aggregate data into meaningful components and are therefore appreciated for their particular advantages in interpretability. Recent algorithms often combine machine learning (hereafter ML) methodology with decomposition to improve prediction accuracy. However, incorporating ML is generally considered to sacrifice interpretability inevitably. In addition, existing hybrid algorithms usually rely on theoretical models with statistical assumptions and focus only on the accuracy of aggregate predictions, and thus suffer from accuracy problems, especially in component estimates. In response to the above issues, this research explores the possibility of improving accuracy without losing interpretability in time series forecasting. We first quantitatively define interpretability for data-driven forecasts and systematically review the existing forecasting algorithms from the perspective of interpretability. Accordingly, we propose the W-R algorithm, a hybrid algorithm that combines decomposition and ML from a novel perspective. Specifically, the W-R algorithm replaces the standard additive combination function with a weighted variant and uses ML to modify the estimates of all components simultaneously. We mathematically analyze the theoretical basis of the algorithm and validate its performance through extensive numerical experiments. In general, the W-R algorithm outperforms all decomposition-based and ML benchmarks. Based on P50_QL, the algorithm relatively improves by 8.76% in accuracy on the practical sales forecasts of JD.com and 77.99% on a public dataset of electricity loads. This research offers an innovative perspective to combine the statistical and ML algorithms, and JD.com has implemented the W-R algorithm to make accurate sales predictions and guide its marketing activities.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.00594 [pdf]

Path Planning Considering Time-Varying and Uncertain Movement Speed in Multi-Robot Automatic Warehouses: Problem Formulation and Algorithm

Authors: Jingchuan Chen, Wei Chen, Jing Li, Xiguang Wei, Wenzhe Tan, Zuo-Jun Max Shen, Hongbo Li

Abstract: Path planning in the multi-robot system refers to calculating a set of actions for each robot, which will move each robot to its goal without conflicting with other robots. Lately, the research topic has received significant attention for its extensive applications, such as airport ground, drone swarms, and automatic warehouses. Despite these available research results, most of the existing investigations are concerned with the cases of robots with a fixed movement speed without considering uncertainty. Therefore, in this work, we study the problem of path-planning in the multi-robot automatic warehouse context, which considers the time-varying and uncertain robots' movement speed. Specifically, the path-planning module searches a path with as few conflicts as possible for a single agent by calculating traffic cost based on customarily distributed conflict probability and combining it with the classic A* algorithm. However, this probability-based method cannot eliminate all conflicts, and speed's uncertainty will constantly cause new conflicts. As a supplement, we propose the other two modules. The conflict detection and re-planning module chooses objects requiring re-planning paths from the agents involved in different types of conflicts periodically by our designed rules. Also, at each step, the scheduling module fills up the agent's preserved queue and decides who has a higher priority when the same element is assigned to two agents simultaneously. Finally, we compare the proposed algorithm with other algorithms from academia and industry, and the results show that the proposed method is validated as the best performance.

Submitted 1 December, 2022; originally announced December 2022.
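
The path-planning module in the abstract above adds a conflict-based traffic cost to classic A*. The sketch below shows only that general idea on a 4-connected grid: each cell carries an extra non-negative cost (a stand-in for the paper's probability-based traffic cost, whose exact form the abstract does not give), and A* minimizes step count plus accumulated extra cost. The grid encoding and the name `a_star` are assumptions made here.

```python
import heapq


def a_star(grid_cost, start, goal):
    """A* on a 4-connected grid.

    grid_cost[r][c] is the extra (non-negative) traffic cost of entering a
    cell, or None for an obstacle. Each move also costs 1, so the Manhattan
    distance stays an admissible heuristic.
    """
    rows, cols = len(grid_cost), len(grid_cost[0])

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best_g = {start: 0.0}
    parent = {}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if g > best_g.get(cell, float("inf")):
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid_cost[nr][nc] is not None:
                new_g = g + 1.0 + grid_cost[nr][nc]
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    parent[(nr, nc)] = cell
                    heapq.heappush(open_heap, (new_g + heuristic((nr, nc)), new_g, (nr, nc)))
    return None  # goal unreachable


free_grid = [[0.0] * 3 for _ in range(3)]
busy_grid = [[0.0] * 3 for _ in range(3)]
busy_grid[1][1] = 5.0  # congested centre cell (illustrative value)

direct_path = a_star(free_grid, (1, 0), (1, 2))
detour_path = a_star(busy_grid, (1, 0), (1, 2))
print(direct_path)
print(detour_path)
```

With no congestion the planner goes straight through the centre; with the centre cell penalized, a four-move detour (cost 4) beats the two-move direct route (cost 7), so the returned path avoids the congested cell.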

arXiv:2211.12449 [pdf, other]

doi 10.1038/s41467-023-37513-w

Controlling single rare earth ion emission in an electro-optical nanocavity

Authors: Likai Yang, Sihao Wang, Mohan Shen, Jiacheng Xie, Hong X. Tang

Abstract: Rare earth emitters enable critical quantum resources including spin qubits, single photon sources, and quantum memories. Yet, probing of single ions remains challenging due to low emission rate of their intra-4f optical transitions. One feasible approach is through Purcell enhanced emission in optical cavities. The ability to modulate cavity-ion coupling in real time will further elevate the capacity of such systems. Here, we demonstrate direct control of single ion emission by embedding erbium dopants in an electro-optically active photonic crystal cavity patterned from thin-film lithium niobate. Purcell factor over 170 enables single ion detection, which is verified by second-order autocorrelation measurement. Dynamic control of emission rate is realized by leveraging electro-optic tuning of resonance frequency. Using this feature, storage and retrieval of single ion excitation is further demonstrated, without perturbing the emission characteristics. These results promise new opportunities for controllable single photon sources and efficient spin-photon interfaces.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.10668 [pdf, other]

Light-induced dynamic frequency shifting of microwave photons in a superconducting electro-optic converter

Authors: Yuntao Xu, Wei Fu, Yiyu Zhou, Mingrui Xu, Mohan Shen, Ayed Al Sayem, Hong X. Tang

Abstract: Hybrid superconducting-photonic microresonators are a promising platform for realizing microwave-to-optical transduction. However, the absorption of scattered photons by the superconductors leads to unintended microwave resonance frequency variation and linewidth broadening. Here, we experimentally study the dynamics of this effect and its impact on microwave-to-optics conversion in an integrated lithium niobate-superconductor hybrid resonator platform. We unveiled an adiabatic frequency shifting of the intracavity microwave photons induced by the fast photo-responses of the thin-film superconducting resonator. As a result, the temporal and spectral responses of electro-optics transduction are modified and well described by our theoretical model. This work provides important insights on the light-induced conversion dynamics which must be considered in future designs of hybrid superconducting-photonic system.

Submitted 19 November, 2022; originally announced November 2022.

arXiv:2211.08194 [pdf]

Machine learning for classifying and interpreting coherent X-ray speckle patterns

Authors: Mingren Shen, Dina Sheyfer, Troy David Loeffler, Subramanian K. R. S. Sankaranarayanan, G. Brian Stephenson, Maria K. Y. Chan, Dane Morgan

Abstract: Speckle patterns produced by coherent X-ray have a close relationship with the internal structure of materials but quantitative inversion of the relationship to determine structure from speckle patterns is challenging. Here, we investigate the link between coherent X-ray speckle patterns and sample structures using a model 2D disk system and explore the ability of machine learning to learn aspects of the relationship. Specifically, we train a deep neural network to classify the coherent X-ray speckle patterns according to the disk number density in the corresponding structure. It is demonstrated that the classification system is accurate for both non-disperse and disperse size distributions.

Submitted 1 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:2211.02206 [pdf, other]

Soft Masking for Cost-Constrained Channel Pruning

Authors: Ryan Humble, Maying Shen, Jorge Albericio Latorre, Eric Darve, Jose M. Alvarez

Abstract: Structured channel pruning has been shown to significantly accelerate inference time for convolution neural networks (CNNs) on modern hardware, with a relatively minor loss of network accuracy. Recent works permanently zero these channels during training, which we observe to significantly hamper final accuracy, particularly as the fraction of the network being pruned increases. We propose Soft Masking for cost-constrained Channel Pruning (SMCP) to allow pruned channels to adaptively return to the network while simultaneously pruning towards a target cost constraint. By adding a soft mask re-parameterization of the weights and channel pruning from the perspective of removing input channels, we allow gradient updates to previously pruned channels and the opportunity for the channels to later return to the network. We then formulate input channel pruning as a global resource allocation problem. Our method outperforms prior works on both the ImageNet classification and PASCAL VOC detection datasets.

Submitted 3 November, 2022; originally announced November 2022.

Comments: Accepted by ECCV 2022

arXiv:2211.01506 [pdf]

Real-space imaging of polar and elastic nano-textures in thin films via inversion of diffraction data

Authors: Ziming Shao, Noah Schnitzer, Jacob Ruf, Oleg Y. Gorobtsov, Cheng Dai, Berit H. Goodge, Tiannan Yang, Hari Nair, Vlad A. Stoica, John W. Freeland, Jacob Ruff, Long-Qing Chen, Darrell G. Schlom, Kyle M. Shen, Lena F. Kourkoutis, Andrej Singer

Abstract: Exploiting the emerging nanoscale periodicities in epitaxial, single-crystal thin films is an exciting direction in quantum materials science: confinement and periodic distortions induce novel properties. The structural motifs of interest are ferroelastic, ferroelectric, multiferroic, and, more recently, topologically protected magnetization and polarization textures. A critical step towards heterostructure engineering is understanding their nanoscale structure, best achieved through real-space imaging. X-ray Bragg coherent diffractive imaging visualizes sub-picometer crystalline displacements with tens of nanometers spatial resolution. Yet, it is limited to objects spatially confined in all three dimensions and requires highly coherent, laser-like x-rays. Here we lift the confinement restriction by developing real-space imaging of periodic lattice distortions: we combine an iterative phase retrieval algorithm with unsupervised machine learning to invert the diffuse scattering in conventional x-ray reciprocal-space mapping into real-space images of polar and elastic textures in thin epitaxial films. We first demonstrate our imaging in PbTiO3/SrTiO3 superlattices to be consistent with published phase-field model calculations. We then visualize strain-induced ferroelastic domains emerging during the metal-insulator transition in Ca2RuO4 thin films. Instead of homogeneously transforming into a low-temperature structure (like in bulk), the strained Mott insulator splits into nanodomains with alternating lattice constants, as confirmed by cryogenic scanning transmission electron microscopy. Our study reveals the type, size, orientation, and crystal displacement field of the nano-textures. The non-destructive imaging of textures promises to improve models for their dynamics and enable advances in quantum materials and microelectronics.

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2211.00246 [pdf, other]

Batch Active Learning from the Perspective of Sparse Approximation

Authors: Maohao Shen, Bowen Jiang, Jacky Yibo Zhang, Oluwasanmi Koyejo

Abstract: Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators. We study and propose a novel framework that formulates batch active learning from the sparse approximation's perspective. Our active learning method aims to find an informative subset from the unlabeled data pool such that the corresponding training loss function approximates its full data pool counterpart. We realize the framework as sparsity-constrained discontinuous optimization problems, which explicitly balance uncertainty and representation for large-scale applications and could be solved by greedy or proximal iterative hard thresholding algorithms. The proposed method can adapt to various settings, including both Bayesian and non-Bayesian neural networks. Numerical experiments show that our work achieves competitive performance across different settings with lower computational complexity.

Submitted 5 November, 2022; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: NeurIPS 2022 Workshop on Human in the Loop Learning
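
The abstract above mentions iterative hard thresholding as one way to solve sparsity-constrained problems. As a rough illustration only, the sketch below applies plain iterative hard thresholding to a generic least-squares objective; it is not the paper's proximal variant or its active learning objective, and the names `hard_threshold` and `iht_least_squares` are hypothetical.

```python
def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero out the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def iht_least_squares(A, b, k, step, iters=100):
    """Approximately minimize ||A w - b||^2 subject to at most k nonzeros in w.

    Each iteration takes a gradient step and then projects onto k-sparse
    vectors. The factor 2 of the gradient is absorbed into `step`.
    """
    n_rows, n_cols = len(A), len(A[0])
    w = [0.0] * n_cols
    for _ in range(iters):
        # residual = A w - b
        residual = [
            sum(A[r][c] * w[c] for c in range(n_cols)) - b[r]
            for r in range(n_rows)
        ]
        # grad = A^T residual
        grad = [
            sum(A[r][c] * residual[r] for r in range(n_rows))
            for c in range(n_cols)
        ]
        w = hard_threshold([w[c] - step * grad[c] for c in range(n_cols)], k)
    return w


# Toy usage (made-up data): recover the two largest entries of b.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = iht_least_squares(A, [3.0, 0.5, -2.0], k=2, step=0.5)
```

The step size has to be small relative to the scale of `A` for the iteration to settle; here `A` is the identity, so `step=0.5` is a safe illustrative choice.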

arXiv:2210.12192 [pdf, other]

Conditional Diffusion with Less Explicit Guidance via Model Predictive Control

Authors: Max W. Shen, Ehsan Hajiramezanali, Gabriele Scalia, Alex Tseng, Nathaniel Diamant, Tommaso Biancalani, Andreas Loukas

Abstract: How much explicit guidance is necessary for conditional diffusion? We consider the problem of conditional sampling using an unconditional diffusion model and limited explicit guidance (e.g., a noised classifier, or a conditional diffusion model) that is restricted to a small number of time steps. We explore a model predictive control (MPC)-like approach to approximate guidance by simulating unconditional diffusion forward, and backpropagating explicit guidance feedback. MPC-approximated guides have high cosine similarity to real guides, even over large simulation distances. Adding MPC steps improves generative quality when explicit guidance is limited to five time steps.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.06659 [pdf, other]

Structural Pruning via Latency-Saliency Knapsack

Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget on the target device. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, allowing us to reformulate global structural pruning as a reward maximization problem under a given target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets, on different platforms. In particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy changes, respectively. For SSD pruning on VOC, HALP improves throughput by $1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior art, sometimes by large margins. Project page at https://halp-neurips.github.io/.

Submitted 18 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted by NeurIPS 2022. arXiv admin note: substantial text overlap with arXiv:2110.10811
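
The HALP abstract above casts pruning as reward maximization under a latency budget, which has the shape of a knapsack problem. The sketch below is a textbook 0/1 knapsack solved by dynamic programming, offered only to illustrate that shape; it is not the paper's augmented knapsack solver, and the function name `knapsack_select`, the integer latency units, and the toy numbers are assumptions.

```python
def knapsack_select(saliency, latency, budget):
    """Pick items maximizing total saliency with total latency <= budget.

    Latency values and the budget are assumed to be non-negative integers
    (for example, lookup-table entries rounded to a fixed time unit).
    Returns (best_total_saliency, sorted list of chosen indices).
    """
    n = len(saliency)
    # best[i][c]: best saliency using only the first i items within budget c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, value = latency[i - 1], saliency[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if cost <= c and best[i - 1][c - cost] + value > best[i][c]:
                best[i][c] = best[i - 1][c - cost] + value  # take item i-1
    # Walk back through the table to recover which items were taken.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= latency[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


# Toy usage (made-up scores): four candidate filters, latency budget of 5 units.
total, kept = knapsack_select([6.0, 5.0, 4.0, 1.0], [4, 2, 2, 1], 5)
```

In this toy case the single most salient item is not selected, because three cheaper items together score higher within the same budget.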

arXiv:2210.04397 [pdf, other]

Energy-efficient Reactive and Predictive Connected Cruise Control

Authors: Minghao Shen, R. Austin Dollar, Tamas G. Molnar, Chaozhe R. He, Ardalan Vahidi, Gabor Orosz

Abstract: In this paper, we propose a framework for the longitudinal control of connected and automated vehicles traveling in mixed traffic consisting of connected and non-connected human-driven vehicles. Reactive and predictive controllers are proposed. Reactive controllers are given by explicit feedback control laws. In predictive controllers, the control input is optimized in a receding-horizon fashion, which depends on the predictions of motions of preceding vehicles. Beyond-line-of-sight information is obtained via vehicle-to-vehicle (V2V) communication, and is utilized in the proposed reactive and predictive controllers. Simulations utilizing real traffic data are used to show that connectivity can bring significant energy savings.

Submitted 9 October, 2022; originally announced October 2022.

Comments: 18 pages, 12 figures, submitted to Transportation Research Part C: Emerging Technologies

arXiv:2209.08246 [pdf, other]

doi 10.1109/SEC54971.2022.00059

Quantum Computing Methods for Supply Chain Management

Authors: Hansheng Jiang, Zuo-Jun Max Shen, Junyu Liu

Abstract: Quantum computing is expected to have transformative influences on many domains, but its practical deployments on industry problems are underexplored. We focus on applying quantum computing to operations management problems in industry, and in particular, supply chain management. Many problems in supply chain management involve large state and action spaces and pose computational challenges on classic computers. We develop a quantized policy iteration algorithm to solve an inventory control problem and demonstrate its effectiveness. We also discuss in depth the hardware requirements and potential challenges of implementing this quantum algorithm in the near term. Our simulations and experiments are powered by IBM Qiskit and the qBraid system.

Submitted 1 December, 2022; v1 submitted 17 September, 2022; originally announced September 2022.

Comments: 6 pages, 5 figures

Journal ref: 2022 IEEE/ACM 7th Symposium on Edge Computing (SEC)

arXiv:2209.03571 [pdf, other]

Optimal Policy for Inventory Management with Periodic and Controlled Resets

Authors: Yoon Lee, Yonatan Mintz, Anil Aswani, Zuo-Jun Max Shen, Cong Yang

Abstract: Inventory management problems with periodic and controllable resets occur in the context of managing water storage in the developing world and retailing limited-time availability products. In this paper, we consider a set of sequential decision problems in which the decision-maker must not only balance holding and shortage costs but discard all inventory before a fixed number of decision epochs, with the option for an early inventory reset. Finding optimal policies using dynamic programming for these problems is particularly challenging since the resulting value functions are non-convex. Moreover, this structure cannot be easily analyzed using existing extended definitions, such as $K$-convexity. Our key contribution is to present sufficient conditions that ensure the optimal policy has an easily interpretable structure that generalizes the well-known $(s, S)$ policy from the operations literature. Furthermore, we demonstrate that the optimal policy has a four-threshold structure under these rather mild conditions. We then conclude with computational experiments, thereby illustrating the policy structures that can be extracted in several inventory management scenarios.

Submitted 8 September, 2022; originally announced September 2022.
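
For readers unfamiliar with the $(s, S)$ policy that the abstract above generalizes, the sketch below shows only the classical rule: when inventory falls below $s$, order up to $S$. It does not implement the paper's four-threshold policy or its resets; the function names, the lost-sales assumption, and the demand numbers are hypothetical.

```python
def order_quantity(inventory, s, S):
    """Classical (s, S) rule: below s, order up to S; otherwise order nothing."""
    return S - inventory if inventory < s else 0


def simulate(demands, start, s, S):
    """Apply the rule period by period.

    Orders are assumed to arrive immediately and unmet demand is lost.
    Returns a list of (order placed, end-of-period inventory) pairs.
    """
    level, history = start, []
    for demand in demands:
        order = order_quantity(level, s, S)
        level = max(level + order - demand, 0)
        history.append((order, level))
    return history


# Toy usage: reorder point s = 3, order-up-to level S = 10.
history = simulate([3, 4, 2, 6], start=5, s=3, S=10)
# history == [(0, 2), (8, 6), (0, 4), (0, 0)]
```

Some texts trigger the order when inventory is at or below $s$ rather than strictly below it; the strict inequality here is one convention among several.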

arXiv:2208.12938 [pdf, other]

TSGN: Transaction Subgraph Networks Assisting Phishing Detection in Ethereum

Authors: Jinhuan Wang, Pengtao Chen, Xinyao Xu, Jiajing Wu, Meng Shen, Qi Xuan, Xiaoniu Yang

Abstract: Due to the decentralized and public nature of the Blockchain ecosystem, the malicious activities on the Ethereum platform impose immeasurable losses for the users. Existing phishing scam detection methods mostly rely only on the analysis of original transaction networks, which is difficult to dig deeply into the transaction patterns hidden in the network structure of transaction interaction. In this paper, we propose a Transaction SubGraph Network (TSGN) based phishing accounts identification framework for Ethereum. We first extract transaction subgraphs for target accounts and then expand these subgraphs into corresponding TSGNs based on the different mapping mechanisms. In order to make our model incorporate more important information about real transactions, we encode the transaction attributes into the modeling process of TSGNs, yielding two variants of TSGN, i.e., Directed-TSGN and Temporal-TSGN, which can be applied to the different attributed networks. Especially, by introducing TSGN into multi-edge transaction networks, the Multiple-TSGN model proposed is able to preserve the temporal transaction flow information and capture the significant topological pattern of phishing scams, while reducing the time complexity of modeling large-scale networks. Extensive experimental results show that TSGN models can provide more potential information to improve the performance of phishing detection by incorporating graph representation learning.

Submitted 27 August, 2022; originally announced August 2022.

Comments: 13 pages, 9 figures. arXiv admin note: text overlap with arXiv:2104.08767

arXiv:2206.14083 [pdf, other]

doi 10.1103/PhysRevLett.129.047001

Gapped collective charge excitations and interlayer hopping in cuprate superconductors

Authors: M. Hepting, M. Bejas, A. Nag, H. Yamase, N. Coppola, D. Betto, C. Falter, M. Garcia-Fernandez, S. Agrestini, K. -J. Zhou, M. Minola, C. Sacco, L. Maritato, P. Orgiani, H. I. Wei, K. M. Shen, D. G. Schlom, A. Galdi, A. Greco, B. Keimer

Abstract: We use resonant inelastic x-ray scattering (RIXS) to probe the propagation of plasmons in the electron-doped cuprate superconductor Sr$_{0.9}$La$_{0.1}$CuO$_2$ (SLCO). We detect a plasmon gap of $\sim$120 meV at the two-dimensional Brillouin zone center, indicating that low-energy plasmons in SLCO are not strictly acoustic. The plasmon dispersion, including the gap, is accurately captured by layered $t$-$J$-$V$ model calculations. A similar analysis performed on recent RIXS data from other cuprates suggests that the plasmon gap is generic and its size is related to the magnitude of the interlayer hopping $t_z$. Our work signifies the three-dimensionality of the charge dynamics in layered cuprates and provides a new method to determine $t_z$.

Submitted 28 June, 2022; originally announced June 2022.

Comments: 17 pages, 10 figures, includes Supplemental Material. Accepted for publication in Physical Review Letters

Journal ref: Phys. Rev. Lett. 129, 047001 (2022)

arXiv:2206.13753 [pdf, other]

Unveiling photon statistics with a 100-pixel photon-number-resolving detector

Authors: Risheng Cheng, Yiyu Zhou, Sihao Wang, Mohan Shen, Towsif Taher, Hong X. Tang

Abstract: Single-photon detectors are ubiquitous in quantum information science and quantum sensing. They are key enabling technologies for numerous scientific discoveries and fundamental tests of quantum optics. Photon-number-resolving detectors are the ultimate measurement tool of light. However, few detectors to date can provide high-fidelity photon number resolution at few-photon levels. Here, we demonstrate an on-chip detector that can resolve up to 100 photons by spatiotemporally multiplexing an array of superconducting nanowires along a single waveguide. The unparalleled photon number resolution paired with the high-speed response exclusively allows us to unveil the quantum photon statistics of a true thermal light source for the first time, which is realized by direct measurement of high-order correlation function g^(N) with N up to 15, observation of photon-subtraction-induced photon number enhancement, and quantum-limited state discrimination against a coherent light source. Our detector provides a viable route towards various important applications, including photonic quantum computation and quantum metrology.

Submitted 28 June, 2022; originally announced June 2022.

arXiv:2206.12645 [pdf, other]

SmartCut Er:LiNbO3 with high optical coherence enabling optical thickness control

Authors: Sihao Wang, Likai Yang, Mohan Shen, Wei Fu, Yuntao Xu, Rufus L. Cone, Charles W. Thiel, Hong Tang

Abstract: Integrated photonics capable of incorporating rare earth ions with high optical coherence is desirable for realizing efficient quantum transducers, compact quantum memories, and hybrid quantum systems. Here we describe a photonic platform based on the SmartCut erbium-doped lithium niobate thin film, and explore its stable optical transitions at telecom wavelength in a dilution refrigerator. Optical coherence time of up to 180 μs, rivaling the value of bulk crystals, is achieved in optical ridge waveguides and ring resonators. With this integrated platform, we demonstrate tunable light-ion interaction and flexible control of optical thickness by exploiting long waveguides, whose lengths are in principle variable. This unique ability to obtain high optical density using a low concentration of ions further leads to the observation of multi-echo pulse trains in centimeter-long waveguides. Our results establish a promising photonic platform for quantum information processing with rare earth ions.

Submitted 25 June, 2022; originally announced June 2022.

arXiv:2206.08652 [pdf, ps, other]

An efficient spectral method for the fractional Schrödinger equation on the real line

Authors: Mengxia Shen, Haiyong Wang

Abstract: The fractional Schrödinger equation (FSE) on the real line arises in a broad range of physical settings, and its numerical simulation is challenging due to the nonlocal nature and the power law decay of the solution at infinity. In this paper, we propose a new spectral discretization scheme for the FSE in space based upon Malmquist-Takenaka functions. We show that this new discretization scheme achieves much better performance than existing discretization schemes in the case where the underlying FSE involves the square root of the Laplacian, while in other cases it also exhibits comparable or even better performance. Numerical experiments are provided to illustrate the effectiveness of the proposed method.

Submitted 17 June, 2022; originally announced June 2022.

Comments: 25 pages, 12 figures

arXiv:2206.08618 [pdf, ps, other]

Around the de Rham-Betti conjecture

Authors: Tobias Kreutz, Mingmin Shen, Charles Vial

Abstract: A de Rham-Betti class on a smooth projective variety $X$ over an algebraic extension $K$ of the rational numbers is a rational class in the Betti cohomology of the analytification of $X$ that descends to a class in the algebraic de Rham cohomology of $X$ via the period comparison isomorphism. The period conjecture of Grothendieck implies that de Rham-Betti classes should be algebraic. We prove that any de Rham-Betti class on a product of elliptic curves is algebraic. This is achieved by showing that the Tannakian torsor associated to a de Rham-Betti object is connected, and by exploiting the analytic subgroup theorem of Wüstholz. In the case of products of non-CM elliptic curves, we prove the stronger result that $\overline{\mathds{Q}}$-de Rham-Betti classes are $\overline{\mathds{Q}}$-linear combinations of algebraic classes by showing that the period comparison isomorphism generates the torsor of motivic periods. A key step consists in establishing a version of the analytic subgroup theorem with $\overline{\mathds{Q}}$-coefficients. Finally, building on results of Deligne and André regarding the Kuga-Satake correspondence, we further show that any de Rham-Betti isometry between the second cohomology groups of hyper-Kähler varieties, with second Betti number not 3, is Hodge. As two applications we show that codimension-2 de Rham-Betti classes on hyper-Kähler varieties of known deformation type are Hodge and we obtain a global de Rham-Betti Torelli theorem for K3 surfaces over $\overline{\mathds{Q}}$.

Submitted 16 March, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: New title, new coauthor. Main changes: new results regarding $\overline{\mathds{Q}}$-de Rham-Betti classes and new results regarding de Rham-Betti classes on hyper-Kähler varieties

MSC Class: 14C15; 14C25; 14C34; 14F40; 14G25; 14J42

arXiv:2206.04236 [pdf, other]

Analytical Composition of Differential Privacy via the Edgeworth Accountant

Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su

Abstract: Many modern machine learning algorithms are composed of simple private algorithms; thus, an increasingly important problem is to efficiently compute the overall privacy loss under composition. In this study, we introduce the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees of private algorithms. The Edgeworth Accountant starts by losslessly tracking the privacy loss under composition using the $f$-differential privacy framework, which allows us to express the privacy guarantees using privacy-loss log-likelihood ratios (PLLRs). As the name suggests, this accountant next uses the Edgeworth expansion to upper and lower bound the probability distribution of the sum of the PLLRs. Moreover, by relying on a technique for approximating complex distributions using simple ones, we demonstrate that the Edgeworth Accountant can be applied to the composition of any noise-addition mechanism. Owing to certain appealing features of the Edgeworth expansion, the $(ε, δ)$-differential privacy bounds offered by this accountant are non-asymptotic, with essentially no extra computational cost, as opposed to prior approaches, wherein the running times increase with the number of compositions. Finally, we demonstrate that our upper and lower $(ε, δ)$-differential privacy bounds are tight in federated analytics and certain regimes of training private deep learning models.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2205.10715 [pdf, other]

Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Authors: Donghao Ying, Mengzi Amy Guo, Hyunin Lee, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen

Abstract: We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an $\widetilde{O}(ε^{-4})$ sample complexity for $ε$-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain a zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate the effectiveness of our methods through numerical experiments.

Submitted 26 May, 2024; v1 submitted 21 May, 2022; originally announced May 2022.

arXiv:2205.03473 [pdf, other]

Energy-efficient Connected Cruise Control with Lean Penetration of Connected Vehicles

Authors: Minghao Shen, Chaozhe R. He, Tamas Molnar, A. Harvey Bell, Gabor Orosz

Abstract: This paper focuses on energy-efficient longitudinal controller design for a connected automated truck that travels in mixed traffic consisting of connected and non-connected vehicles. The truck has access to information about connected vehicles beyond line of sight using vehicle-to-vehicle (V2V) communication. A novel connected cruise control design is proposed which incorporates additional delays into the control law when responding to distant connected vehicles to account for the finite propagation of traffic waves. The speeds of non-connected vehicles are modeled as stochastic processes. A fundamental theorem is proven which links the spectral properties of the motion signals to the average energy consumption. This enables us to tune controller parameters and maximize energy efficiency. Simulations with synthetic data and real traffic data are used to demonstrate the energy efficiency of the control design. It is demonstrated that even with lean penetration of connected vehicles, our controller can bring significant energy savings.

Submitted 6 May, 2022; originally announced May 2022.

Comments: This is submitted to IEEE Transactions on Intelligent Transportation Systems

arXiv:2203.13102 [pdf, ps, other]

Magnetic Excitations in Square Lattice Iridates: Contrast between Ba$_2$IrO$_4$ and Sr$_2$IrO$_4$

Authors: J. P. Clancy, H. Gretarsson, A. Lupascu, J. A. Sears, Z. Nie, M. H. Upton, Jungho Kim, Z. Islam, M. Uchida, D. G. Schlom, K. M. Shen, Young-June Kim

Abstract: We report a resonant inelastic x-ray scattering (RIXS) investigation of ultra-thin epitaxial films of Ba$_2$IrO$_4$, and compare their low energy magnetic and spin-orbit excitations to those of their sister compound Sr$_2$IrO$_4$. Due to the 180$^\circ$ Ir-O-Ir bond, the bandwidth of the magnon and spin-orbiton is significantly larger in Ba$_2$IrO$_4$, making it difficult to describe these two types of excitations as separate well-defined quasiparticles. Both types of excitations are found to be quite sensitive to the effect of epitaxial strain. In addition, we find that the d-level inversion observed in Sr$_2$IrO$_4$ is absent in Ba$_2$IrO$_4$, as predicted in recent theoretical studies. Our results illustrate that the magnetic properties of Ba$_2$IrO$_4$ are substantially different from those of Sr$_2$IrO$_4$, suggesting that these materials need to be examined more carefully with electron itinerancy taken into account.

Submitted 24 March, 2022; originally announced March 2022.

Comments: 7 pages, 4 figures

arXiv:2203.09980 [pdf]

doi 10.1063/5.0125268

X-ray Nano-imaging of Defects in Thin Film Catalysts via Cluster Analysis

Authors: Aileen Luo, Oleg Yu. Gorobtsov, Jocienne N. Nelson, Ding-Yuan Kuo, Ziming Shao, Ryan Bouck, Mathew Cherukara, Martin V. Holt, Kyle M. Shen, Darrell G. Schlom, Jin Suntivich, Andrej Singer

Abstract: Functional properties of transition-metal oxides strongly depend on crystallographic defects. In transition-metal-oxide electrocatalysts such as SrIrO3 (SIO), crystallographic lattice deviations can affect ionic diffusion and adsorbate binding energies. Scanning x-ray nanodiffraction enables imaging of local structural distortions across an extended spatial region of thin samples. Line defects remain challenging to detect and localize using nanodiffraction, due to their weak diffuse scattering. Here we apply an unsupervised machine learning clustering algorithm to isolate the low-intensity diffuse scattering in as-grown and alkaline-treated thin epitaxially strained SIO films. We pinpoint the defect locations, find additional strain variation in the morphology of electrochemically cycled SIO, and interpret the defect type by analyzing the diffraction profile through clustering. Our findings demonstrate the use of a machine learning clustering algorithm for identifying and characterizing hard-to-find crystallographic defects in thin films of electrocatalysts and highlight the potential to study electrochemical reactions at defect sites in operando experiments.

Submitted 9 January, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 11 pages and 4 figures in main text, supporting information included

arXiv:2203.09360 [pdf, other]

doi 10.1109/TIFS.2022.3208471

Behavior-aware Account De-anonymization on Ethereum Interaction Graph

Authors: Jiajun Zhou, Chenkai Hu, Jianlei Chi, Jiajing Wu, Meng Shen, Qi Xuan

Abstract: Blockchain technology is characterized by decentralization, traceability and tamper-proofing, which create a reliable decentralized trust mechanism and further accelerate the development of blockchain finance. However, the anonymization of blockchain hinders market regulation, resulting in increasing illegal activities such as money laundering, gambling and phishing fraud on blockchain financial platforms. Thus, financial security has become a top priority in the blockchain ecosystem, calling for effective market regulation. In this paper, we consider identifying Ethereum accounts from a graph classification perspective, and propose an end-to-end graph neural network framework named Ethident, to characterize the behavior patterns of accounts and further achieve account de-anonymization. Specifically, we first construct an Account Interaction Graph (AIG) using raw Ethereum data. Then we design a hierarchical graph attention encoder named HGATE as the backbone of our framework, which can effectively characterize the node-level account features and subgraph-level behavior patterns. To alleviate account label scarcity, we further introduce a contrastive self-supervision mechanism as regularization to jointly train our framework. Comprehensive experiments on Ethereum datasets demonstrate that our framework achieves superior performance in account identification, yielding 1.13% ~ 4.93% relative improvement over previous state-of-the-art.
Furthermore, detailed analyses illustrate the effectiveness of Ethident in identifying and understanding the behavior of known participants in Ethereum (e.g. exchanges, miners, etc.), as well as that of the lawbreakers (e.g. phishing scammers, hackers, etc.), which may aid in risk assessment and market regulation.

Submitted 13 September, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE Transactions on Information Forensics & Security

Journal ref: in IEEE Transactions on Information Forensics and Security, vol. 17, pp. 3433-3448, 2022

arXiv:2203.07562 [pdf, other]

Safe adaptation in multiagent competition

Authors: Macheng Shen, Jonathan P. How

Abstract: Achieving the capability of adapting to ever-changing environments is a critical step towards building fully autonomous robots that operate safely in complicated scenarios. In multiagent competitive scenarios, agents may have to adapt to new opponents with previously unseen behaviors by learning from the interaction experiences between the ego-agent and the opponent. However, this adaptation is susceptible to opponent exploitation. As the ego-agent updates its own behavior to exploit the opponent, its own behavior could become more exploitable as a result of overfitting to this specific opponent's behavior. To overcome this difficulty, we developed a safe adaptation approach in which the ego-agent is trained against a regularized opponent model, which effectively avoids overfitting and consequently improves the robustness of the ego-agent's policy. We evaluated our approach in the MuJoCo domain with two competing agents. The experimental results suggest that our approach effectively achieves both adaptation to the specific opponent that the ego-agent is interacting with and low exploitability by other possible opponents.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.02102 [pdf, other]

BEATS: An Open-Source, High-Precision, Multi-Channel EEG Acquisition Tool System

Authors: Bing Zou, Yubo Zheng, Mu Shen, Yingying Luo, Lei Li, Lin Zhang

Abstract: Stable and accurate electroencephalogram (EEG) signal acquisition is fundamental in non-invasive brain-computer interface (BCI) technology. The hardware and software of commonly used EEG acquisition systems are usually closed-source. Their inability to support flexible expansion and secondary development is a major obstacle to real-time BCI research. This paper presents the Beijing University of Posts and Telecommunications EEG Acquisition Tool System named BEATS. It implements a comprehensive system from hardware to software, composed of the analog front-end, microprocessor, and software platform. BEATS is capable of collecting 32-channel EEG signals at a guaranteed sampling rate of 4 kHz with wireless transmission. Compared to state-of-the-art systems used in many EEG fields, it displays a better sampling rate. Using techniques including direct memory access, first in first out, and timer, the precision and stability of the acquisition are ensured at the microsecond level. An evaluation is conducted during 24 hours of continuous acquisitions. The data loss is 0 packets and the average maximum delay is only 0.07 s/h. Moreover, as an open source system, BEATS provides detailed design files, and adopts a plug-in structure and easy-to-access materials, which allow it to be quickly reproduced. Schematics, source code, and other materials of BEATS are available at https://github.com/buptantEEG/BEATS.

Submitted 19 December, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2202.05302 [pdf, ps, other]

Trust in AI: Interpretability is not necessary or sufficient, while black-box interaction is necessary and sufficient

Authors: Max W. Shen

Abstract: The problem of human trust in artificial intelligence is one of the most fundamental problems in applied machine learning. Our processes for evaluating AI trustworthiness have substantial ramifications for ML's impact on science, health, and humanity, yet confusion surrounds foundational concepts. What does it mean to trust an AI, and how do humans assess AI trustworthiness? What are the mechanisms for building trustworthy AI? And what is the role of interpretable ML in trust? Here, we draw from statistical learning theory and sociological lenses on human-automation trust to motivate an AI-as-tool framework, which distinguishes human-AI trust from human-AI-human trust. Evaluating an AI's contractual trustworthiness involves predicting future model behavior using behavior certificates (BCs) that aggregate behavioral evidence from diverse sources including empirical out-of-distribution and out-of-task evaluation and theoretical proofs linking model architecture to behavior. We clarify the role of interpretability in trust with a ladder of model access. Interpretability (level 3) is not necessary or even sufficient for trust, while the ability to run a black-box model at-will (level 2) is necessary and sufficient. While interpretability can offer benefits for trust, it can also incur costs. We clarify ways interpretability can contribute to trust, while questioning the perceived centrality of interpretability to trust in popular discourse. How can we empower people with tools to evaluate trust?
Instead of trying to understand how a model works, we argue for understanding how a model behaves. Instead of opening up black boxes, we should create more behavior certificates that are more correct, relevant, and understandable. We discuss how to build trusted and trustworthy AI responsibly.

Submitted 10 February, 2022; originally announced February 2022.

arXiv:2202.02920 [pdf, other]

Monolithic Kerr and electro-optic hybrid microcombs

Authors: Zheng Gong, Mohan Shen, Juanjuan Lu, Joshua B. Surya, Hong X. Tang

Abstract: Advances in microresonator-based soliton generation promise chip-scale integration of optical frequency combs for applications spanning from timekeeping to frequency synthesis. Miniaturized cavities harness Kerr nonlinearity and enable terahertz soliton repetition rates. However, such high repetition rates are not amenable to direct electronic detection. Here, we demonstrate hybrid Kerr and electro-optic microcombs using the lithium niobate thin film that exhibits both Kerr and Pockels nonlinearities. By interleaving the high-repetition-rate Kerr soliton comb with the low-repetition-rate electro-optic comb on the same waveguide, the wide Kerr soliton mode spacing is divided within a single chip, allowing for subsequent electronic detection and feedback control of the soliton repetition rate. Our work establishes an integrated electronic interface to Kerr solitons of terahertz repetition rates, paving the path towards chip-scale optical-to-microwave frequency division and comb locking.

Submitted 6 February, 2022; originally announced February 2022.

arXiv:2202.00796 [pdf, other]

On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation

Authors: Maohao Shen, Yuheng Bu, Gregory Wornell

Abstract: Due to privacy, storage, and other constraints, there is a growing need for unsupervised domain adaptation techniques in machine learning that do not require access to the data used to train a collection of source models. Existing methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models, and focus on improving the pseudo-labeling techniques or proposing new training objectives. Instead, we aim to analyze the fundamental limits of MSFDA. In particular, we develop an information-theoretic bound on the generalization error of the resulting target model, which illustrates an inherent bias-variance trade-off. We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment, which leads to the design of novel algorithms. Experiments on multiple datasets validate our theoretical analysis and demonstrate the state-of-the-art performance of the proposed algorithm, especially on some of the most challenging datasets, including Office-Home and DomainNet.

Submitted 31 May, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

Comments: ICML 2023

arXiv:2201.07297 [pdf]

doi 10.1103/PhysRevB.106.195135

Strain-induced orbital energy shift in antiferromagnetic RuO2 revealed by resonant elastic x-ray scattering

Authors: Benjamin Gregory, Jörg Strempfer, Daniel Weinstock, Jacob Ruf, Yifei Sun, Hari Nair, Nathaniel J. Schreiber, Darrell G. Schlom, Kyle M. Shen, Andrej Singer

Abstract: In its ground state, RuO2 was long thought to be an ordinary metallic paramagnet. Recent neutron and x-ray diffraction revealed that bulk RuO2 is an antiferromagnet (AFM) with TN above 300 K. Furthermore, epitaxial strain induces novel superconductivity in thin films of RuO2 below 2 K. Here, we present a resonant elastic x-ray scattering (REXS) study at the Ru L2 edge of the strained RuO2 films exhibiting the strain-induced superconductivity. We observe an azimuthal modulation of the 100 Bragg peak consistent with canted AFM found in bulk. Most notably, in the strained films displaying novel superconductivity, we observe a ~1 eV shift of the Ru eg orbitals to a higher energy. The energy shift is smaller in thicker, relaxed films and films with a different strain direction. Our results provide further evidence of the utility of epitaxial strain as a tuning parameter in complex oxides.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 20 pages, 3 main figures, 3 supplementary figures

arXiv:2201.01437 [pdf, other]

Robust Path Recommendations During Public Transit Disruptions Under Demand Uncertainty

Authors: Baichuan Mo, Haris N. Koutsopoulos, Max Zuo-Jun Shen, Jinhua Zhao

Abstract: When there are significant service disruptions in public transit systems, passengers usually need guidance to find alternative paths. This paper proposes a path recommendation model to mitigate the congestion during public transit disruptions. Passengers with different origin-destination and departure times are recommended with different paths such that the system travel time is minimized. We model the path recommendation as an optimal flow problem with uncertain demand information. To tackle the non-analytical formulation of travel times due to left behind, we propose a simulation-based first-order approximation to transform the original problem into linear programming. Uncertainties in demand are modeled with robust optimization to protect the path recommendation strategies against inaccurate estimates. A real-world rail disruption scenario in the Chicago Transit Authority (CTA) system is used as a case study. Results show that even without considering uncertainty, the nominal model can reduce the system travel time by 9.1% (compared to the status quo), and outperforms the benchmark capacity-based path recommendation. The average travel time of passengers in the incident line (i.e., passengers receiving recommendations) is reduced more (-20.6% compared to the status quo). After incorporating the demand uncertainty, the robust model can further reduce the system travel time. The best robust model can decrease the average travel time of incident-line passengers by 2.91% compared to the nominal model.

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2112.14366 [pdf, other]

doi 10.1103/PhysRevLett.128.114801

A single-crystal alkali antimonide photocathode: high efficiency in the ultra-thin limit

Authors: C. T. Parzyck, A. Galdi, J. K. Nangoi, W. J. I. DeBenedetti, J. Balajka, B. D. Faeth, H. Paik, C. Hu, T. A. Arias, M. A. Hines, D. G. Schlom, K. M. Shen, J. M. Maxson

Abstract: The properties of photoemission electron sources determine the ultimate performance of a wide class of electron accelerators and photon detectors. To date, all high-efficiency visible-light photocathode materials are either polycrystalline or exhibit intrinsic surface disorder, both of which limit emitted electron beam brightness. In this letter we demonstrate the synthesis of epitaxial thin films of Cs$_3$Sb on 3C-SiC (001) using molecular-beam epitaxy. Films as thin as 4 nm have quantum efficiencies exceeding 2% at 532 nm. We also find that epitaxial films have an order of magnitude larger quantum efficiency at 650 nm than comparable polycrystalline films on Si. Additionally, these films permit angle-resolved photoemission spectroscopy measurements of the electronic structure, which are found to be in good agreement with theory. Epitaxial films open the door to dramatic brightness enhancements via increased efficiency near threshold, reduced surface disorder, and the possibility of engineering new photoemission functionality at the level of single atomic layers.

Submitted 28 December, 2021; originally announced December 2021.

Comments: Submitted Manuscript: 8 pages, 4 figures

arXiv:2112.14215 [pdf, other]

doi 10.1016/j.nuclphysb.2022.115828

Explaining the $b \to s \ell^+ \ell^-$ anomalies in $Z^\prime$ scenarios with top-FCNC couplings

Authors: Xin-Qiang Li, Meng Shen, Dong-Yang Wang, Ya-Dong Yang, Xing-Bo Yuan

Abstract: Motivated by the recent anomalies in $b \to s \ell^+ \ell^-$ transitions, we explore a minimal $Z^\prime$ scenario, in which the $Z^\prime$ boson has a flavour-changing coupling to charm and top quarks and a flavour-conserving coupling to muons. It is found that such a $Z^\prime$ boson can explain the current $b \to s \ell^+ \ell^-$ anomalies, while satisfying other flavour and collider constraints simultaneously. The $Z^\prime$ boson can be as light as a few hundred GeV. In this case, the $t \to c \mu^+ \mu^-$ decay and the $tZ^\prime$ associated production at the LHC could provide sensitive probes of such a $Z^\prime$ boson. As a special feature, the $Z^\prime$ contributions to all rare $B$- and $K$-meson processes are controlled by one parameter. This results in interesting correlations among these processes, which could provide further insights into this scenario. In addition, an extended scenario, in which the $Z^\prime$ boson interacts with the $SU(2)_L$ fermion doublets with couplings analogous to those in the minimal scenario, is also investigated.

Submitted 23 January, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

Comments: 40 pages, 13 figures, typos fixed, comments and references added

arXiv:2112.12352 [pdf, other]

doi 10.1063/1.5133647

Enhanced Surface Superconductivity in Ba(Fe$_{0.95}$Co$_{0.05}$)$_2$As$_2$

Authors: Christopher T. Parzyck, Brendan D. Faeth, Gordon N. Tam, Gregory R. Stewart, Kyle M. Shen

Abstract: We present direct evidence for an enhanced superconducting $T_c$ on the surface of cleaved single crystals of Ba(Fe$_{0.95}$Co$_{0.05}$)$_2$As$_2$. Transport measurements performed on samples cleaved in ultra high vacuum (UHV) show a significantly enhanced superconducting transition when compared to equivalent measurements performed in air. Deviations from the bulk resistivity appear at 21 K, well above the 10 K bulk $T_c$ of the underdoped compound. We demonstrate that the excess conductivity above the bulk $T_c$ can be controllably suppressed by application of potassium ions on the cleaved surface, indicating that the enhanced superconductivity is strongly localized to the sample surface. Additionally, we find that the effects of the potassium surface dosing are strongly influenced by the presence of residual gas adsorbates on the sample surface, which may prevent effective charge transfer from the potassium atoms to the FeAs plane. This is further support for the conclusion that the effects of the dosing (and enhanced superconductivity) are localized within a few layers of the surface.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 11 pages, 3 figures

Journal ref: Appl. Phys. Lett. 116, 062601 (2020)

arXiv:2111.03759 [pdf, other]

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

Authors: Yuhang Li, Mingzhu Shen, Jian Ma, Yan Ren, Mingxin Zhao, Qi Zhang, Ruihao Gong, Fengwei Yu, Junjie Yan

Abstract: Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple different platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts like a bridge to connect the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights. By aligning the training settings, we find existing algorithms have about the same performance on the conventional academic track. For hardware-deployable quantization, however, there is a huge accuracy gap that remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work could inspire future research directions.

Submitted 25 January, 2022; v1 submitted 5 November, 2021; originally announced November 2021.

Comments: Accepted by 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks

arXiv:2110.12351 [pdf, other]

Integrated Conditional Estimation-Optimization

Authors: Meng Qi, Paul Grigas, Zuo-Jun Max Shen

Abstract: Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of uncertain parameters and then optimizing the objective based on the estimation, we propose an integrated conditional estimation-optimization (ICEO) framework that estimates the underlying conditional distribution of the random parameter while considering the structure of the optimization problem. We directly model the relationship between the conditional distribution of the random parameter and the contextual features, and then estimate the probabilistic model with an objective that aligns with the downstream optimization problem. We show that our ICEO approach is asymptotically consistent under moderate regularity conditions and further provide finite performance guarantees in the form of generalization bounds. Computationally, performing estimation with the ICEO approach is a non-convex and often non-differentiable optimization problem. We propose a general methodology for approximating the potentially non-differentiable mapping from estimated conditional distribution to the optimal decision by a differentiable function, which greatly improves the performance of gradient-based algorithms applied to the non-convex problem. We also provide a polynomial optimization solution approach in the semi-algebraic case.
Numerical experiments are also conducted to show the empirical success of our approach in different situations including with limited data samples and model mismatches.

Submitted 1 August, 2023; v1 submitted 24 October, 2021; originally announced October 2021.

arXiv:2110.12007 [pdf, other]

When to Prune? A Policy towards Early Structural Pruning

Authors: Maying Shen, Pavlo Molchanov, Hongxu Yin, Jose M. Alvarez

Abstract: Pruning enables appealing reductions in network memory footprint and time complexity. Conventional post-training pruning techniques lean towards efficient inference while overlooking the heavy computation for training. Recent exploration of pre-training pruning at initialization hints at training cost reduction via pruning, but suffers noticeable performance degradation. We attempt to combine the benefits of both directions and propose a policy that prunes as early as possible during training without hurting performance. Instead of pruning at initialization, our method exploits initial dense training for a few epochs to quickly guide the architecture, while constantly evaluating dominant sub-networks via neuron importance ranking. This unveils dominant sub-networks whose structures turn stable, allowing conventional pruning to be pushed earlier into the training. To do this early, we further introduce an Early Pruning Indicator (EPI) that relies on sub-network architectural similarity and quickly triggers pruning when the sub-network's architecture stabilizes. Through extensive experiments on ImageNet, we show that EPI empowers a quick tracking of early training epochs suitable for pruning, offering the same efficacy as an otherwise "oracle" grid-search that scans through epochs and requires orders of magnitude more compute. Our method yields a $1.4\%$ top-1 accuracy boost over state-of-the-art pruning counterparts and cuts down training cost on GPU by $2.4\times$, hence offering a new efficiency-accuracy boundary for network pruning during training.

Submitted 22 October, 2021; originally announced October 2021.

arXiv:2110.10811 [pdf, ps, other]

HALP: Hardware-Aware Latency Pruning

Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M. Alvarez

Abstract: Structural pruning can simplify network architecture and improve inference speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates structural pruning as a global resource allocation optimization problem, aiming at maximizing the accuracy while constraining latency under a predefined budget. For filter importance ranking, HALP leverages a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop. Both metrics can be evaluated very efficiently during pruning, allowing us to reformulate global structural pruning as a reward maximization problem given a target constraint. This makes the problem solvable via our augmented knapsack solver, enabling HALP to surpass prior work in pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets. In particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy changes, respectively. For SSD pruning on VOC, HALP improves throughput by $1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior art, sometimes by large margins.

Submitted 20 October, 2021; originally announced October 2021.

arXiv:2110.08244 [pdf]

Performance, Successes and Limitations of Deep Learning Semantic Segmentation of Multiple Defects in Transmission Electron Micrographs

Authors: Ryan Jacobs, Mingren Shen, Yuhan Liu, Wei Hao, Xiaoshan Li, Ruoyu He, Jacob RC Greaves, Donglin Wang, Zeming Xie, Zitong Huang, Chao Wang, Kevin G. Field, Dane Morgan

Abstract: In this work, we perform semantic segmentation of multiple defect types in electron microscopy images of irradiated FeCrAl alloys using a deep learning Mask Regional Convolutional Neural Network (Mask R-CNN) model. We conduct an in-depth analysis of key model performance statistics, with a focus on quantities such as predicted distributions of defect shapes, defect sizes, and defect areal densities relevant to informing modeling and understanding of irradiated Fe-based materials properties. To better understand the performance and present limitations of the model, we provide examples of useful evaluation tests which include a suite of random splits, and dataset size-dependent and domain-targeted cross validation tests. Overall, we find that the current model is a fast, effective tool for automatically characterizing and quantifying multiple defect types in microscopy images, with a level of accuracy on par with human domain expert labelers. More specifically, the model can achieve average defect identification F1 scores as high as 0.8, and, based on random cross validation, has low overall average (+/- standard deviation) defect size and density percentage errors of 7.3 (+/- 3.8)% and 12.7 (+/- 5.3)%, respectively. Further, our model predicts the expected material hardening to within 10-20 MPa (about 10% of total hardening), which is about the same error level as experiments.
Our targeted evaluation tests also suggest the best path toward improving future models is not expanding existing databases with more labeled images but instead data additions that target weak points of the model domain, such as images from different microscopes, imaging conditions, irradiation environments, and alloy types. Finally, we discuss the first phase of an effort to provide an easy-to-use, open-source object detection tool to the broader community for identifying defects in new images.

Submitted 15 October, 2021; originally announced October 2021.

arXiv:2110.04869 [pdf, other]

Global Vision Transformer Pruning with Hessian-Aware Saliency

Authors: Huanrui Yang, Hongxu Yin, Maying Shen, Pavlo Molchanov, Hai Li, Jan Kautz

Abstract: Transformers yield state-of-the-art results across many tasks. However, their heuristically designed architectures impose huge computational costs during inference. This work aims to challenge the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage, where we redistribute the parameters both across transformer blocks and between different structures within the block via the first systematic attempt on global structural pruning. Dealing with diverse ViT structural components, we derive a novel Hessian-based structural pruning criterion comparable across all layers and structures, with latency-aware regularization for direct latency reduction. Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently. On ImageNet-1K, NViT-Base achieves a 2.6x FLOPs reduction, 5.1x parameter reduction, and 1.9x run-time speedup over the DeiT-Base model in a near lossless manner. Smaller NViT variants achieve more than 1% accuracy gain at the same throughput as the DeiT Small/Tiny variants, as well as a lossless 3.3x parameter reduction over the SWIN-Small model. These results outperform prior art by a large margin. Further analysis is provided on the parameter redistribution insight of NViT, where we show the high prunability of ViT models, distinct sensitivity within ViT block, and unique parameter distribution trend across stacked ViT blocks.
Our insights provide viability for a simple yet effective parameter redistribution rule towards more efficient ViTs for off-the-shelf performance boost.

Submitted 29 March, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

Comments: Accepted as a conference paper at CVPR 2023

Showing 101–150 of 381 results for author: shen, M