-
Strategies to enhance THz harmonic generation combining multilayered, gated, and metamaterial-based architectures
Authors:
Ali Maleki,
Moritz B. Heindl,
Yongbao Xin,
Robert W. Boyd,
Georg Herink,
Jean-Michel Ménard
Abstract:
Graphene has unique properties paving the way for groundbreaking future applications. Its large optical nonlinearity and ease of integration in devices notably make it an ideal candidate to become a key component for all-optical switching and frequency conversion applications. In the terahertz (THz) region, various approaches have been independently demonstrated to optimize the nonlinear effects in graphene, addressing a critical limitation arising from the atomically thin interaction length. Here, we demonstrate sample architectures that combine strategies to enhance THz nonlinearities in graphene-based structures. We achieve this by increasing the interaction length through a multilayered design, controlling carrier density with an electrical gate, and modulating the THz field spatial distribution with a metallic metasurface substrate. Our study specifically investigates third harmonic generation (THG) using a table-top high-field THz source. We measure THG enhancement factors exceeding thirty and propose architectures capable of achieving a two-order-of-magnitude increase. These findings highlight the potential of engineered graphene-based samples in advancing THz frequency conversion technologies for signal processing and wireless communication applications.
Submitted 27 May, 2024;
originally announced May 2024.
-
A note on the minimax risk of sparse linear regression
Authors:
Yilin Guo,
Shubhangi Ghosh,
Haolei Weng,
Arian Maleki
Abstract:
Sparse linear regression is one of the classical and extensively studied problems in high-dimensional statistics and compressed sensing. Despite the substantial body of literature dedicated to this problem, the precise determination of its minimax risk remains elusive. This paper aims to fill this gap by deriving an asymptotically constant-sharp characterization of the minimax risk of sparse linear regression. More specifically, the paper focuses on scenarios where the sparsity level, denoted as $k$, satisfies the condition $(k \log p)/n \to 0$, with $p$ and $n$ representing the number of features and observations, respectively. We establish that the minimax risk under isotropic Gaussian random design is asymptotically equal to $2\sigma^2 (k/n) \log(p/k)$, where $\sigma$ denotes the standard deviation of the noise. In addition to this result, we summarize the existing results in the literature and mention some of the fundamental problems that remain open.
Submitted 8 May, 2024;
originally announced May 2024.
-
Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Authors:
Fareed Qararyah,
Muhammad Waqar Azhar,
Mohammad Ali Maleki,
Pedro Trancoso
Abstract:
Depthwise and pointwise convolutions have fewer parameters and perform fewer operations than standard convolutions. As a result, they have become increasingly used in various compact DNNs, including convolutional neural networks (CNNs) and vision transformers (ViTs). However, they have a lower compute-to-memory-access ratio than standard convolutions, often making their memory accesses the performance bottleneck. This paper explores fusing depthwise and pointwise convolutions to overcome the memory access bottleneck. The focus is on fusing these operators on GPUs. The prior art on GPU-based fusion suffers from one or more of the following: (1) fusing either a convolution with an element-wise operator or multiple non-convolutional operators, (2) not explicitly optimizing for memory accesses, (3) not supporting depthwise convolutions. This paper proposes Fused Convolutional Modules (FCMs), a set of novel fused depthwise and pointwise GPU kernels. FCMs significantly reduce the memory accesses of pointwise and depthwise convolutions, improving execution time and energy efficiency. To evaluate the trade-offs associated with fusion and determine which convolutions are beneficial to fuse and the optimal FCM parameters, we propose FusePlanner. FusePlanner consists of cost models to estimate the memory accesses of depthwise, pointwise, and FCM kernels given GPU characteristics. Our experiments on three GPUs using representative CNNs and ViTs demonstrate that FCMs save up to 83% of the memory accesses and achieve speedups of up to 3.7x compared to cuDNN. Complete model implementations of various CNNs using our modules outperform TVM's, achieving speedups of up to 1.8x and saving up to two-thirds of the energy.
Submitted 30 April, 2024;
originally announced April 2024.
-
Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise
Authors:
Xi Chen,
Zhewen Hou,
Christopher A. Metzler,
Arian Maleki,
Shirin Jalali
Abstract:
We investigate both the theoretical and algorithmic aspects of likelihood-based methods for recovering a complex-valued signal from multiple sets of measurements, referred to as looks, affected by speckle (multiplicative) noise. Our theoretical contributions include establishing the first theoretical upper bound on the Mean Squared Error (MSE) of the maximum likelihood estimator under the deep image prior hypothesis. Our theoretical results capture the dependence of the MSE upon the number of parameters in the deep image prior, the number of looks, the signal dimension, and the number of measurements per look. On the algorithmic side, we introduce the concept of bagged Deep Image Priors (Bagged-DIP) and integrate them with projected gradient descent (PGD). Furthermore, we show how employing the Newton-Schulz algorithm for calculating matrix inverses within the iterations of PGD reduces the computational complexity of the algorithm. We show that this method achieves state-of-the-art performance.
Submitted 23 February, 2024;
originally announced February 2024.
-
Theoretical Analysis of Leave-one-out Cross Validation for Non-differentiable Penalties under High-dimensional Settings
Authors:
Haolin Zou,
Arnab Auddy,
Kamiar Rahnama Rad,
Arian Maleki
Abstract:
Despite a large and significant body of recent work focused on estimating the out-of-sample risk of regularized models in the high dimensional regime, a theoretical understanding of this problem for non-differentiable penalties such as generalized LASSO and nuclear norm is missing. In this paper we resolve this challenge. We study this problem in the proportional high dimensional regime where both the sample size n and number of features p are large, and n/p and the signal-to-noise ratio (per observation) remain finite. We provide finite sample upper bounds on the expected squared error of leave-one-out cross-validation (LO) in estimating the out-of-sample risk. The theoretical framework presented here provides a solid foundation for elucidating empirical findings that show the accuracy of LO.
Submitted 14 February, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
High detectivity terahertz radiation sensing using frequency-noise-optimized nanomechanical resonators
Authors:
Chang Zhang,
Eeswar K. Yalavarthi,
Mathieu Giroux,
Wei Cui,
Michel Stephan,
Ali Maleki,
Arnaud Weck,
Jean-Michel Ménard,
Raphael St-Gelais
Abstract:
We achieve high detectivity terahertz sensing using a silicon nitride nanomechanical resonator functionalized with a metasurface absorber. High performance is achieved by striking a fine balance between the frequency stability of the resonator and its responsivity to absorbed radiation. Using this approach, we demonstrate a detectivity $D^*=3.4\times10^9~\mathrm{cm\cdot\sqrt{Hz}/W}$ and a noise equivalent power $\mathrm{NEP}=36~\mathrm{pW/\sqrt{Hz}}$ that outperform the best room-temperature on-chip THz detectors (i.e., pyroelectrics). Our optical absorber consists of a 1-mm diameter metasurface, which currently enables a 0.5-3 THz detection range but can easily be scaled to other frequencies in the THz and infrared ranges. In addition to demonstrating high-performance terahertz sensing, our work unveils an important fundamental trade-off between high frequency stability and high responsivity in thermal-based nanomechanical radiation sensors.
Submitted 29 January, 2024;
originally announced January 2024.
-
Coordinated Deep Neural Networks: A Versatile Edge Offloading Algorithm
Authors:
Alireza Maleki,
Hamed Shah-Mansouri,
Babak H. Khalaj
Abstract:
As artificial intelligence (AI) applications continue to expand, there is a growing need for deep neural network (DNN) models. Although DNN models deployed at the edge are promising to provide AI as a service with low latency, their cooperation is yet to be explored. In this paper, we consider DNN service providers that share their computing resources as well as their models' parameters and allow other DNNs to offload their computations without mirroring. We propose a novel algorithm called coordinated DNNs on edge (CoDE) that facilitates coordination among DNN services by creating multi-task DNNs out of individual models. CoDE aims to find the optimal path that results in the lowest possible cost, where the cost reflects the inference delay, model accuracy, and local computation workload. With CoDE, DNN models can make new paths for inference by using their own or other models' parameters. We then evaluate the performance of CoDE through numerical experiments. The results demonstrate a $75\%$ reduction in the local service computation workload while degrading the accuracy by only $2\%$ and having the same inference time in a balanced load condition. Under heavy load, CoDE can further decrease the inference time by $30\%$ while the accuracy is reduced by only $4\%$.
Submitted 31 December, 2023;
originally announced January 2024.
-
Who Are Tweeting About Academic Publications? A Cochrane Systematic Review and Meta-Analysis of Altmetric Studies
Authors:
Ashraf Maleki,
Kim Holmberg
Abstract:
Previous studies have developed different categorizations of Twitter users who interact with scientific publications online, reflecting the difficulty in creating a unified approach. Using Cochrane Review meta-analysis to analyse earlier research (including 79,014 Twitter users, over twenty million tweets, and over five million tweeted publications from 23 studies), we created a consolidated robust categorization consisting of 11 user categories, at different dimensions, covering most of any future needs for user categorizations on Twitter and possibly also other social media platforms. Our findings showed, with moderate certainty, covering all the earlier different approaches employed, that the predominant group of Twitter was individual users (66%), being responsible for the majority of tweets (55%) and tweeted publications (50%), while organizations (22%, 27%, and 28%, respectively) and science communicators (16%, 13%, and 30%) clearly contributed to a lesser degree. These individual users consisted of both academic individuals (33%) and other individuals (28%). While academic individuals shared more academic publications than other individuals (42% vs. 31%), they posted fewer tweets overall (22% vs. 30%), but these differences do not reach statistical significance. Despite significant heterogeneity arising from variations in earlier categorizations, the findings consistently indicate the importance of academics in disseminating academic publications on Twitter.
Submitted 14 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers (extended version)
Authors:
Arnab Auddy,
Haolin Zou,
Kamiar Rahnama Rad,
Arian Maleki
Abstract:
The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding approach to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, the theoretical understanding of ALO's error remains unknown. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with non-differentiable regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such as the size of leave-i-out perturbations in active sets, sample size n, number of features p and regularization parameters. As a consequence, for the $\ell_1$-regularized problems, we show that |ALO - LO| goes to zero as p goes to infinity while n/p and SNR are fixed and bounded.
Submitted 26 October, 2023;
originally announced October 2023.
-
A Tutorial on Chirp Spread Spectrum for LoRaWAN: Basics and Key Advances
Authors:
Alireza Maleki,
Ha H. Nguyen,
Ebrahim Bedeer,
Robert Barton
Abstract:
Chirp spread spectrum (CSS) modulation is the heart of long-range (LoRa) modulation used in the context of long-range wide area network (LoRaWAN) in internet of things (IoT) scenarios. Despite being a proprietary technology owned by Semtech Corp., LoRa modulation has drawn much attention from the research and industry communities in recent years. However, to the best of our knowledge, a comprehensive tutorial investigating CSS modulation in the LoRaWAN application is missing in the literature. Therefore, in the first part of this paper, we provide a thorough analysis and tutorial of CSS modulation modified by LoRa specifications, discussing various aspects such as signal generation, detection, error performance, and spectral characteristics. Moreover, a summary of key recent advances in the context of CSS modulation applications in IoT networks is presented in the second part of this paper under four main categories: transceiver configuration and design, data rate improvement, interference modeling, and synchronization algorithms.
Submitted 16 October, 2023;
originally announced October 2023.
-
Using disturbance function for vibration analysis of a beam with an open edge crack
Authors:
Mousa Rezaee,
Saeed Lotfan,
Vahid A. Maleki
Abstract:
In this article, the model presented by Shen and Pierre to investigate the transverse vibration behavior of a simply supported beam has been revised. This is done by applying more realistic assumptions. The crack is modeled as a continuous disturbance and the disturbance function is provided based on fracture mechanics. Next, the natural frequencies corresponding to the model are extracted using the Galerkin method. The effect of crack parameters on the vibration behavior of the cracked beam is investigated. The obtained results show that the natural frequencies of the beam decrease with increasing crack depth. At the end, the obtained results are compared with the experimental results. The results show that the presented model is improved compared to previous models and predicts the vibration behavior of cracked beams with better accuracy for different crack parameters.
Submitted 12 March, 2023;
originally announced May 2023.
-
Activity-induced asymmetric dispersion in confined channels with constriction
Authors:
Armin Maleki,
Malihe Ghodrat,
Ignacio Pagonabarraga
Abstract:
Microorganisms, such as E. coli, are known to display upstream behavior and respond rheotactically to shear flows. In particular, E. coli suspensions have been shown to display strong sensitivity to spatial constrictions, leading to an anomalous densification past the constriction for incoming fluid velocities comparable to the microorganism's self-propulsion speed. We introduce a Brownian dynamics model for ellipsoidal self-propelling particles in a confined channel subject to a constriction. The model allows us to identify the parameters that characterize the relevant dynamical regimes of the accumulation of the active particles at the constriction, and to clarify the mechanisms underlying the experimental observations. We find that particles are trapped in butterfly-like attractors in front of the constriction, which is the origin of the symmetry breaking in the emerging density profiles of active particles passing the constriction. In addition, the probability of trapping, and thus the strength of the asymmetry, is affected by the size of the particles and the geometry of the channel, as well as the ratio of fluid velocity to propulsion speed.
Submitted 16 May, 2023;
originally announced May 2023.
-
Hybrid THz architectures for molecular polaritonics
Authors:
Ahmed Jaber,
Michael Reitz,
Avinash Singh,
Ali Maleki,
Yongbao Xin,
Brian Sullivan,
Ksenia Dolgaleva,
Robert W. Boyd,
Claudiu Genes,
Jean-Michel Ménard
Abstract:
Physical and chemical properties of materials can be modified by a resonant optical mode. Such recent demonstrations have mostly relied on a planar cavity geometry, while others have relied on a plasmonic resonator. However, the combination of these two device architectures has remained largely unexplored, especially in the context of maximizing light-matter interactions. Here, we investigate several schemes of electromagnetic field confinement aimed at facilitating the collective coupling of a localized photonic mode to molecular vibrations in the terahertz region. The key aspects are the use of metasurface plasmonic structures combined with standard Fabry-Perot configurations and the deposition of a thin layer of glucose, via a spray coating technique, within a tightly focused electromagnetic mode volume. More importantly, we demonstrate enhanced vacuum Rabi splittings reaching up to 200 GHz when combining plasmonic resonances, photonic cavity modes and low-energy molecular resonances. Furthermore, we demonstrate how a cavity mode can be utilized to enhance the zero-point electric field amplitude of a plasmonic resonator. Our study provides key insight into the design of polaritonic platforms with organic molecules to harvest the unique properties of hybrid light-matter states.
Submitted 25 May, 2024; v1 submitted 7 April, 2023;
originally announced April 2023.
-
D2D-aided LoRaWAN LR-FHSS in Direct-to-Satellite IoT Networks
Authors:
Alireza Maleki,
Ha H. Nguyen,
Ebrahim Bedeer,
Robert Barton
Abstract:
In this paper, we present a device-to-device (D2D) transmission scheme for aiding the long-range frequency hopping spread spectrum (LR-FHSS) LoRaWAN protocol with application in direct-to-satellite IoT networks. We consider a practical ground-to-satellite fading model, i.e., the shadowed-Rice channel, and derive the outage performance of the LR-FHSS network. With the help of network coding, a D2D-aided LR-FHSS transmission scheme is proposed to improve the network capacity, for which a closed-form outage probability expression is also derived. The obtained analytical expressions for both LR-FHSS and D2D-aided LR-FHSS outage probabilities are validated by computer simulations for different parts of the analysis capturing the effects of noise, fading, unslotted ALOHA-based time scheduling, the receiver's capture effect, IoT device distributions, and distance from node to satellite. The total outage probability for the D2D-aided LR-FHSS shows a considerable increase of 249.9% and 150.1% in network capacity at a typical outage of $10^{-2}$ for DR6 and DR5, respectively, when compared to LR-FHSS. This is obtained at the cost of a minimum of one and a maximum of two additional transmissions for each IoT end device imposed by the D2D scheme in each time-slot.
Submitted 8 December, 2022;
originally announced December 2022.
-
Evaluation of the Antibacterial and Wound Healing Properties of a Burn Ointment Containing Curcumin, Honey, and Potassium Aluminium
Authors:
Mahsa Shahbandeh,
Mahsa Amin Salehi,
Maryam Soltanyzadeh,
Mehrnaz Mirzaei,
Ali Maleki,
Abdolkarim Chehregani rad,
Mohammad Javad Fatemi,
Reza Mirnejad,
Mostafa Dahmardehei
Abstract:
Burn wounds can severely burden the health system and the quality of life of patients. The present study aimed to analyze the synergistic healing properties of curcumin, honey, and potassium alum combined in a newly devised burn ointment on second-degree burn wounds in rats. The MIC and MBC tests on 200 clinical isolates of Pseudomonas aeruginosa are compared to imipenem in vitro. Their killing time and cytotoxicity are also studied using a standard isolate of P. aeruginosa, fibroblast stem cells (FSC) and mouse embryonic fibroblasts (MEF). Furthermore, histopathological and histomorphological assessments are conducted on 150 male Wistar rats within four experimental groups to evaluate the efficiency of the prepared burn ointment. We found significant wound healing in both macroscopic observations and microscopic evaluations. Both curcumin and honey show strong antimicrobial effects with no cytotoxicity. Also, the histopathological results present a considerable and comparable wound re-epithelization in the group of rats treated with both honey and curcumin after 7 days. The burn ointment containing curcumin, honey, and potassium alum shows considerable efficacy in accelerating the healing of experimentally induced burn wounds in animals. The novel ointment product is proposed as a powerful alternative for the topical treatment of burn injuries.
Submitted 21 November, 2022;
originally announced November 2022.
-
Signal-to-noise ratio aware minimaxity and higher-order asymptotics
Authors:
Yilin Guo,
Haolei Weng,
Arian Maleki
Abstract:
Since its development, the minimax framework has been one of the cornerstones of theoretical statistics, and has contributed to the popularity of many well-known estimators, such as the regularized M-estimators for high-dimensional problems. In this paper, we first show, through the example of the sparse Gaussian sequence model, that the theoretical results under the classical minimax framework are insufficient for explaining empirical observations. In particular, both hard and soft thresholding estimators are (asymptotically) minimax; however, in practice they often exhibit sub-optimal performance at various signal-to-noise ratio (SNR) levels. The first contribution of this paper is to demonstrate that this issue can be resolved if the signal-to-noise ratio is taken into account in the construction of the parameter space. We call the resulting minimax framework the signal-to-noise ratio aware minimaxity. The second contribution of this paper is to showcase how one can use higher-order asymptotics to obtain accurate approximations of the SNR-aware minimax risk and discover minimax estimators. The theoretical findings obtained from this refined minimax framework provide new insights and practical guidance for the estimation of sparse signals.
Submitted 28 December, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Metamaterial-based octave-wide terahertz bandpass filters
Authors:
Ali Maleki,
Avinash Singh,
Ahmed Jaber,
Wei Cui,
Yongbao Xin,
Brian T. Sullivan,
Robert W. Boyd,
Jean-Michel Menard
Abstract:
We present octave-wide bandpass filters in the terahertz (THz) region based on bilayer-metamaterial (BLMM) structures. The passband region has a super-Gaussian shape with a maximum transmittance approaching 70% and a typical stopband rejection of 20 dB. The design is based on a metasurface consisting of a metallic square-hole array deposited on a transparent polymer, which is stacked on top of an identical metasurface with a sub-wavelength separation. The superimposed metasurface structures were designed using finite-difference time-domain (FDTD) simulations and fabricated using a photolithography process. Experimental characterization of these structures between 0.3 and 5.8 THz is performed with a time-domain THz spectroscopy system. Good agreement between experiment and simulation results is observed. We also demonstrate that two superimposed BLMM (2BLMM) devices increase the steepness of the roll-offs to more than 85 dB/octave and enable a superior stopband rejection approaching 40 dB while the maximum transmittance remains above 64%. This work paves the way toward new THz applications, including the detection of THz pulses centered at specific frequencies, and an enhanced time-resolved detection sensitivity towards molecular vibrations that are noise dominated by a strong, off-resonant, driving field.
Submitted 17 August, 2022;
originally announced August 2022.
-
Heterogeneous Multi-core Array-based DNN Accelerator
Authors:
Mohammad Ali Maleki,
Mehdi Kamal,
Ali Afzali-Kusha
Abstract:
In this article, we investigate the impact of architectural parameters of array-based DNN accelerators on the accelerator's energy consumption and performance in a wide variety of network topologies. For this purpose, we have developed a tool that simulates the execution of neural networks on array-based accelerators and has the capability of testing different configurations for the estimation of energy consumption and processing latency. Based on our analysis of the behavior of benchmark networks under different architectural parameters, we offer a few recommendations for having an efficient yet high-performance accelerator design. Next, we propose a heterogeneous multi-core chip scheme for deep neural network execution. The evaluations of a selective small search space indicate that the execution of neural networks on their near-optimal core configuration can save up to 36% and 67% of energy consumption and energy-delay product respectively. Also, we suggest an algorithm to distribute the processing of a network's layers across multiple cores of the same type in order to speed up the computations through model parallelism. Evaluations on different networks and with different numbers of cores verify the effectiveness of the proposed algorithm in speeding up the processing to near-optimal values.
Submitted 25 June, 2022;
originally announced June 2022.
-
Complementarity-Entanglement Tradeoff in Quantum Gravity
Authors:
Yusef Maleki,
Alireza Maleki
Abstract:
Quantization of gravity remains one of the most important, yet extremely elusive, challenges at the heart of modern physics. Any attempt to resolve this long-standing problem seems to be doomed, as the route to any direct empirical evidence (i.e., detecting gravitons) for shedding light on the quantum aspect of gravity is far beyond the current capabilities. Recently, it has been discovered that gravitationally-induced entanglement, tailored in the interferometric frameworks, can be used to witness the quantum nature of gravity. Even though these schemes offer promising tools for investigating quantum gravity, many fundamental and empirical aspects of the schemes are yet to be discovered. Considering the fact that, besides quantum entanglement, quantum uncertainty and complementarity principles are the two other foundational aspects of quantum physics, the quantum nature of gravity needs to manifest all of these features. Here, we lay out an interferometric platform for testing these three nonclassical aspects of quantum mechanics in a quantum gravity setting, which connects gravity and quantum physics in a broader and deeper context. As we show in this work, all three of these fundamental features of quantum gravity can be framed and fully analyzed in an interferometric scheme.
Submitted 4 May, 2022;
originally announced May 2022.
-
Towards Designing Optimal Sensing Matrices for Generalized Linear Inverse Problems
Authors:
Junjie Ma,
Ji Xu,
Arian Maleki
Abstract:
We consider an inverse problem $\mathbf{y}= f(\mathbf{Ax})$, where $\mathbf{x}\in\mathbb{R}^n$ is the signal of interest, $\mathbf{A}$ is the sensing matrix, $f$ is a nonlinear function and $\mathbf{y} \in \mathbb{R}^m$ is the measurement vector. In many applications, we have some level of freedom to design the sensing matrix $\mathbf{A}$, and in such circumstances we could optimize $\mathbf{A}$ to achieve better reconstruction performance. As a first step towards optimal design, it is important to understand the impact of the sensing matrix on the difficulty of recovering $\mathbf{x}$ from $\mathbf{y}$.
In this paper, we study the performance of one of the most successful recovery methods, i.e., the expectation propagation (EP) algorithm. We define a notion of spikiness for the spectrum of $\mathbf{A}$ and show the importance of this measure for the performance of EP. We show that whether a spikier spectrum can hurt or help the recovery performance depends on $f$. Based on our framework, we are able to show that, in phase-retrieval problems, matrices with spikier spectrums are better for EP, while in 1-bit compressed sensing problems, less spiky spectrums lead to better performance. Our results unify and substantially generalize existing results that compare Gaussian and orthogonal matrices, and provide a platform towards designing optimal sensing systems.
Submitted 19 August, 2023; v1 submitted 4 November, 2021;
originally announced November 2021.
-
Quantum Steering Ellipsoid and Unruh Effect
Authors:
Yusef Maleki,
Bahram Ahansaz,
Kangle Li,
Alireza Maleki
Abstract:
Quantum steering is a perplexing feature at the heart of quantum mechanics that provides profound implications in understanding the nature of physical reality. On the other hand, the effect of relativistic features on quantum systems is vital in understanding the underlying foundations of physics. In this work, we study the effects of Unruh acceleration on the quantum steering of a two-qubit system. In particular, we consider the so-called quantum steering ellipsoid and the maximally-steered coherence in a non-inertial frame and find closed-form analytic expressions for the role of the Unruh acceleration in these quantities. Analyzing the conditions for the steerability of the system, we develop a geometric description for the effect of Unruh acceleration on the quantum steering of a two-qubit system.
Submitted 27 October, 2021;
originally announced October 2021.
-
A composable autoencoder-based iterative algorithm for accelerating numerical simulations
Authors:
Rishikesh Ranade,
Chris Hill,
Haiyang He,
Amir Maleki,
Norman Chang,
Jay Pathak
Abstract:
Numerical simulations for engineering applications solve partial differential equations (PDE) to model various physical processes. Traditional PDE solvers are very accurate but computationally costly. On the other hand, Machine Learning (ML) methods offer a significant computational speedup but face challenges with accuracy and generalization to different PDE conditions, such as geometry, boundary conditions, initial conditions and PDE source terms. In this work, we propose a novel ML-based approach, CoAE-MLSim (Composable AutoEncoder Machine Learning Simulation), which is an unsupervised, lower-dimensional, local method that is motivated by key ideas used in commercial PDE solvers. This allows our approach to learn better with relatively fewer samples of PDE solutions. The proposed ML-approach is compared against commercial solvers for better benchmarks as well as the latest ML-approaches for solving PDEs. It is tested for a variety of complex engineering cases to demonstrate its computational speed, accuracy, scalability, and generalization across different PDE conditions. The results show that our approach captures physics accurately across all metrics of comparison (including measures such as results on section cuts and lines).
Submitted 7 October, 2021;
originally announced October 2021.
-
OVERT: An Algorithm for Safety Verification of Neural Network Control Policies for Nonlinear Systems
Authors:
Chelsea Sidrane,
Amir Maleki,
Ahmed Irfan,
Mykel J. Kochenderfer
Abstract:
Deep learning methods can be used to produce control policies, but certifying their safety is challenging. The resulting networks are nonlinear and often very large. In response to this challenge, we present OVERT: a sound algorithm for safety verification of nonlinear discrete-time closed loop dynamical systems with neural network control policies. The novelty of OVERT lies in combining ideas from the classical formal methods literature with ideas from the newer neural network verification literature. The central concept of OVERT is to abstract nonlinear functions with a set of optimally tight piecewise linear bounds. Such piecewise linear bounds are designed for seamless integration into ReLU neural network verification tools. OVERT can be used to prove bounded-time safety properties by either computing reachable sets or solving feasibility queries directly. We demonstrate various examples of safety verification for several classical benchmark examples. OVERT compares favorably to existing methods both in computation time and in tightness of the reachable set.
Submitted 2 August, 2021;
originally announced August 2021.
-
Compressed sensing in the presence of speckle noise
Authors:
Wenda Zhou,
Shirin Jalali,
Arian Maleki
Abstract:
The problem of recovering a structured signal from its linear measurements in the presence of speckle noise is studied. This problem appears in many imaging systems such as synthetic aperture radar and optical coherence tomography. The current acquisition technology oversamples signals and converts the problem into a denoising problem with multiplicative noise. However, this paper explores the possibility of reducing the number of measurements below the ambient dimension of the signal. The sophistications that appear in the study of multiplicative noises have so far impeded theoretical analysis of such problems. This paper aims to present the first theoretical result regarding the recovery of signals from their undersampled measurements under the speckle noise. It is shown that if the signal class is structured, in the sense that the signals can be compressed efficiently, then one can obtain accurate estimates of the signal from fewer measurements than the ambient dimension. We demonstrate the effectiveness of the methods we propose through simulation results.
Submitted 31 July, 2021;
originally announced August 2021.
-
Geometry encoding for numerical simulations
Authors:
Amir Maleki,
Jan Heyse,
Rishikesh Ranade,
Haiyang He,
Priya Kasimbeg,
Jay Pathak
Abstract:
We present a notion of geometry encoding suitable for machine learning-based numerical simulation. In particular, we delineate how this notion of encoding is different from other encoding algorithms commonly used in other disciplines such as computer vision and computer graphics. We also present a model comprised of multiple neural networks including a processor, a compressor and an evaluator. These parts each satisfy a particular requirement of our encoding. We compare our encoding model with the analogous models in the literature.
Submitted 15 April, 2021;
originally announced April 2021.
-
A Latent space solver for PDE generalization
Authors:
Rishikesh Ranade,
Chris Hill,
Haiyang He,
Amir Maleki,
Jay Pathak
Abstract:
In this work we propose a hybrid solver to solve partial differential equations (PDEs) in the latent space. The solver uses an iterative inferencing strategy combined with solution initialization to improve generalization of PDE solutions. The solver is tested on an engineering case and the results show that it can generalize well to several PDE conditions.
Submitted 6 April, 2021;
originally announced April 2021.
-
Preference-based Learning of Reward Function Features
Authors:
Sydney M. Katz,
Amir Maleki,
Erdem Bıyık,
Mykel J. Kochenderfer
Abstract:
Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward functions that are linear in a set of trajectory features. The features are typically hand-coded, and preference-based learning is used to determine a particular user's relative weighting for each feature. Designing a representative set of features to encode reward is challenging and can result in inaccurate models that fail to model the users' preferences or perform the task properly. In this paper, we present a method to learn both the relative weighting among features as well as additional features that help encode a user's reward function. The additional features are modeled as a neural network that is trained on the data from pairwise comparison queries. We apply our methods to a driving scenario used in previous work and compare the predictive power of our method to that of only hand-coded features. We perform additional analysis to interpret the learned features and examine the optimal trajectories. Our results show that adding an additional learned feature to the reward model enhances both its predictive power and expressiveness, producing unique results for each user.
Submitted 3 March, 2021;
originally announced March 2021.
-
Optimal Data Detection and Signal Estimation in Systems with Input Noise
Authors:
Ramina Ghods,
Charles Jeon,
Arian Maleki,
Christoph Studer
Abstract:
Practical systems often suffer from hardware impairments that already appear during signal generation. Despite the limiting effect of such input-noise impairments on signal processing systems, they are routinely ignored in the literature. In this paper, we propose an algorithm for data detection and signal estimation, referred to as Approximate Message Passing with Input noise (AMPI), which takes into account input-noise impairments. To demonstrate the efficacy of AMPI, we investigate two applications: Data detection in large multiple-input multiple output (MIMO) wireless systems and sparse signal recovery in compressive sensing. For both applications, we provide precise conditions in the large-system limit for which AMPI achieves optimal performance. We furthermore use simulations to demonstrate that AMPI achieves near-optimal performance at low complexity in realistic, finite-dimensional systems.
Submitted 5 August, 2020;
originally announced August 2020.
-
Mismatched Data Detection in Massive MU-MIMO
Authors:
Charles Jeon,
Arian Maleki,
Christoph Studer
Abstract:
We investigate mismatched data detection for massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems in which the prior distribution of the transmit signal used in the data detector differs from the true prior. In order to minimize the performance loss caused by the prior mismatch, we include a tuning stage into the recently proposed large-MIMO approximate message passing (LAMA) algorithm, which enables the development of data detectors with optimal as well as sub-optimal parameter tuning. We show that carefully-selected priors enable the design of simpler and computationally more efficient data detection algorithms compared to LAMA that uses the optimal prior, while achieving near-optimal error-rate performance. In particular, we demonstrate that a hardware-friendly approximation of the exact prior enables the design of low-complexity data detectors that achieve near individually-optimal performance. Furthermore, for Gaussian priors and uniform priors within a hypercube covering the quadrature amplitude modulation (QAM) constellation, our performance analysis recovers classical and recent results on linear and non-linear massive MU-MIMO data detection, respectively.
Submitted 18 October, 2021; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Sharp Concentration Results for Heavy-Tailed Distributions
Authors:
Milad Bakhshizadeh,
Arian Maleki,
Victor H. de la Pena
Abstract:
We obtain concentration and large deviation for the sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow α\in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem can not only recover some of the existing results, such as the concentration of the sum of subWeibull random variables, but it can also produce new results for the sum of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for the sums of independent random variables as well. Our analyses which are based on standard truncation arguments simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.
Submitted 25 July, 2022; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions
Authors:
Kamiar Rahnama Rad,
Wenda Zhou,
Arian Maleki
Abstract:
We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n/p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as $n,p \rightarrow \infty$, even when the dimension $p$ of the feature vectors is comparable with or greater than the sample size $n$. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.
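As an illustration of the leave-one-out (LO) estimate discussed in this abstract, the following minimal sketch computes LO for ridge regression with $p > n$. Ridge without an intercept is chosen here only as a simple special case of penalized regression; the function names, the synthetic data, and the penalty value are hypothetical and do not come from the paper. For ridge, the brute-force refit and the well-known hat-matrix shortcut give the same value.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution (no intercept): (X^T X + lam I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def loo_brute_force(X, y, lam):
    # Refit n times, each time holding out one observation
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta = ridge_fit(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return errs.mean()

def loo_shortcut(X, y, lam):
    # Hat-matrix identity for ridge: LO residual_i = (y_i - yhat_i) / (1 - H_ii)
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(resid ** 2)

# Synthetic data with more features than samples (n/p < 1); values are illustrative
rng = np.random.default_rng(0)
n, p = 30, 50
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

lo_brute = loo_brute_force(X, y, lam=0.5)
lo_fast = loo_shortcut(X, y, lam=0.5)
```

The shortcut is exact only for this quadratic-loss, ridge-penalty case; for other losses and penalties, one-fit formulas of this kind are approximations.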
Submitted 3 March, 2020;
originally announced March 2020.
-
Constraint on the mass of fuzzy dark matter from the rotation curve of the Milky Way
Authors:
Alireza Maleki,
Shant Baghram,
Sohrab Rahvar
Abstract:
Fuzzy Dark Matter (FDM) is one of the recent models for dark matter. According to this model, dark matter is made of very light scalar particles with considerable quantum mechanical effects on the galactic scale, which solves many problems of the cold dark matter (CDM). Here we use the observed data from the rotation curve of the Milky Way (MW) Galaxy to compare the results from FDM and CDM models. We show FDM adds a local peak on the rotation curve close to the center of the bulge, where its position and amplitude depend on the mass of FDM particles. By fitting the observed rotation curve with our expectation from FDM, we find that the mass of FDM is $m = 2.5^{+3.6}_{-2.0} \times10^{-21}$eV. We note that the local peak of the rotation curve in MW can also be explained in the CDM model with an extra inner bulge model for the MW Galaxy. We conclude that the FDM model explains this peak without a need for extra structure for the bulge.
Submitted 12 May, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Investigation of two colliding solitonic cores in Fuzzy Dark Matter models
Authors:
Alireza Maleki,
Shant Baghram,
Sohrab Rahvar
Abstract:
One of the challenging questions in cosmology is the nature of dark matter particles. Fuzzy Dark Matter (FDM) is one of the candidates which is made of very light ($m_{FDM}\simeq 10^{-22}-10^{-21}$ eV) bosonic particles with no self-interaction. It was introduced to solve the core-cusp problem in the galactic halos. In this work, we investigate the observational features from FDM halo collisions. Taking into account the quantum wave-length of the condensed bosonic structure, we determine the interference of the wave function of cores after collision. The fringe formation in the wave function is associated with the density contrast of the dark matter inside the colliding galaxies. The observational signatures of the fringes of the distribution of the dark matter are (i) on the lensing of the background sources, (ii) accumulation of the baryonic plasma tracking the interference of the FDM potential and (iii) excess in the X-ray emission from dense regions. Finally, we provide prospects for the observations of quantum wave features of FDM in the colliding galaxies. The NGC6240 colliding galaxy at the redshift of $z=0.024$ is a suitable candidate for this study. No signal from the fringes is detected in the Chandra data and, taking into account the angular resolution of the telescope, we place a constraint of $m> 7 \times10^{-23}$ eV on the mass of FDM particles.
Submitted 1 November, 2019;
originally announced November 2019.
-
Information Theoretic Limits for Phase Retrieval with Subsampled Haar Sensing Matrices
Authors:
Rishabh Dudeja,
Junjie Ma,
Arian Maleki
Abstract:
We study information theoretic limits of recovering an unknown $n$ dimensional, complex signal vector $\mathbf{x}_\star$ with unit norm from $m$ magnitude-only measurements of the form $y_i = |(\mathbf{A} \mathbf{x}_\star)_i|^2, \; i = 1,2 \dots , m$, where $\mathbf{A}$ is the sensing matrix. This is known as the Phase Retrieval problem and models practical imaging systems where measuring the phase of the observations is difficult. Since in a number of applications, the sensing matrix has orthogonal columns, we model the sensing matrix as a subsampled Haar matrix formed by picking $n$ columns of a uniformly random $m \times m$ unitary matrix. We study this problem in the high dimensional asymptotic regime, where $m,n \rightarrow \infty$, while $m/n \rightarrow δ$ with $δ$ being a fixed number, and show that if $m < (2-o_n(1))\cdot n$, then any estimator is asymptotically orthogonal to the true signal vector $\mathbf{x}_\star$. This lower bound is sharp since when $m > (2+o_n(1)) \cdot n $, estimators that achieve a non trivial asymptotic correlation with the signal vector are known from previous works.
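The measurement model in this abstract can be simulated with a short sketch: draw a Haar-distributed $m \times m$ unitary matrix, keep $n$ of its columns as $\mathbf{A}$, and form $y_i = |(\mathbf{A}\mathbf{x}_\star)_i|^2$. The function names, the dimensions, and the QR-based sampling of the unitary matrix are choices made here for illustration and are not taken from the paper.

```python
import numpy as np

def haar_unitary(m, rng):
    # QR of a complex Gaussian matrix; multiplying each column of Q by the
    # phase of the matching diagonal entry of R gives a Haar-distributed unitary
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def phase_retrieval_measurements(m, n, seed=0):
    rng = np.random.default_rng(seed)
    # Subsampled Haar sensing matrix: n columns of a random m x m unitary
    A = haar_unitary(m, rng)[:, :n]
    # Unit-norm complex signal
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    x /= np.linalg.norm(x)
    # Magnitude-only measurements y_i = |(A x)_i|^2
    y = np.abs(A @ x) ** 2
    return A, x, y

A, x, y = phase_retrieval_measurements(m=12, n=4)
```

Because the columns of $\mathbf{A}$ are orthonormal and the signal has unit norm, the measurements sum to one, which gives a quick sanity check on the construction.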
Submitted 4 August, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Does SLOPE outperform bridge regression?
Authors:
Shuaiwen Wang,
Haolei Weng,
Arian Maleki
Abstract:
A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to adaptively achieve the minimax $\ell_2$ estimation rate under high-dimensional sparse linear regression models (arXiv:1503.08393). Such minimax optimality holds in the regime where the sparsity level $k$, sample size $n$, and dimension $p$ satisfy $k/p \rightarrow 0$, $k\log p/n \rightarrow 0$. In this paper, we characterize the estimation error of SLOPE under the complementary regime where both $k$ and $n$ scale linearly with $p$, and provide new insights into the performance of SLOPE estimators. We first derive a concentration inequality for the finite sample mean square error (MSE) of SLOPE. The quantity that MSE concentrates around takes a complicated and implicit form. With delicate analysis of the quantity, we prove that among all SLOPE estimators, LASSO is optimal for estimating $k$-sparse parameter vectors that do not have tied non-zero components in the low noise scenario. On the other hand, in the large noise scenario, the family of SLOPE estimators are sub-optimal compared with bridge regression such as the Ridge estimator.
Submitted 22 September, 2021; v1 submitted 20 September, 2019;
originally announced September 2019.
-
Speed limit of quantum dynamics near the event horizon of black holes
Authors:
Yusef Maleki,
Alireza Maleki
Abstract:
Quantum mechanics imposes a fundamental bound on the minimum time required for quantum systems to evolve between two states of interest. This bound introduces a limit on the speed of the dynamical evolution of the systems, known as the quantum speed limit. We show that black holes can drastically affect the speed limit of a two-level fermionic quantum system subjected to open quantum dynamics. As we demonstrate, the quantum speed limit can be enhanced in the vicinity of a black hole's event horizon in the Schwarzschild spacetime.
Submitted 26 June, 2019;
originally announced June 2019.
-
Orbital angular momentum transfer via spontaneously generated coherence
Authors:
Zahra Amini Sabegh,
Mohammad Mohammadi,
Mohammad Ali Maleki,
Mohammad Mahmoudi
Abstract:
We study the orbital angular momentum (OAM) transfer from a weak Laguerre-Gaussian (LG) field to a weak plane-wave in two closed-loop three-level $V$-type atomic systems. In the first scheme, the atomic system has two non-degenerate upper levels whose corresponding transition is excited by a microwave plane-wave. It is analytically shown that the microwave field induces an OAM transfer from an LG field to a generated third field. In the second scheme, we consider a three-level $V$-type atomic system with two near-degenerate excited states and study the effect of the quantum interference due to the spontaneous emission on the OAM transfer. It is found that the spontaneously generated coherence (SGC) induces the OAM transfer from the LG field to the weak planar field, while the OAM transfer does not occur in the absence of the SGC. The suggested models provide a rather simple method for the OAM transfer which can be used in quantum information processing and data storage.
Submitted 5 June, 2019;
originally announced June 2019.
-
A Configurable Memristor-based Finite Impulse Response Filter
Authors:
Mohammad Hemmati,
Vahid Rashtchi,
Ahmad Maleki,
Siroos Toofan
Abstract:
There are two main methods to implement FIR filters: software and hardware. In the software method, an FIR filter is implemented by programming a processor; this gives the design more configurability, but it uses a great deal of memory and is extremely time-consuming. In most hardware-based implementations of FIR filters, Analog-to-Digital (A/D) and Digital-to-Analog (D/A) converters are mandatory and increase the cost. The most important advantage of a hardware implementation of an FIR filter is its higher speed compared to its software counterpart. In this work, considering the advantages of the software and hardware approaches, a method to implement direct form FIR filters using analog components and memristors is proposed. Not only are the A/D and D/A converters omitted, but the use of memristors also provides configurability. A new circuit is presented to handle negative coefficients of the filter, and memristance values are calculated using a heuristic method in order to achieve better accuracy in setting the coefficients. Moreover, an appropriate sample and delay topology is employed which overcomes the limitations of previous research in the implementation of high-order filters. Proper operation and usefulness of the proposed structures are validated via simulation in Cadence.
Submitted 10 April, 2019;
originally announced April 2019.
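For readers unfamiliar with the direct form mentioned in the abstract above, the following is a minimal software sketch of a direct-form FIR filter, y[n] = sum_k h[k] x[n-k]. It only illustrates the structure (a delay line feeding weighted taps) and is not the memristor circuit proposed in the paper. The function name `fir_direct_form` and the tap values are hypothetical.

```python
def fir_direct_form(coeffs, samples):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * x[n - k]."""
    # Delay line holds x[n], x[n-1], ..., x[n-N+1]; it starts at rest (zeros).
    delay_line = [0.0] * len(coeffs)
    output = []
    for x in samples:
        # Shift the delay line and place the newest sample at the front.
        delay_line = [x] + delay_line[:-1]
        # Weighted sum of the taps; coefficients may be negative.
        output.append(sum(h * d for h, d in zip(coeffs, delay_line)))
    return output


# Hypothetical coefficients, including one negative tap.
taps = [0.5, -0.25, 0.25]
impulse_response = fir_direct_form(taps, [1.0, 0.0, 0.0, 0.0])
print(impulse_response)
```

Feeding a unit impulse returns the taps themselves, which is a quick sanity check for any FIR implementation.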
-
Analysis of Spectral Methods for Phase Retrieval with Random Orthogonal Matrices
Authors:
Rishabh Dudeja,
Milad Bakhshizadeh,
Junjie Ma,
Arian Maleki
Abstract:
Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. Local search algorithms that work directly on the non-convex formulation of the problem have been very popular recently. Due to the nonconvexity of the problem, the success of these local search algorithms depends heavily on their starting points. The most widely used initialization scheme is the spectral method, in which the leading eigenvector of a data-dependent matrix is used as a starting point. Recently, the performance of the spectral initialization was characterized accurately for measurement matrices with independent and identically distributed entries. This paper aims to obtain the same level of knowledge for isotropically random column-orthogonal matrices, which are substantially better models for practical phase retrieval systems. Towards this goal, we consider the asymptotic setting in which the number of measurements $m$, and the dimension of the signal, $n$, diverge to infinity with $m/n = δ\in(1,\infty)$, and obtain a simple expression for the overlap between the spectral estimator and the true signal vector.
Submitted 4 March, 2020; v1 submitted 6 March, 2019;
originally announced March 2019.
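The spectral method described in the abstract above (take the leading eigenvector of a data-dependent matrix as a starting point) can be sketched as follows. This is a simplified illustration under stated assumptions, not the setting analyzed in the paper: it uses real-valued i.i.d. Gaussian measurement vectors rather than column-orthogonal matrices, the matrix D = (1/m) sum_i y_i^2 a_i a_i^T with the preprocessing y^2 chosen purely for simplicity, and plain power iteration. All names (`spectral_init`, `overlap`) and the problem sizes are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def spectral_init(vectors, magnitudes, iterations=100, seed=0):
    """Power iteration for the leading eigenvector of
    D = (1/m) * sum_i y_i^2 * a_i a_i^T (real-valued case)."""
    rng = random.Random(seed)
    m, n = len(vectors), len(vectors[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [0.0] * n
        # Apply D to v without forming the n-by-n matrix explicitly.
        for a, y in zip(vectors, magnitudes):
            scale = y * y * dot(a, v) / m
            for j in range(n):
                w[j] += scale * a[j]
        norm = math.sqrt(dot(w, w))
        v = [wj / norm for wj in w]
    return v


def overlap(u, x):
    """Absolute cosine similarity; the global sign is not identifiable."""
    return abs(dot(u, x)) / math.sqrt(dot(u, u) * dot(x, x))


# Hypothetical problem instance: n = 10 unknowns, m = 600 phaseless measurements.
rng = random.Random(1)
n, m = 10, 600
signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [abs(dot(a, signal)) for a in A]

estimate = spectral_init(A, y)
print(round(overlap(estimate, signal), 3))
```

The printed overlap is the quantity the abstract refers to; in practice the estimate would then be refined by a local search method.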
-
Spectral Method for Phase Retrieval: an Expectation Propagation Perspective
Authors:
Junjie Ma,
Rishabh Dudeja,
Ji Xu,
Arian Maleki,
Xiaodong Wang
Abstract:
Phase retrieval refers to the problem of recovering a signal $\mathbf{x}_{\star}\in\mathbb{C}^n$ from its phaseless measurements $y_i=|\mathbf{a}_i^{\mathrm{H}}\mathbf{x}_{\star}|$, where $\{\mathbf{a}_i\}_{i=1}^m$ are the measurement vectors. Many popular phase retrieval algorithms are based on the following two-step procedure: (i) initialize the algorithm based on a spectral method, (ii) refine the initial estimate by a local search algorithm (e.g., gradient descent). The quality of the spectral initialization step can have a major impact on the performance of the overall algorithm. In this paper, we focus on the model where the measurement matrix $\mathbf{A}=[\mathbf{a}_1,\ldots,\mathbf{a}_m]^{\mathrm{H}}$ has orthonormal columns, and study the spectral initialization under the asymptotic setting $m,n\to\infty$ with $m/n\toδ\in(1,\infty)$. We use the expectation propagation framework to characterize the performance of spectral initialization for Haar distributed matrices. Our numerical results confirm that the predictions of the EP method are accurate not only for Haar distributed matrices, but also for realistic Fourier based models (e.g., the coded diffraction model). The main findings of this paper are the following:
(1) There exists a threshold on $δ$ (denoted as $δ_{\mathrm{weak}}$) below which the spectral method cannot produce a meaningful estimate. We show that $δ_{\mathrm{weak}}=2$ for the column-orthonormal model. In contrast, previous results by Mondelli and Montanari show that $δ_{\mathrm{weak}}=1$ for the i.i.d. Gaussian model.
(2) The optimal design for the spectral method coincides with that for the i.i.d. Gaussian model, where the latter was recently introduced by Luo, Alghamdi and Lu.
Submitted 9 September, 2020; v1 submitted 6 March, 2019;
originally announced March 2019.
-
Consistent Risk Estimation in Moderately High-Dimensional Linear Regression
Authors:
Ji Xu,
Arian Maleki,
Kamiar Rahnama Rad,
Daniel Hsu
Abstract:
Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and Bootstrap. The theoretical properties of such estimates have been extensively studied in the low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observations $n$. However, a unifying methodology accompanied by a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n,p \rightarrow \infty$ and $n/p \rightarrow δ>1$ ($δ$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, i.e., leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques. A cornerstone of our analysis is a bound that we obtain on the discrepancy of the `residuals' obtained from AMP and LOOCV. This connection not only enables us to obtain more refined information on the estimates of AMP, ALO, and LOOCV, but also offers an upper bound on the convergence rate of each estimate.
Submitted 18 January, 2021; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Minimax Linear Estimation of the Retargeted Mean
Authors:
David A. Hirshberg,
Arian Maleki,
Jose R. Zubizarreta
Abstract:
Evaluating treatments received by one population for application to a different target population of scientific interest is a central problem in causal inference from observational studies. We study the minimax linear estimator of the treatment-specific mean outcome on a target population and provide a theoretical basis for inference based on it. In particular, we provide a justification for the common practice of ignoring bias when building confidence intervals with these linear estimators. Focusing on the case that the class of the unknown outcome function is the unit ball of a reproducing kernel Hilbert space, we show that the resulting linear estimator is asymptotically optimal under conditions only marginally stronger than those used with augmented estimators. We establish bounds attesting to the estimator's good finite sample properties. In an extensive simulation study, we observe promising performance of the estimator throughout a wide range of sample sizes, noise levels, and levels of overlap between the covariate distributions of the treated and target populations.
Submitted 26 February, 2021; v1 submitted 10 January, 2019;
originally announced January 2019.
-
Theories and Practice of Agent based Modeling: Some practical Implications for Economic Planners
Authors:
Hossein Sabzian,
Mohammad Ali Shafia,
Ali Maleki,
Seyeed Mostapha Seyeed Hashemi,
Ali Baghaei,
Hossein Gharib
Abstract:
Nowadays, we are surrounded by a large number of complex phenomena, ranging from rumor spreading and social norm formation to the rise of new economic trends and the disruption of traditional businesses. To deal with such phenomena, the Complex Adaptive System (CAS) framework has been found very influential among social scientists, especially economists. As the most powerful methodology of CAS modeling, agent-based modeling (ABM) has gained growing application among academics and practitioners. ABMs show how simple behavioral rules of agents and local interactions among them at the micro-scale can generate surprisingly complex patterns at the macro-scale. Despite a growing number of ABM publications, researchers unfamiliar with this methodology have to study a number of works to understand (1) the why and what of ABMs and (2) the ways they are rigorously developed. Therefore, the major focus of this paper is to help social science researchers, especially economists, get a big picture of ABMs and know how to develop them both systematically and rigorously.
Submitted 23 January, 2019;
originally announced January 2019.
-
Microwave-induced orbital angular momentum transfer
Authors:
Zahra Amini Sabegh,
Mohammad Ali Maleki,
Mohammad Mahmoudi
Abstract:
The microwave-induced orbital angular momentum (OAM) transfer from a Laguerre-Gaussian (LG) beam to a weak plane-wave is studied in a closed-loop four-level ladder-type atomic system. The analytical investigation shows that the generated fourth field is an LG beam with the same OAM as the applied LG field. Moreover, the microwave-induced subluminal generated pulse can be switched to a superluminal one simply by changing the relative phase of the applied fields. It is shown that the OAM transfer in the subluminal regime is accompanied by slight absorption, whereas it is accompanied by slight gain in the superluminal regime. The transfer of light's OAM and the control of the group velocity of the generated pulse can provide a high-dimensional Hilbert space, which has a major role in quantum communication and information processing.
Submitted 9 November, 2018;
originally announced November 2018.
-
Optimal Data Detection in Large MIMO
Authors:
Charles Jeon,
Ramina Ghods,
Arian Maleki,
Christoph Studer
Abstract:
Large multiple-input multiple-output (MIMO) appears in massive multi-user MIMO and randomly-spread code-division multiple access (CDMA)-based wireless systems. In order to cope with the excessively high complexity of optimal data detection in such systems, a variety of efficient yet sub-optimal algorithms have been proposed in the past. In this paper, we propose a data detection algorithm that is computationally efficient and optimal in the sense that it is able to achieve the same error-rate performance as the individually optimal (IO) data detector under certain assumptions on the MIMO system matrix and constellation alphabet. Our algorithm, which we refer to as LAMA (short for large MIMO AMP), builds on complex-valued Bayesian approximate message passing (AMP), which enables an exact analytical characterization of the performance and complexity in the large-system limit via the state-evolution framework. We derive optimality conditions for LAMA and investigate performance/complexity trade-offs. As a byproduct of our analysis, we recover classical results of IO data detection for randomly-spread CDMA. We furthermore provide practical ways for LAMA to approach the theoretical performance limits in realistic, finite-dimensional systems at low computational complexity.
Submitted 5 November, 2018;
originally announced November 2018.
-
Benefits of over-parameterization with EM
Authors:
Ji Xu,
Daniel Hsu,
Arian Maleki
Abstract:
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. We consider the problem of estimating the mean vectors of a Gaussian mixture model in a scenario where the mixing weights are known. Our study shows that the global behavior of EM, when one uses an over-parameterized model in which the mixing weights are treated as unknown, is better than that when one uses the (correct) model with the mixing weights fixed to the known values. For symmetric Gaussian mixtures with two components, we prove that introducing the (statistically redundant) weight parameters enables EM to find the global maximizer of the log-likelihood starting from almost any initial mean parameters, whereas EM without this over-parameterization may very often fail. For other Gaussian mixtures, we provide empirical evidence that shows similar behavior. Our results corroborate the value of over-parameterization in solving non-convex optimization problems, previously observed in other domains.
Submitted 26 October, 2018;
originally announced October 2018.
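The two EM variants contrasted in the abstract above can be sketched in one dimension. The model assumed here is w N(θ, 1) + (1 - w) N(-θ, 1), which is one reading of "symmetric Gaussian mixtures with two components" and may differ in detail from the paper's setup. Setting `update_weight=False` keeps the mixing weight fixed at its known value, and `update_weight=True` is the over-parameterized variant that also estimates the weight. The function name, sample size, and true parameters are hypothetical, and the run below only checks that both variants recover the parameters from a favourable starting point; it does not reproduce the paper's comparison.

```python
import math
import random


def em_symmetric_mixture(data, theta, weight, update_weight, iterations=200):
    """EM for the 1-D mixture  w * N(theta, 1) + (1 - w) * N(-theta, 1)."""
    n = len(data)
    for _ in range(iterations):
        # E-step: posterior probability that each point came from N(+theta, 1).
        # The density ratio phi(x + theta) / phi(x - theta) equals exp(-2 * theta * x).
        post = [
            1.0 / (1.0 + (1.0 - weight) / weight * math.exp(-2.0 * theta * x))
            for x in data
        ]
        # M-step for the mean: theta = average of (2 * p_i - 1) * x_i.
        theta = sum((2.0 * p - 1.0) * x for p, x in zip(post, data)) / n
        if update_weight:
            # Over-parameterized variant: also re-estimate the mixing weight,
            # clamped away from 0 and 1 to keep the E-step well defined.
            weight = min(max(sum(post) / n, 1e-9), 1.0 - 1e-9)
    return theta, weight


# Hypothetical ground truth: theta* = 2, known weight w* = 0.7.
rng = random.Random(0)
true_theta, true_weight = 2.0, 0.7
data = [
    (true_theta if rng.random() < true_weight else -true_theta) + rng.gauss(0.0, 1.0)
    for _ in range(2000)
]

theta_fixed, _ = em_symmetric_mixture(data, 1.0, true_weight, update_weight=False)
theta_over, weight_over = em_symmetric_mixture(data, 1.0, 0.5, update_weight=True)
print(theta_fixed, theta_over, weight_over)
```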
-
The absolutely Koszul and Backelin-Roos properties for spaces of quadrics of small codimension
Authors:
Rasoul Ahangari Maleki,
Liana M. Şega
Abstract:
Let $\kk$ be a field, $R$ a standard graded quadratic $\kk$-algebra with $\dim_{\kk}R_2\le 3$, and let $\ov\kk$ denote an algebraic closure of $\kk$. We construct a graded surjective Golod homomorphism $\varphi \colon P\to R\otimes_{\kk}\ov{\kk}$ such that $P$ is a complete intersection of codimension at most $3$. Furthermore, we show that $R$ is absolutely Koszul (that is, every finitely generated $R$-module has finite linearity defect) if and only if $R$ is Koszul if and only if $R$ is not a trivial fiber extension of a standard graded $\kk$-algebra with Hilbert series $(1+2t-2t^3)(1-t)^{-1}$. In particular, we recover earlier results on the Koszul property of Backelin, Conca and D'Alì.
Submitted 19 January, 2020; v1 submitted 13 October, 2018;
originally announced October 2018.
-
Approximate Leave-One-Out for High-Dimensional Non-Differentiable Learning Problems
Authors:
Shuaiwen Wang,
Wenda Zhou,
Arian Maleki,
Haihao Lu,
Vahab Mirrokni
Abstract:
Consider the following class of learning schemes: \begin{equation} \label{eq:main-problem1}
\hat{\boldsymbolβ} := \underset{\boldsymbolβ \in \mathcal{C}}{\arg\min} \;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ), \qquad \qquad \qquad (1) \end{equation} where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\rm th}$ feature and response variable respectively. Let $\ell$ and $R$ be the convex loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. $\mathcal{C} \subset \mathbb{R}^{p}$ is a closed convex set. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose three frameworks to obtain a computationally efficient approximation of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our three frameworks are based on the primal, dual, and proximal formulations of (1). Each framework shows its strength in certain types of problems. We prove the equivalence of the three approaches under smoothness conditions. This equivalence enables us to justify the accuracy of the three methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
Submitted 4 October, 2018;
originally announced October 2018.
-
A strategic framework for identifying the critical factors of 4G technology diffusion in I.R. Iran - A Fuzzy DEMATEL approach
Authors:
Hossein Sabzian,
Hossein Gharib,
Seyyed Mostafa Seyyed Hashemi,
Ali Maleki
Abstract:
As the most prominent representative of 4G, Long Term Evolution (LTE) technology has become a focal point for mobile network operators all over the world. However, although the main Iranian operators such as MCI and Irancell have invested heavily in the deployment of this technology, its diffusion has been very slow, with a penetration rate of 0.06 at the end of spring 2017. If this rate does not increase, it will yield negative unintended consequences for telecom operators, such as (I) failure to provide a large number of high-quality services, (II) inability to compete with OTT technologies, (III) loss of many revenue opportunities, (IV) prolongation of the payback period, and (V) a lack of technological integrability with fifth generation (5G) networks and loss of many IoT opportunities. By discussing the literature on technology adoption and diffusion both generally and specifically, identifying the major limitations of these studies, and establishing a comprehensive factor set based on four major groups of (I) mobile handset and operator-related factors, (II) subscriber-related biological factors, (III) subscriber-related perceptual factors, and (IV) subscriber-related contextual factors, a novel fuzzy DEMATEL model has been developed with which ICT policy makers can not only gain clear knowledge of the factors influencing technology adoption but also identify the critical success factors (CSFs) influencing Iranians' mindsets towards LTE adoption. They can therefore make effective and actionable policies to scale up the diffusion of LTE or other ICT-related technologies throughout society.
Submitted 10 July, 2018;
originally announced July 2018.
-
Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions
Authors:
Shuaiwen Wang,
Wenda Zhou,
Haihao Lu,
Arian Maleki,
Vahab Mirrokni
Abstract:
Consider the following class of learning schemes: $$\hat{\boldsymbolβ} := \arg\min_{\boldsymbolβ}\;\sum_{j=1}^n \ell(\boldsymbol{x}_j^\top\boldsymbolβ; y_j) + λR(\boldsymbolβ),\qquad\qquad (1) $$ where $\boldsymbol{x}_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$ denote the $i^{\text{th}}$ feature and response variable respectively. Let $\ell$ and $R$ be the loss function and regularizer, $\boldsymbolβ$ denote the unknown weights, and $λ$ be a regularization parameter. Finding the optimal choice of $λ$ is a challenging problem in high-dimensional regimes where both $n$ and $p$ are large. We propose two frameworks to obtain a computationally efficient approximation ALO of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of (1). We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization, and support vector machines. We empirically demonstrate the effectiveness of our results for non-differentiable cases.
Submitted 7 July, 2018;
originally announced July 2018.
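As background for the leave-one-out idea in the abstract above, the smooth special case of problem (1) with squared loss and a ridge penalty admits a classical shortcut: every leave-one-out residual can be obtained from a single fit as (y_i - x_i β) / (1 - H_ii). The sketch below checks this against brute-force refitting for a single predictor without intercept. It is not the primal or dual construction proposed in the paper for nonsmooth problems, and the function names and data are hypothetical.

```python
def ridge_loo_shortcut(xs, ys, lam):
    """Leave-one-out residuals from one fit of
    min_beta sum_j (y_j - x_j * beta)^2 + lam * beta^2  (one predictor)."""
    denom = sum(x * x for x in xs) + lam
    beta = sum(x * y for x, y in zip(xs, ys)) / denom
    # H_ii = x_i^2 / denom is the leverage of observation i.
    return [(y - x * beta) / (1.0 - x * x / denom) for x, y in zip(xs, ys)]


def ridge_loo_brute_force(xs, ys, lam):
    """Leave-one-out residuals by refitting n times, each without one point."""
    residuals = []
    for i in range(len(xs)):
        sxx = sum(x * x for j, x in enumerate(xs) if j != i) + lam
        sxy = sum(x * y for j, (x, y) in enumerate(zip(xs, ys)) if j != i)
        residuals.append(ys[i] - xs[i] * sxy / sxx)
    return residuals


# Hypothetical data and regularization strength.
xs = [0.5, -1.2, 2.0, 0.3, -0.7]
ys = [1.1, -2.0, 4.3, 0.2, -1.5]
lam = 0.8

fast = ridge_loo_shortcut(xs, ys, lam)
slow = ridge_loo_brute_force(xs, ys, lam)
loocv_risk = sum(r * r for r in fast) / len(fast)
print(loocv_risk)
```

The shortcut costs one fit instead of n, which is why single-fit approximations of this kind are attractive for tuning λ when n and p are large.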