-
A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator
Authors:
Theodor Misiakiewicz,
Basil Saeed
Abstract:
We consider learning an unknown target function $f_*$ using kernel ridge regression (KRR) given i.i.d. data $(u_i,y_i)$, $i\leq n$, where $u_i \in U$ is a covariate vector and $y_i = f_* (u_i) +\varepsilon_i \in \mathbb{R}$. A recent line of work has empirically shown that the test error of KRR can be well approximated by a closed-form estimate derived from an `equivalent' sequence model that depends only on the spectrum of the kernel operator. However, a theoretical justification for this equivalence has so far relied either on restrictive assumptions, such as subgaussian independent eigenfunctions, or on asymptotic derivations for specific kernels in high dimensions.
In this paper, we prove that this equivalence holds for a general class of problems satisfying certain spectral and concentration properties of the kernel eigendecomposition. Specifically, in this setting we establish a deterministic approximation for the test error of KRR, with explicit non-asymptotic bounds, that depends only on the eigenvalues of the kernel and on the alignment of the target function with its eigenvectors. Our proofs rely on a careful derivation of deterministic equivalents for random matrix functionals in the dimension-free regime pioneered by Cheng and Montanari (2022).
We apply this setting to several classical examples and show excellent agreement between theoretical predictions and numerical simulations. These results rely on having access to the eigendecomposition of the kernel operator. Alternatively, we prove that, in this same setting, the generalized cross-validation (GCV) estimator concentrates on the test error uniformly over a range of ridge regularization parameters that includes zero (the interpolating solution). As a consequence, the GCV estimator can be used to estimate both the test error and the optimal regularization parameter for KRR from data.
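To make the GCV idea concrete, here is a minimal sketch of KRR and of the standard GCV formula for linear smoothers, $\mathrm{GCV}(\lambda) = \frac{1}{n}\|y - Sy\|^2 / (1 - \mathrm{tr}(S)/n)^2$ with $S = K(K + n\lambda I)^{-1}$. The Gaussian kernel, its bandwidth, the $n\lambda$ ridge normalization, and the toy data are illustrative assumptions; the paper's exact normalization may differ, and this naive formula is only evaluated at $\lambda > 0$ (the interpolating limit $\lambda \to 0$ needs separate care).

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A and rows of B (illustrative kernel choice)."""
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def krr_predict(U, y, U_test, lam, bandwidth=1.0):
    """KRR with ridge n*lam: f(u) = k(u, U) (K + n*lam*I)^{-1} y."""
    n = len(y)
    K = rbf_kernel(U, U, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return rbf_kernel(U_test, U, bandwidth) @ alpha

def gcv(U, y, lam, bandwidth=1.0):
    """GCV estimate of the test error; requires lam > 0 so that tr(S)/n < 1."""
    n = len(y)
    K = rbf_kernel(U, U, bandwidth)
    # (K + n lam I)^{-1} K equals K (K + n lam I)^{-1} because the two matrices commute.
    S = np.linalg.solve(K + n * lam * np.eye(n), K)
    resid = y - S @ y
    return np.mean(resid**2) / (1.0 - np.trace(S) / n) ** 2

# Toy usage: pick the ridge parameter that minimizes GCV on a small grid.
rng = np.random.default_rng(0)
U = rng.normal(size=(200, 3))
y = np.sin(U[:, 0]) + 0.1 * rng.normal(size=200)
lams = [1e-4, 1e-3, 1e-2, 1e-1]
best_lam = min(lams, key=lambda lam: gcv(U, y, lam))
```

GCV only needs the training data, which is what makes it usable when the kernel eigendecomposition is not available.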
Submitted 13 March, 2024;
originally announced March 2024.
-
On the Importance of Large Objects in CNN Based Object Detection Algorithms
Authors:
Ahmed Ben Saad,
Gabriele Facciolo,
Axel Davy
Abstract:
Object detection models, a prominent class of machine learning algorithms, aim to identify and precisely locate objects in images or videos. However, this task can yield uneven performance, sometimes caused by object sizes and by the quality of the images and labels used for training. In this paper, we highlight the importance of large objects in learning features that are critical for objects of all sizes. Based on these findings, we propose introducing a weighting term into the training loss. This term is a function of the object's area. We show that giving more weight to large objects improves detection scores across all object sizes, and hence the overall performance of object detectors (+2 p.p. of mAP on small objects, +2 p.p. on medium and +4 p.p. on large on COCO val 2017 with InternImage-T). Additional experiments and ablation studies with different models and on a different dataset further confirm the robustness of our findings.
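The abstract does not give the exact weighting function, so the following is only a hypothetical sketch of an area-dependent loss weight: objects at or below the COCO "small" threshold (32x32 pixels) get weight 1, and larger objects get a weight that grows with the square root of their area, capped at a maximum. The names `area_weight`, `gamma`, and `max_weight` are illustrative assumptions.

```python
COCO_SMALL_AREA = 32 * 32  # COCO counts objects below 32x32 pixels as "small"

def area_weight(box_area, ref_area=COCO_SMALL_AREA, gamma=0.5, max_weight=4.0):
    """Hypothetical weight: 1 for small objects, growing with area, capped at max_weight."""
    w = (box_area / ref_area) ** gamma
    return min(max(w, 1.0), max_weight)

def weighted_detection_loss(per_object_losses, box_areas):
    """Weighted mean of per-object losses; dividing by the weight sum keeps the loss scale stable."""
    weights = [area_weight(a) for a in box_areas]
    return sum(w * l for w, l in zip(weights, per_object_losses)) / sum(weights)

# A 64x64 object counts twice as much as a small one in this sketch.
loss = weighted_detection_loss([0.0, 3.0], [100, 64 * 64])
```

Normalizing by the total weight is one design choice among several; an unnormalized sum would also change the effective learning rate.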
Submitted 20 November, 2023;
originally announced November 2023.
-
Improving Pixel-Level Contrastive Learning by Leveraging Exogenous Depth Information
Authors:
Ahmed Ben Saad,
Kristina Prokopetc,
Josselin Kherroubi,
Axel Davy,
Adrien Courtois,
Gabriele Facciolo
Abstract:
Self-supervised representation learning based on Contrastive Learning (CL) has been the subject of much attention in recent years, owing to the excellent results obtained on a variety of downstream tasks (in particular classification) without requiring a large number of labeled samples. However, most reference CL algorithms (such as SimCLR and MoCo, but also BYOL and Barlow Twins) are not adapted to pixel-level downstream tasks. One existing solution, known as PixPro, proposes a pixel-level approach based on filtering pairs of positive/negative image crops of the same image using the distance between the crops in the whole image. We argue that this idea can be further enhanced by incorporating semantic information provided by exogenous data as an additional selection filter, which can be used (at training time) to improve the selection of pixel-level positive/negative samples. In this paper, we focus on depth information, which can be obtained from a depth estimation network or measured from available data (stereovision, parallax motion, LiDAR, etc.). Scene depth can provide meaningful cues to distinguish pixels belonging to different objects. We show that using this exogenous information in the contrastive loss leads to improved results and that the learned representations better follow the shapes of objects. In addition, we introduce a multi-scale loss that alleviates the difficulty of finding training parameters adapted to different object sizes. We demonstrate the effectiveness of our ideas on Breakout Segmentation on Borehole Images, where we achieve an improvement of 1.9% over PixPro and nearly 5% over the supervised baseline. We further validate our technique on indoor scene segmentation with ScanNet and outdoor scene segmentation with CityScapes (1.6% and 1.1% improvement over PixPro, respectively).
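The depth-based selection filter can be illustrated with a small sketch: a pixel pair from two views is kept as positive only if the pixels are close in the original image (a PixPro-style spatial criterion) and also have similar depth. The coordinate units, both thresholds, and the function name are illustrative assumptions, not the paper's actual values or code.

```python
import math

def depth_filtered_positives(coords_a, coords_b, depth_a, depth_b,
                             dist_thresh=0.7, depth_thresh=0.1):
    """Return index pairs (i, j) treated as positives.

    coords_*: pixel positions mapped back to the original image (normalized units, assumed).
    depth_*:  depth value at each pixel, e.g. from a depth estimation network.
    """
    pairs = []
    for i, ((xa, ya), da) in enumerate(zip(coords_a, depth_a)):
        for j, ((xb, yb), db) in enumerate(zip(coords_b, depth_b)):
            spatially_close = math.hypot(xa - xb, ya - yb) < dist_thresh  # spatial criterion
            similar_depth = abs(da - db) < depth_thresh                   # extra exogenous filter
            if spatially_close and similar_depth:
                pairs.append((i, j))
    return pairs

# The second pair is spatially close but lies at a different depth, so it is rejected.
pairs = depth_filtered_positives([(0, 0), (5, 5)], [(0.1, 0), (5, 5.2)],
                                 [1.0, 3.0], [1.05, 2.0])
```

A pair that straddles an object boundary can be close in pixel space yet far apart in depth, which is the case the extra filter is meant to catch.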
Submitted 18 November, 2022;
originally announced November 2022.
-
Universality of empirical risk minimization
Authors:
Andrea Montanari,
Basil Saeed
Abstract:
Consider supervised learning from i.i.d. samples $\{{\boldsymbol x}_i,y_i\}_{i\le n}$, where ${\boldsymbol x}_i \in\mathbb{R}^p$ are feature vectors and $y_i \in \mathbb{R}$ are labels. We study empirical risk minimization over a class of functions parameterized by $\mathsf{k} = O(1)$ vectors ${\boldsymbol \theta}_1, \dots, {\boldsymbol \theta}_{\mathsf k} \in \mathbb{R}^p$, and prove universality results for both the training and test error. Namely, under the proportional asymptotics $n,p\to\infty$ with $n/p = \Theta(1)$, we prove that the training error depends on the distribution of the random features only through its covariance structure. Further, we prove that the minimum test error over near-empirical risk minimizers enjoys similar universality properties. In particular, the asymptotics of these quantities can be computed, to leading order, under a simpler model in which the feature vectors ${\boldsymbol x}_i$ are replaced by Gaussian vectors ${\boldsymbol g}_i$ with the same covariance. Earlier universality results were limited to strongly convex learning procedures or to feature vectors ${\boldsymbol x}_i$ with independent entries. Our results make neither of these assumptions. Our assumptions are general enough to include feature vectors ${\boldsymbol x}_i$ produced by randomized featurization maps. In particular, we explicitly check the assumptions for certain random features models (computing the output of a one-layer neural network with random weights) and neural tangent models (first-order Taylor approximation of two-layer networks).
Submitted 31 October, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Accurate Graph Filtering in Wireless Sensor Networks
Authors:
Leila Ben Saad,
Baltasar Beferull-Lozano
Abstract:
Wireless sensor networks (WSNs) are considered a major technology enabling the Internet of Things (IoT) paradigm. The recently emerging field of Graph Signal Processing can also contribute to enabling the IoT by providing key tools, such as graph filters, for processing the data associated with sensor devices. Graph filters can be implemented over WSNs in a distributed manner by means of a certain number of communication exchanges among the nodes. However, WSNs are often affected by interference and noise, which leads us to view these networks as directed, random, time-varying graph topologies. Most existing works neglect this problem by making the unrealistic assumption that a link is activated with the same probability in both directions when a packet is sent between two neighboring nodes. This work focuses on the problem of graph filtering in random asymmetric WSNs. We first show that graph filtering with finite impulse response graph filters (node-invariant and node-variant) requires equal connectivity probabilities for all links in order to be unbiased, which cannot be achieved in practice in random WSNs. We then characterize the graph filtering error and present an efficient strategy for graph filtering over random WSNs with node-variant graph filters that maximizes accuracy, that is, ensures a favorable bias-variance trade-off. To enforce the desired accuracy, we optimize the filter coefficients and design a cross-layer distributed scheduling algorithm at the MAC layer. Extensive numerical experiments show the efficiency of the proposed solution and of the cross-layer distributed scheduling algorithm for the denoising application.
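A finite impulse response graph filter computes $y = \sum_k \mathrm{diag}(h_k) S^k x$, where each application of the shift operator $S$ is one round of neighbor exchanges. The minimal sketch below supports node-invariant taps (a scalar per $k$) and node-variant taps (a per-node list per $k$). The optional `link_prob` parameter is an illustrative assumption: each directed link succeeds independently in each round, which models random asymmetric activation but is not the paper's exact network model.

```python
import random

def graph_shift(S, x, link_prob=1.0, rng=None):
    """One round of neighbor exchanges: node i receives x[j] from each j with S[i][j] != 0.
    With link_prob < 1, each directed link i <- j succeeds independently (asymmetric, random)."""
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            if S[i][j] != 0 and (link_prob >= 1.0 or rng.random() < link_prob):
                acc += S[i][j] * x[j]
        out.append(acc)
    return out

def fir_graph_filter(S, x, taps, link_prob=1.0, rng=None):
    """y = sum_k diag(h_k) S^k x; taps[k] is a scalar (node-invariant) or a per-node list (node-variant)."""
    n = len(x)
    y = [0.0] * n
    shifted = list(x)  # S^0 x
    for k, h in enumerate(taps):
        if k > 0:
            shifted = graph_shift(S, shifted, link_prob, rng)
        hk = h if isinstance(h, list) else [h] * n
        y = [y[i] + hk[i] * shifted[i] for i in range(n)]
    return y

# Path graph with three nodes and an impulse at node 0.
S = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [1.0, 0.0, 0.0]
y_invariant = fir_graph_filter(S, x, [1.0, 0.5])
y_variant = fir_graph_filter(S, x, [[1, 1, 1], [0, 1, 0]])
```

Under this toy link model, the $k$-th term of the mean output is scaled by $p^k$ when every link has success probability $p$, which illustrates why random links bias the filter output.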
Submitted 15 July, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
Quantization Analysis and Robust Design for Distributed Graph Filters
Authors:
Leila Ben Saad,
Baltasar Beferull-Lozano,
Elvin Isufi
Abstract:
Distributed graph filters have found applications in wireless sensor networks (WSNs) for solving distributed tasks such as consensus, signal denoising, and reconstruction. However, when employed over WSNs, graph filters must cope with the network's limited energy, processing, and communication capabilities. Quantization plays a fundamental role in improving the latter, but its effects on distributed graph filtering are poorly understood. WSNs are also prone to random link losses due to noise and interference. The filter output is affected by both the quantization error and the topological randomness error, which, if not properly accounted for in the filter design phase, may accumulate over the filtering iterations and significantly degrade performance. In this paper, we analyze how quantization affects distributed graph filtering over both time-invariant and time-varying graphs. We provide insights into the quantization effects for the two most common graph filters: the finite impulse response (FIR) and the autoregressive moving average (ARMA) graph filters. We derive theoretical guarantees on the filter performance when the quantization step size is fixed or changes dynamically over the filtering iterations. For FIR filters, we show that a dynamic quantization step size gives more control over the quantization noise than fixed-step-size quantization. For ARMA graph filters, we show that decreasing the quantization step size over the iterations drives the quantization noise to zero at steady state. In addition, we propose robust filter design strategies that minimize the quantization noise for both time-invariant and time-varying networks. Numerical experiments on synthetic data and two real datasets corroborate our findings and show the different trade-offs between quantization bits, filter order, and robustness to topological randomness.
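The effect of a decreasing quantization step on an ARMA graph filter can be reproduced in a toy setting. The sketch below runs the first-order recursion $y_{t+1} = \psi S\, Q_t(y_t) + \varphi x$, whose unquantized steady state is $\varphi (I - \psi S)^{-1} x$ when $|\psi|\,\rho(S) < 1$. Placing a uniform quantizer on the values each node transmits is an assumption about where quantization enters, and the graph, coefficients, and step schedules are illustrative.

```python
def quantize(v, step):
    """Uniform quantizer with step size `step` (rounds each entry to the nearest multiple)."""
    return [step * round(vi / step) for vi in v]

def arma1_quantized(S, x, psi, phi, steps, iters):
    """First-order ARMA graph filter where nodes exchange quantized states.
    `steps(t)` returns the quantization step used at iteration t."""
    n = len(x)
    y = [0.0] * n
    for t in range(iters):
        q = quantize(y, steps(t))  # values actually transmitted to neighbors
        Sq = [sum(S[i][j] * q[j] for j in range(n)) for i in range(n)]
        y = [psi * Sq[i] + phi * x[i] for i in range(n)]
    return y

S = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph, spectral radius sqrt(2)
x = [1.0, 0.0, 0.0]
y_fixed = arma1_quantized(S, x, 0.3, 1.0, lambda t: 0.1, 200)
y_decreasing = arma1_quantized(S, x, 0.3, 1.0, lambda t: 0.1 * 0.9 ** t, 200)
```

With the fixed step, the recursion settles on a quantized fixed point about 0.01 away from the exact output; with the geometrically shrinking step, the residual error vanishes, in line with the steady-state behavior stated in the abstract.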
Submitted 14 April, 2020;
originally announced April 2020.
-
Causal Structure Discovery from Distributions Arising from Mixtures of DAGs
Authors:
Basil Saeed,
Snigdha Panigrahi,
Caroline Uhler
Abstract:
We consider distributions arising from a mixture of causal models, where each model is represented by a directed acyclic graph (DAG). We provide a graphical representation of such mixture distributions and prove that this representation encodes the conditional independence relations of the mixture distribution. We then consider the problem of structure learning based on samples from such distributions. Since the mixing variable is latent, we consider causal structure discovery algorithms, such as FCI, that can deal with latent variables. We show that such algorithms recover a "union" of the component DAGs and can identify variables whose conditional distributions vary across the component DAGs. We demonstrate our results on synthetic and real data, showing that the inferred graph identifies nodes that vary between the mixture components. As an immediate application, we demonstrate how this causal information can be used to cluster samples according to their mixture component.
Submitted 9 August, 2020; v1 submitted 31 January, 2020;
originally announced January 2020.
-
Where is the Fake? Patch-Wise Supervised GANs for Texture Inpainting
Authors:
Ahmed Ben Saad,
Youssef Tamaazousti,
Josselin Kherroubi,
Alexis He
Abstract:
We tackle the problem of texture inpainting, where the input images are textures with missing values, along with masks indicating the zones that should be generated. Much work has been done on image inpainting with the aim of achieving global and local consistency, but these methods still suffer from limitations when dealing with textures. In fact, the local information in the image to be completed must be used to achieve local continuity and visually realistic texture inpainting. To this end, we propose a new segmentor discriminator that performs a patch-wise real/fake classification and is supervised by the input masks. During training, it aims to locate the fake regions and thus backpropagates a consistent signal to the generator. We tested our approach on the publicly available DTD dataset and showed that it achieves state-of-the-art performance and handles local consistency better than existing methods.
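One way to picture mask supervision for a patch-wise discriminator is to turn the inpainting mask into a grid of per-patch targets. The minimal sketch below assigns each patch the fraction of its pixels that were generated (1 = entirely fake, 0 = entirely real). The soft-fraction target and the patch size are assumptions for illustration; the abstract does not specify how the masks are converted into labels.

```python
def patch_targets(mask, patch):
    """Per-patch fake fraction from a binary mask.

    mask[r][c] == 1 where the pixel was generated (missing region), 0 where it is real.
    Returns a grid with one value in [0, 1] per patch (edge patches may be smaller).
    """
    h, w = len(mask), len(mask[0])
    targets = []
    for r0 in range(0, h, patch):
        row = []
        for c0 in range(0, w, patch):
            block = [mask[r][c]
                     for r in range(r0, min(r0 + patch, h))
                     for c in range(c0, min(c0 + patch, w))]
            row.append(sum(block) / len(block))  # fraction of generated pixels
        targets.append(row)
    return targets

# A 4x4 mask whose top-left 2x2 block was generated.
mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
targets = patch_targets(mask, 2)
```

A discriminator trained against such a grid is asked where the fake content lies rather than only whether the whole image is fake.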
Submitted 9 March, 2020; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Ordering-Based Causal Structure Learning in the Presence of Latent Variables
Authors:
Daniel Irving Bernstein,
Basil Saeed,
Chandler Squires,
Caroline Uhler
Abstract:
We consider the task of learning a causal graph in the presence of latent confounders given i.i.d. samples from the model. While current algorithms for causal structure discovery in the presence of latent confounders are constraint-based, we propose a score-based approach. We prove that, under assumptions weaker than faithfulness, any sparsest independence map (IMAP) of the distribution belongs to the Markov equivalence class of the true model. This motivates the Sparsest Poset formulation: posets can be mapped to minimal IMAPs of the true model such that the sparsest of these IMAPs is Markov equivalent to the true model. Motivated by this result, we propose a greedy algorithm over the space of posets for causal structure discovery in the presence of latent confounders and compare its performance to the current state-of-the-art algorithms FCI and FCI+ on synthetic data.
Submitted 24 March, 2020; v1 submitted 20 October, 2019;
originally announced October 2019.
-
Smart Palm: An IoT Framework for Red Palm Weevil Early Detection
Authors:
Anis Koubaa,
Abdulrahman Aldawood,
Bassel Saeed,
Abdullatif Hadid,
Mohanned Ahmed,
Abdulrahman Saad,
Hesham Alkhouja,
Mohamed Alkanhal
Abstract:
Smart agriculture is an evolving trend in the agriculture industry, where sensors are embedded into plants to collect vital data and support decision making, ensuring higher crop quality and preventing pests, diseases, and other possible threats. In Saudi Arabia, growing palms is the most important agricultural activity, and there is an increasing need to leverage smart agriculture technology to improve date production and prevent diseases. One of the most critical threats to palms is the red palm weevil, an insect that causes extensive damage to palm trees and can devastate large palm-growing areas. The most challenging problem is that the effect of the weevil is not visible to humans until the palm reaches an advanced stage of infestation. For this reason, advanced technology is needed for early detection and for preventing the infestation from spreading. In this project, we have developed an IoT-based smart palm monitoring prototype as a proof of concept that (1) allows palms to be monitored remotely using smart agriculture sensors and (2) contributes to the early detection of the red palm weevil. Users can interact with their palm farms through a web/mobile application, which helps them detect possible infestations early. We used the Elm company IoT platform to interface between the sensor layer and the user layer. In addition, we collected data using accelerometer sensors and applied signal processing and statistical techniques to analyze the collected data and determine a fingerprint of the infestation.
Submitted 21 September, 2019;
originally announced October 2019.
-
A Complete Transient Analysis for the Incremental LMS Algorithm
Authors:
Muhammad Omer Bin Saeed
Abstract:
The incremental least-mean-square (ILMS) algorithm was presented in \cite{Lopes2007}. That article included a theoretical analysis of the algorithm along with simulation results under different scenarios. However, the transient analysis was left incomplete. This work presents the complete transient analysis, including the learning behavior. The analytical results are verified through several experiments.
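The incremental strategy can be sketched as follows: a single estimate travels around a ring of nodes, and each node applies one LMS update with its own data before passing the estimate on. This is a minimal real-valued sketch of the ILMS recursion as commonly stated; the data model, the per-node step sizes, and the helper `node_data` are illustrative assumptions, and none of the transient analysis is reproduced here.

```python
import random

def ilms(node_data, dim, mu, n_iter):
    """Incremental LMS over a ring of len(mu) nodes.

    node_data(k, i) -> (u, d): regressor list and desired sample at node k, time i.
    mu[k] is the step size used by node k.
    """
    w = [0.0] * dim
    for i in range(n_iter):
        psi = list(w)  # the current estimate enters the cycle at node 0
        for k, mu_k in enumerate(mu):  # visit nodes in ring order
            u, d = node_data(k, i)
            e = d - sum(uj * pj for uj, pj in zip(u, psi))  # local a priori error
            psi = [pj + mu_k * e * uj for pj, uj in zip(psi, u)]
        w = psi  # estimate after a full cycle
    return w

# Toy stationary system identification shared by four nodes.
rng = random.Random(0)
w_true = [0.5, -1.0]

def node_data(k, i):
    u = [rng.gauss(0, 1) for _ in w_true]
    d = sum(a * b for a, b in zip(u, w_true)) + rng.gauss(0, 0.01)
    return u, d

w_hat = ilms(node_data, dim=2, mu=[0.05] * 4, n_iter=500)
```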
Submitted 9 September, 2019;
originally announced September 2019.
-
Explaining intuitive difficulty judgments by modeling physical effort and risk
Authors:
Ilker Yildirim,
Basil Saeed,
Grace Bennett-Pierre,
Tobias Gerstenberg,
Joshua Tenenbaum,
Hyowon Gweon
Abstract:
The ability to estimate task difficulty is critical for many real-world decisions such as setting appropriate goals for ourselves or appreciating others' accomplishments. Here we give a computational account of how humans judge the difficulty of a range of physical construction tasks (e.g., moving 10 loose blocks from their initial configuration to their target configuration, such as a vertical tower) by quantifying two key factors that influence construction difficulty: physical effort and physical risk. Physical effort captures the minimal work needed to transport all objects to their final positions, and is computed using a hybrid task-and-motion planner. Physical risk corresponds to stability of the structure, and is computed using noisy physics simulations to capture the costs for precision (e.g., attention, coordination, fine motor movements) required for success. We show that the full effort-risk model captures human estimates of difficulty and construction time better than either component alone.
Submitted 14 May, 2019; v1 submitted 11 May, 2019;
originally announced May 2019.
-
Physical problem solving: Joint planning with symbolic, geometric, and dynamic constraints
Authors:
Ilker Yildirim,
Tobias Gerstenberg,
Basil Saeed,
Marc Toussaint,
Josh Tenenbaum
Abstract:
In this paper, we present a new task that investigates how people interact with and make judgments about towers of blocks. In Experiment 1, participants in the lab solved a series of problems in which they had to reconfigure three blocks from an initial to a final configuration. We recorded whether they used one or two hands to do so. In Experiment 2, we asked participants online to judge whether the person in the lab had used one or two hands. The results revealed a close correspondence between participants' actions in the lab and the mental simulations of participants online. To explain participants' actions and mental simulations, we develop a model that plans over a symbolic representation of the situation, executes the plan using a geometric solver, and checks the plan's feasibility by taking into account the physical constraints of the scene. Our model explains participants' actions and judgments with a high degree of quantitative accuracy.
Submitted 25 July, 2017;
originally announced July 2017.
-
Generic-Precision algorithm for DCT-Cordic architectures
Authors:
Imen Ben Saad,
Younes Lahbib,
Yassine Hachaïchi,
Sonia Mami,
Abdelkader Mami
Abstract:
In this paper, we propose a generic algorithm to calculate the rotation parameters of the CORDIC angles required for the Discrete Cosine Transform (DCT). This allows the calculation precision to be increased to meet any required accuracy. Our contribution is to use this decomposition in a CORDIC-based DCT, which is appropriate for domains that require high quality and top precision. We then propose a hardware implementation of the novel transformation and, as expected, find a substantial improvement in PSNR quality.
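For readers unfamiliar with CORDIC, the rotation parameters are the directions $\sigma_i \in \{+1, -1\}$ in the decomposition $\theta \approx \sum_i \sigma_i \arctan(2^{-i})$, and the precision is set by the number of iterations. The sketch below uses the textbook greedy decomposition (not necessarily the paper's generic algorithm) and then applies the shift-and-add rotations, dividing by the accumulated gain at the end. The DCT angle $\pi/16$ in the usage line is only an example.

```python
import math

def cordic_directions(theta, n_iter):
    """Greedy decomposition theta ~= sum_i sigma_i * atan(2**-i), sigma_i in {+1, -1}.
    Valid for |theta| up to about 1.74 rad; returns the directions and the residual angle."""
    z = theta
    sigmas = []
    for i in range(n_iter):
        s = 1 if z >= 0 else -1
        sigmas.append(s)
        z -= s * math.atan(2.0 ** -i)
    return sigmas, z

def cordic_rotate(x, y, sigmas):
    """Rotate (x, y) using shift-and-add micro-rotations, then remove the CORDIC gain."""
    gain = 1.0
    for i, s in enumerate(sigmas):
        x, y = x - s * y * 2.0 ** -i, y + s * x * 2.0 ** -i  # uses the old x and y
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x / gain, y / gain

theta = math.pi / 16  # an example DCT rotation angle
sigmas, residual = cordic_directions(theta, 24)
xr, yr = cordic_rotate(1.0, 0.0, sigmas)  # approximately (cos theta, sin theta)
```

Each extra iteration roughly halves the residual angle, which is how the number of iterations trades hardware cost for precision.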
Submitted 8 June, 2016;
originally announced June 2016.
-
A Unified Analysis Approach for LMS-based Variable Step-Size Algorithms
Authors:
Muhammad Omer Bin Saeed
Abstract:
The least-mean-squares (LMS) algorithm is the most popular algorithm in adaptive filtering. Several variable step-size strategies have been suggested to improve the performance of the LMS algorithm. These strategies enhance the performance of the algorithm, but a major drawback is the complexity of the theoretical analysis of the resulting algorithms. Researchers rely on several assumptions to find closed-form analytical solutions. This work presents a unified approach for the analysis of variable step-size LMS algorithms. The approach is then applied to several variable step-size strategies, and theoretical and simulation results are compared.
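As a concrete example of a variable step-size strategy, the sketch below uses an error-driven rule of the Kwong and Johnston type: $\mu(n+1) = \alpha\,\mu(n) + \gamma\, e^2(n)$, clipped to $[\mu_{\min}, \mu_{\max}]$. It is one of the many strategies such a unified analysis could cover, not the paper's own method, and the parameter values and toy system are illustrative assumptions.

```python
import random

def vss_lms(samples, dim, alpha=0.97, gamma=0.01, mu_min=0.005, mu_max=0.1):
    """LMS with an error-driven variable step size; returns final weights and step size."""
    w = [0.0] * dim
    mu = mu_max  # start with the largest step for fast initial convergence
    for u, d in samples:
        e = d - sum(wi * ui for wi, ui in zip(w, u))           # a priori error
        w = [wi + mu * e * ui for wi, ui in zip(w, u)]         # standard LMS update
        mu = min(max(alpha * mu + gamma * e * e, mu_min), mu_max)  # large errors -> larger step
    return w, mu

# Toy system identification with a small amount of measurement noise.
rng = random.Random(1)
w_true = [0.8, -0.3]
samples = []
for _ in range(3000):
    u = [rng.gauss(0, 1), rng.gauss(0, 1)]
    d = w_true[0] * u[0] + w_true[1] * u[1] + rng.gauss(0, 0.01)
    samples.append((u, d))

w_hat, mu_final = vss_lms(samples, dim=2)
```

The step size stays large while the error is large and shrinks toward `mu_min` near convergence, which is the usual motivation for these schemes and also what makes their analysis harder than for fixed-step LMS.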
Submitted 11 January, 2015;
originally announced January 2015.
-
Low-Complexity Particle Swarm Optimization for Time-Critical Applications
Authors:
Muhammad Saqib Sohail,
Muhammad Omer Bin Saeed,
Syed Zeeshan Rizvi,
Mobien Shoaib,
Asrar Ul Haq Sheikh
Abstract:
Particle swarm optimization (PSO) is a popular stochastic optimization method that has found wide application in diverse fields. However, PSO suffers from high computational complexity and slow convergence. High computational complexity hinders its use in applications with limited power resources, while slow convergence makes it unsuitable for time-critical applications. In this paper, we propose two techniques to overcome these limitations. The first technique reduces the computational complexity of PSO, while the second speeds up its convergence. These techniques can be applied, either separately or in conjunction, to any existing PSO variant. The proposed techniques are robust to the dimensionality of the optimization problem. Simulation results are presented for the proposed techniques applied to the standard PSO as well as to several PSO variants. The results show that using both techniques together reduces the number of computations required and speeds up convergence while maintaining acceptable error performance for time-critical applications.
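The abstract does not detail the two proposed techniques, so the sketch below shows only the baseline they would be applied to: standard global-best PSO with an inertia weight. The per-iteration cost is one fitness evaluation per particle plus the per-dimension velocity updates, which is where complexity reductions would act. The coefficient values (0.7298 and 1.49618, a commonly used setting), the bounds, and the sphere test function are illustrative assumptions.

```python
import random

def pso(f, dim, n_particles=20, iters=200, w=0.7298, c1=1.49618, c2=1.49618,
        bound=5.0, seed=0):
    """Standard global-best PSO with inertia weight (minimization)."""
    rng = random.Random(seed)
    X = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(x) for x in X]
    pbest_f = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])  # pull toward own best
                           + c2 * r2 * (gbest[d] - X[i][d]))    # pull toward swarm best
                X[i][d] += V[i][d]
            fx = f(X[i])  # one fitness evaluation per particle per iteration
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = list(X[i]), fx
                if fx < gbest_f:
                    gbest, gbest_f = list(X[i]), fx
    return gbest, gbest_f

def sphere(x):
    return sum(v * v for v in x)

best_x, best_f = pso(sphere, dim=3)
```

This version updates the global best as soon as a particle improves on it (asynchronous updates); a synchronous variant updates it once per iteration instead.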
Submitted 2 January, 2014;
originally announced January 2014.
-
An accelerated CLPSO algorithm
Authors:
Muhammad Omer Bin Saeed,
Muhammad Saqib Sohail,
Syed Zeeshan Rizvi,
Mobien Shoaib,
Asrar Ul Haq Sheikh
Abstract:
Among various existing heuristic algorithms, the particle swarm approach provides a low-complexity solution to the optimization problem. Recent advances in the algorithm have improved its performance at the cost of increased computational complexity, which is undesirable. The literature shows that the particle swarm optimization algorithm based on comprehensive learning provides the best complexity-performance trade-off. We show how to reduce the complexity of this algorithm further, with a slight but acceptable performance loss. This enhancement allows the algorithm to be used in time-critical applications such as real-time tracking and equalization.
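The comprehensive learning mechanism that this work starts from can be sketched as follows: for each dimension, a particle learns either from its own best position or, with a learning probability $P_c$, from the best position of another particle chosen by a two-way tournament. The $P_c$ schedule below follows the form commonly used for CLPSO; the fixed inertia weight, the acceleration coefficient, and all function names are illustrative assumptions, and the paper's complexity reduction is not shown.

```python
import math
import random

def learning_prob(i, ps):
    """Learning probability Pc for particle i (0-based) in a swarm of size ps."""
    return 0.05 + 0.45 * (math.exp(10.0 * i / (ps - 1)) - 1.0) / (math.exp(10.0) - 1.0)

def tournament(others, pbest_f, rng):
    """Pick two other particles at random; the one with the better (lower) pbest fitness wins."""
    a, b = rng.sample(others, 2)
    return a if pbest_f[a] < pbest_f[b] else b

def choose_exemplars(i, pbest_f, dim, pc, rng):
    """Per-dimension exemplar indices for particle i (needs at least 3 particles)."""
    others = [j for j in range(len(pbest_f)) if j != i]
    exemplar = [tournament(others, pbest_f, rng) if rng.random() < pc else i
                for _ in range(dim)]
    if all(e == i for e in exemplar):
        # Ensure the particle learns from another particle in at least one dimension.
        exemplar[rng.randrange(dim)] = tournament(others, pbest_f, rng)
    return exemplar

def clpso_velocity(v, x, pbest, exemplar, w=0.7, c=1.49445, rng=random):
    """Velocity update: dimension d is pulled toward the pbest of particle exemplar[d]."""
    return [w * v[d] + c * rng.random() * (pbest[exemplar[d]][d] - x[d])
            for d in range(len(x))]

rng = random.Random(0)
pbest_f = [5.0, 1.0, 3.0, 0.5, 2.0]
ex = choose_exemplars(0, pbest_f, dim=4, pc=learning_prob(0, 5), rng=rng)
```

Because each dimension can follow a different exemplar, the swarm keeps more diversity than global-best PSO, at the price of the extra tournament bookkeeping.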
Submitted 14 April, 2013;
originally announced April 2013.