
Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

Authors: Timm Hess, Tinne Tuytelaars, Gido M. van de Ven

Abstract: Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when starting to train on a new task. Motivated by this 'stability gap', we propose that continual learning strategies should focus not only on the optimization objective, but also on the way this objective is optimized. While there is some continual learning work that alters the optimization trajectory (e.g., using gradient projection techniques), this line of research is positioned as alternative to improving the optimization objective, while we argue it should be complementary. In search of empirical support for our proposition, we perform a series of pre-registered experiments combining replay-approximated joint objectives with gradient projection-based optimization routines. However, this first experimental attempt fails to show clear and consistent benefits. Nevertheless, our conceptual arguments, as well as some of our empirical results, demonstrate the distinctive importance of the optimization trajectory in continual learning, thereby opening up a new direction for continual learning research.

Submitted 21 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: Full paper version of pre-registered report accepted at the 1st ContinualAI Unconference. The originally submitted pre-registered proposal can be found at arXiv:2311.04898v1

arXiv:2310.03429 [pdf]

An Extended Phase Graph-based framework for DANTE-SPACE simulations including physiological, temporal, and spatial variations

Authors: Matthijs H. S. de Buck, Peter Jezzard, Aaron T. Hess

Abstract: Purpose: The DANTE-SPACE sequence facilitates three-dimensional intracranial vessel wall imaging with simultaneous suppression of blood and cerebrospinal fluid (CSF). However, the achieved image contrast depends closely on the selected sequence parameters, and the clinical use of the sequence is limited in vivo by observed signal variations in the vessel wall, CSF, and blood. This paper introduces a comprehensive DANTE-SPACE simulation framework, with the aim of providing a better understanding of the underlying contrast mechanisms and facilitating improved parameter selection and contrast optimization. Methods: An Extended Phase Graph (EPG) formalism was developed for efficient spin ensemble simulation of the DANTE-SPACE sequence. Physiological processes such as pulsatile flow velocity variation, varying flow directions, intravoxel dephasing, diffusion, and B1+ effects were included in the framework to represent the mechanisms behind the achieved signal levels accurately. Results: Intravoxel velocity averaging improved temporal stability and robustness against small velocity changes. Time-varying pulsatile velocity variation affected CSF simulations, introducing periods of near-zero velocity and partial rephasing. Inclusion of diffusion effects was found to substantially reduce the CSF signal. Blood flow trajectory variations had minor effects, but B1+ differences along the trajectory reduced DANTE efficiency in low-B1+ areas. Introducing low-velocity pulsatility of both CSF and vessel wall helped explain the in vivo observed signal heterogeneity in both tissue types. Conclusion: The presented simulation framework facilitates a more comprehensive optimization of DANTE-SPACE sequence parameters. Furthermore, the simulation framework helps to explain observed contrasts in acquired data.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2304.00933 [pdf, other]

Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting

Authors: Timm Hess, Eli Verwimp, Gido M. van de Ven, Tinne Tuytelaars

Abstract: Continual learning research has shown that neural networks suffer from catastrophic forgetting "at the output level", but it is debated whether this is also the case at the level of learned representations. Multiple recent studies ascribe representations a certain level of innate robustness against forgetting -- that they only forget minimally in comparison with forgetting at the output level. We revisit and expand upon the experiments that revealed this difference in forgetting and illustrate the coexistence of two phenomena that affect the quality of continually learned representations: knowledge accumulation and feature forgetting. Taking both aspects into account, we show that, even though forgetting in the representation (i.e. feature forgetting) can be small in absolute terms, when measuring relative to how much was learned during a task, forgetting in the representation tends to be just as catastrophic as forgetting at the output level. Next we show that this feature forgetting is problematic as it substantially slows down the incremental learning of good general representations (i.e. knowledge accumulation). Finally, we study how feature forgetting and knowledge accumulation are affected by different types of continual learning methods.

Submitted 24 June, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: TMLR 2024

Journal ref: Transactions on Machine Learning Research (TMLR), 2024

arXiv:2303.12383 [pdf, other]

Exploiting d-DNNFs for Repetitive Counting Queries on Feature Models

Authors: Chico Sundermann, Heiko Raab, Tobias Heß, Thomas Thüm, Ina Schaefer

Abstract: Feature models are commonly used to specify the valid configurations of a product line. In industry, feature models are often complex due to a large number of features and constraints. Thus, a multitude of automated analyses have been proposed. Many of those rely on computing the number of valid configurations which typically depends on solving a #SAT problem, a computationally expensive operation. Further, most counting-based analyses require numerous #SAT computations on the same feature model. In particular, many analyses depend on multiple computations for evaluating the number of valid configurations that include certain features or conform to partial configurations. Instead of using expensive repetitive computations on highly similar formulas, we aim to improve the performance by reusing knowledge between these computations. In this work, we are the first to propose reusing d-DNNFs for performing efficient repetitive queries on features and partial configurations. Our empirical evaluation shows that our approach is up to 8,300 times faster (99.99% CPU-time saved) than the state of the art of repetitively invoking #SAT solvers. Applying our tool ddnnife reduces runtimes from days to minutes compared to using #SAT solvers.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2212.14596 [pdf]

Head-and-neck multi-channel B1+ mapping and carotid arteries RF shimming using a parallel transmit head coil

Authors: Matthijs H. S. de Buck, Peter Jezzard, Aaron T. Hess

Abstract: Purpose: Neurovascular MRI suffers from a rapid drop in B1+ into the neck when using transmit head coils at 7T. One solution to improving B1+ magnitude in the major feeding arteries in the neck is to use custom RF shims on parallel transmit (pTx) head coils. However, calculating such shims requires robust multi-channel B1+ maps in both the head and the neck, which is challenging due to low RF penetration into the neck, limited dynamic range of multi-channel B1+ mapping techniques, and B0 sensitivity. We therefore sought a robust large-dynamic-range pTx field mapping protocol, and tested whether RF shimming can improve carotid artery B1+ in practice. Methods: A pipeline is presented that combines B1+ mapping data acquired using circularly polarized (CP-) and CP2-mode RF shims at multiple voltages. The pipeline was evaluated by comparing the predicted and measured B1+ for multiple random transmit shims, and by assessing the ability of RF shimming to increase the B1+ in the carotid arteries. Results: The proposed method achieved good agreement between predicted and measured B1+ in both the head and the neck. The B1+ magnitude in the carotid arteries can be increased by 42% using tailored RF shims or by 37% using universal RF shims, while also improving the RF homogeneity compared to CP mode. Conclusion: B1+ in the neck can be increased using RF shims calculated from multi-channel B1+ maps in both the head and the neck. This can be achieved using universal phase-only RF shims, facilitating easy implementation in existing sequences.

Submitted 30 December, 2022; originally announced December 2022.

Comments: 19 pages (8 pages main text) & 10 figures. To be submitted for review

arXiv:2203.04180 [pdf, other]

Tuning-free multi-coil compressed sensing MRI with Parallel Variable Density Approximate Message Passing (P-VDAMP)

Authors: Charles Millard, Mark Chiew, Jared Tanner, Aaron T. Hess, Boris Mailhe

Abstract: Magnetic Resonance Imaging (MRI) has excellent soft tissue contrast but is hindered by an inherently slow data acquisition process. Compressed sensing, which reconstructs sparse signals from incoherently sampled data, has been widely applied to accelerate MRI acquisitions. Compressed sensing MRI requires one or more model parameters to be tuned, which is usually done by hand, giving sub-optimal tuning in general. To address this issue, we build on previous work by the authors on the single-coil Variable Density Approximate Message Passing (VDAMP) algorithm, extending the framework to multiple receiver coils to propose the Parallel VDAMP (P-VDAMP) algorithm. For Bernoulli random variable density sampling, P-VDAMP obeys a "state evolution", where the intermediate per-iteration image estimate is distributed according to the ground truth corrupted by a zero-mean Gaussian vector with approximately known covariance. To our knowledge, P-VDAMP is the first algorithm for multi-coil MRI data that obeys a state evolution with accurately tracked parameters. We leverage state evolution to automatically tune sparse parameters on-the-fly with Stein's Unbiased Risk Estimate (SURE). P-VDAMP is evaluated on brain, knee and angiogram datasets and compared with four variants of the Fast Iterative Shrinkage-Thresholding algorithm (FISTA), including two tuning-free variants from the literature. The proposed method is found to have a similar reconstruction quality and time to convergence as FISTA with an optimally tuned sparse weighting and offers substantial robustness and reconstruction quality improvements over competing tuning-free methods.

Submitted 28 April, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 10 pages, 9 figures, submitted to IEEE TMI on 26th April 2022

arXiv:2201.13217 [pdf, ps, other]

Fast Distributed k-Means with a Small Number of Rounds

Authors: Tom Hess, Ron Visbord, Sivan Sabato

Abstract: We propose a new algorithm for k-means clustering in a distributed setting, where the data is distributed across many machines, and a coordinator communicates with these machines to calculate the output clustering. Our algorithm guarantees a cost approximation factor and a number of communication rounds that depend only on the computational capacity of the coordinator. Moreover, the algorithm includes a built-in stopping mechanism, which allows it to use fewer communication rounds whenever possible. We show both theoretically and empirically that in many natural cases, indeed 1-4 rounds suffice. In comparison with the popular k-means|| algorithm, our approach allows exploiting a larger coordinator capacity to obtain a smaller number of rounds. Our experiments show that the k-means cost obtained by the proposed algorithm is usually better than the cost obtained by k-means||, even when the latter is allowed a larger number of rounds. Moreover, the machine running time in our approach is considerably smaller than that of k-means||. Code for running the algorithm and experiments is available at https://github.com/selotape/distributed_k_means.

Submitted 15 March, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: AISTATS 2023

Journal ref: Proceedings of the Twenty Sixth International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 206:850--874, 2023

arXiv:2106.02585 [pdf, other]

A Procedural World Generation Framework for Systematic Evaluation of Continual Learning

Authors: Timm Hess, Martin Mundt, Iuliia Pliushch, Visvanathan Ramesh

Abstract: Several families of continual learning techniques have been proposed to alleviate catastrophic interference in deep neural network training on non-stationary data. However, a comprehensive comparison and analysis of limitations remains largely open due to the inaccessibility to suitable datasets. Empirical examination not only varies immensely between individual works, it further currently relies on contrived composition of benchmarks through subdivision and concatenation of various prevalent static vision datasets. In this work, our goal is to bridge this gap by introducing a computer graphics simulation framework that repeatedly renders only upcoming urban scene fragments in an endless real-time procedural world generation process. At its core lies a modular parametric generative model with adaptable generative factors. The latter can be used to flexibly compose data streams, which significantly facilitates a detailed analysis and allows for effortless investigation of various continual learning schemes.

Submitted 13 December, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Published in Neural Information Processing Systems, Dataset and Benchmarks Track 2021

arXiv:2104.04300 [pdf]

Optimization of Undersampling Parameters for 3D Intracranial Compressed Sensing MR Angiography at 7 Tesla

Authors: Matthijs H. S. de Buck, Peter Jezzard, Aaron T. Hess

Abstract: Purpose: 3D Time-of-flight (TOF) MR Angiography (MRA) can accurately visualize the intracranial vasculature, but is limited by long acquisition times. Compressed sensing (CS) reconstruction can be used to substantially accelerate acquisitions. The quality of those reconstructions depends on the undersampling patterns used in the acquisitions. In this work, optimized sets of undersampling parameters using various acceleration factors for Cartesian 3D TOF-MRA are established. Methods: Fully-sampled datasets acquired at 7T were retrospectively undersampled using variable-density Poisson-disk sampling with various autocalibration region sizes, polynomial orders, and acceleration factors. The accuracy of reconstructions from the different undersampled datasets was assessed using the vessel-masked structural similarity index. Results were compared for four imaging volumes, acquired from two different subjects. Optimized undersampling parameters were validated using additional prospectively undersampled datasets. Results: For all acceleration factors, using a fully-sampled calibration area of 12x12 k-space lines and a polynomial order of around 2-2.4 resulted in the highest image quality. The importance of sampling parameter optimization was found to increase for higher acceleration factors. The results were consistent across resolutions and regions of interest with vessels of varying sizes and tortuosity. In prospectively undersampled acquisitions, using optimized undersampling parameters resulted in a 7.2% increase in the number of visible small vessels at R = 7.2. Conclusion: The image quality of CS TOF-MRA can be improved by appropriate choice of undersampling parameters. The optimized sets of parameters are independent of the acceleration factor.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Manuscript to be submitted to Magnetic Resonance in Medicine

arXiv:2102.04050 [pdf, ps, other]

A Constant Approximation Algorithm for Sequential Random-Order No-Substitution k-Median Clustering

Authors: Tom Hess, Michal Moshkovitz, Sivan Sabato

Abstract: We study k-median clustering under the sequential no-substitution setting. In this setting, a data stream is sequentially observed, and some of the points are selected by the algorithm as cluster centers. However, a point can be selected as a center only immediately after it is observed, before observing the next point. In addition, a selected center cannot be substituted later. We give the first algorithm for this setting that obtains a constant approximation factor on the optimal risk under a random arrival order, an exponential improvement over previous work. This is also the first constant approximation guarantee that holds without any structural assumptions on the input data. Moreover, the number of selected centers is only quasi-linear in k. Our algorithm and analysis are based on a careful risk estimation that avoids outliers, a new concept of a linear bin division, and a multiscale approach to center selection.

Submitted 6 June, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

Journal ref: Advances in Neural Information Processing Systems, pages 3298-3308, 2021

arXiv:2011.06471 [pdf]

Accelerated calibrationless parallel transmit mapping using joint transmit and receive low-rank tensor completion

Authors: Aaron T Hess, Iulius Dragonu, Mark Chiew

Abstract: Purpose: To evaluate an algorithm for calibrationless parallel imaging to reconstruct undersampled parallel transmit field maps for the body and brain. Methods: Using synthetic data, body, and brain measurements of relative transmit maps, three different approaches to a joint transmit-receive low-rank tensor completion algorithm are evaluated. These methods included: (i) virtual coils using the product of receive and transmit sensitivities, (ii) joint-receiver coils that enforces a low rank structure across receive coils of all transmit modes, and (iii) transmit low rank (TxLR) that uses a low rank structure for both receive and transmit modes simultaneously. The performance of each are investigated for different noise levels and different acceleration rates on an 8-channel parallel transmit 7T system. Results: The virtual coils method broke down with increasing noise levels or acceleration rates greater than two producing RMS error greater than 0.1. The joint receiver coils method worked well up to acceleration factors of four, beyond which the RMS error exceeded 0.1. While TxLR enabled an eight-fold acceleration with most RMS errors remaining below 0.1. Conclusion: This work demonstrates that under-sampling factors of up to eight-fold are feasible for transmit array mapping and can be reconstructed using calibrationless parallel imaging methods.

Submitted 12 November, 2020; originally announced November 2020.

Comments: 30 pages, 14 figures

arXiv:2005.02060 [pdf]

An investigation into the minimum number of tissue groups required for 7T in-silico parallel transmit electromagnetic safety simulations in the human head

Authors: Matthijs H. S. de Buck, Peter Jezzard, Hongbae Jeong, Aaron T. Hess

Abstract: Purpose: Safety limits for the permitted Specific Absorption Rate (SAR) place restrictions on pulse sequence design, especially at ultra-high fields ($\geq 7$ tesla). Due to inter-subject variability, the SAR is usually conservatively estimated based on standard human models that include an applied safety margin to ensure safe operation. One approach to reducing the restrictions is to create more accurate subject-specific models from their segmented MR images. This study uses electromagnetic simulations to investigate the minimum number of tissue groups required to accurately determine SAR in the human head. Methods: Tissue types from a fully characterized electromagnetic human model with 47 tissue types in the head and neck region were grouped into different tissue clusters based on the conductivities, permittivities, and mass densities of the tissues. Electromagnetic simulations of the head model inside a parallel transmit (pTx) head coil at 7T were used to determine the minimum number of required tissue clusters to accurately determine the subject-specific SAR. The identified tissue clusters were then evaluated using two additional well-characterized electromagnetic human models. Results: A minimum of 4 clusters plus air was found to be required for accurate SAR estimation. These tissue clusters are centered around gray matter, fat, cortical bone, and cerebrospinal fluid. For all three simulated models the pTx maximum 10gSAR was consistently determined to within an error of <12% relative to the full 47-tissue model. Conclusion: A minimum of 4 clusters plus air are required to produce accurate personalized SAR simulations of the human head when using pTx at 7T.

Submitted 17 July, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: Submitted to Magnetic Resonance in Medicine

arXiv:2003.02701 [pdf, other]

Approximate Message Passing with a Colored Aliasing Model for Variable Density Fourier Sampled Images

Authors: Charles Millard, Aaron T Hess, Boris Mailhé, Jared Tanner

Abstract: The Approximate Message Passing (AMP) algorithm efficiently reconstructs signals which have been sampled with large i.i.d. sub-Gaussian sensing matrices. Central to AMP is its "state evolution", which guarantees that the difference between the current estimate and ground truth (the "aliasing") at every iteration obeys a Gaussian distribution that can be fully characterized by a scalar. However, when Fourier coefficients of a signal with non-uniform spectral density are sampled, such as in Magnetic Resonance Imaging (MRI), the aliasing is intrinsically colored, AMP's scalar state evolution is no longer accurate and the algorithm encounters convergence problems. In response, we propose the Variable Density Approximate Message Passing (VDAMP) algorithm, which uses the wavelet domain to model the colored aliasing. We present empirical evidence that VDAMP obeys a "colored state evolution", where the aliasing obeys a Gaussian distribution that can be fully characterized with one scalar per wavelet subband. A benefit of state evolution is that Stein's Unbiased Risk Estimate (SURE) can be effectively implemented, yielding an algorithm with subband-dependent thresholding that has no free parameters. We empirically evaluate the effectiveness of VDAMP on three variations of Fast Iterative Shrinkage-Thresholding (FISTA) and find that it converges in around 10 times fewer iterations on average than the next-fastest method, and to a comparable mean-squared-error.

Submitted 7 September, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: 13 pages, 7 figures, 3 tables. arXiv admin note: text overlap with arXiv:1911.01234

MSC Class: 47A52; 65J22; 94A08; 92C55

ACM Class: G.1.3

arXiv:1911.01234 [pdf, other]

An Approximate Message Passing Algorithm for Rapid Parameter-Free Compressed Sensing MRI

Authors: Charles Millard, Aaron T Hess, Boris Mailhé, Jared Tanner

Abstract: For certain sensing matrices, the Approximate Message Passing (AMP) algorithm efficiently reconstructs undersampled signals. However, in Magnetic Resonance Imaging (MRI), where Fourier coefficients of a natural image are sampled with variable density, AMP encounters convergence problems. In response we present an algorithm based on Orthogonal AMP constructed specifically for variable density partial Fourier sensing matrices. For the first time in this setting a state evolution has been observed. A practical advantage of state evolution is that Stein's Unbiased Risk Estimate (SURE) can be effectively implemented, yielding an algorithm with no free parameters. We empirically evaluate the effectiveness of the parameter-free algorithm on simulated data and find that it converges over 5x faster and to a lower mean-squared error solution than Fast Iterative Shrinkage-Thresholding (FISTA).

Submitted 7 September, 2020; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: 5 pages, 5 figures, IEEE International Conference on Image Processing (ICIP) 2020

ACM Class: G.1.3

arXiv:1905.12925 [pdf, other]

Sequential no-Substitution k-Median-Clustering

Authors: Tom Hess, Sivan Sabato

Abstract: We study the sample-based k-median clustering objective under a sequential setting without substitutions. In this setting, an i.i.d. sequence of examples is observed. An example can be selected as a center only immediately after it is observed, and it cannot be substituted later. The goal is to select a set of centers with a good k-median cost on the distribution which generated the sequence. We provide an efficient algorithm for this setting, and show that its multiplicative approximation factor is twice the approximation factor of an efficient offline algorithm. In addition, we show that if efficiency requirements are removed, there is an algorithm that can obtain the same approximation factor as the best offline algorithm. We demonstrate in experiments the performance of the efficient algorithm on real data sets. Our code is available at https://github.com/tomhess/No_Substitution_K_Median.

Submitted 22 April, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

Comments: to appear at AISTATS 2020

Journal ref: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), 962--972, 2020

arXiv:1702.03989 [pdf, ps, other]

Selecting with History

Authors: Tom Hess, Sivan Sabato

Abstract: We define a new selection problem, Selecting with History, which extends the secretary problem to a setting with historical information. We propose a strategy for this problem and calculate its success probability in the limit of a large sequence.

Submitted 26 September, 2018; v1 submitted 13 February, 2017; originally announced February 2017.

arXiv:1602.01132 [pdf, ps, other]

Interactive algorithms: from pool to stream

Authors: Sivan Sabato, Tom Hess

Abstract: We consider interactive algorithms in the pool-based setting, and in the stream-based setting. Interactive algorithms observe suggested elements (representing actions or queries), and interactively select some of them and receive responses. Pool-based algorithms can select elements at any order, while stream-based algorithms observe elements in sequence, and can only select elements immediately after observing them. We assume that the suggested elements are generated independently from some source distribution, and ask what is the stream size required for emulating a pool algorithm with a given pool size. We provide algorithms and matching lower bounds for general pool algorithms, and for utility-based pool algorithms. We further show that a maximal gap between the two settings exists also in the special case of active learning for binary classification.

Submitted 16 June, 2016; v1 submitted 2 February, 2016; originally announced February 2016.

Comments: Appearing in COLT 2016

Journal ref: S. Sabato and T. Hess, "Interactive Algorithms: from Pool to Stream", Proceedings of the 29th Annual Conference on Learning Theory (COLT), JMLR Workshop and Conference Proceedings 49:1419-1439, 2016

Showing 1–17 of 17 results for author: Hess, T