-
Revisiting Cascaded Ensembles for Efficient Inference
Authors:
Steven Kolawole,
Don Dennis,
Ameet Talwalkar,
Virginia Smith
Abstract:
A common approach to make machine learning inference more efficient is to use example-specific adaptive schemes, which route or select models for each example at inference time. In this work we study a simple scheme for adaptive inference. We build a cascade of ensembles (CoE), beginning with resource-efficient models and growing to larger, more expressive models, where ensemble agreement serves as a data-dependent routing criterion. This scheme is easy to incorporate into existing inference pipelines, requires no additional training, and can be used to place models across multiple resource tiers, for instance serving efficient models at the edge and invoking larger models in the cloud only when necessary. In cases where parallel inference is feasible, we show that CoE can improve accuracy relative to the single best model while reducing the average cost of inference by up to 7x, and that it provides Pareto-dominant solutions in accuracy and efficiency relative to existing adaptive inference baselines. These savings translate to an over 3x reduction in total monetary cost when performing inference using a heterogeneous cluster of GPUs. Finally, for edge inference scenarios where some portions of the cascade reside at the edge and others in the cloud, CoE can provide a 14x reduction in communication cost and inference latency without sacrificing accuracy.
Submitted 2 July, 2024;
originally announced July 2024.
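The agreement-based routing described in the CoE abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the `cascade_predict` function, the `Model` callable interface, the per-tier cost values, and the `agreement` threshold parameter are all hypothetical. Each tier's ensemble is evaluated (in parallel, in practice); if enough members agree, the cascade stops, otherwise it escalates to the next, more expensive tier, and the last tier always answers.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a model maps an input to an integer class label.
Model = Callable[[object], int]


def cascade_predict(
    x: object,
    tiers: Sequence[Sequence[Model]],
    tier_costs: Sequence[float],
    agreement: float = 1.0,
) -> Tuple[int, int, float]:
    """Evaluate ensembles tier by tier, cheapest first, and stop at the first
    tier whose members agree; the last tier always answers.

    agreement: fraction of members that must vote for the majority label
    (1.0 means unanimous). Returns (label, tier_index, accumulated_cost).
    """
    if not tiers:
        raise ValueError("need at least one tier")
    spent = 0.0
    for i, (ensemble, cost) in enumerate(zip(tiers, tier_costs)):
        votes: List[int] = [model(x) for model in ensemble]  # parallel in practice
        spent += cost
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= agreement or i == len(tiers) - 1:
            return label, i, spent
    raise ValueError("tier_costs is shorter than tiers")


# Toy usage: two cheap models that disagree on odd inputs, one expensive model.
small = [lambda x: x % 2, lambda x: 0]
large = [lambda x: x % 2]
print(cascade_predict(4, [small, large], [1.0, 10.0]))  # (0, 0, 1.0)
print(cascade_predict(3, [small, large], [1.0, 10.0]))  # (1, 1, 11.0)
```

Lowering `agreement` below 1.0 trades accuracy for cost: more inputs exit at the cheap tier. How CoE actually sets its agreement rule is not specified in the abstract.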
-
Progressive Ensemble Distillation: Building Ensembles for Efficient Inference
Authors:
Don Kurian Dennis,
Abhishek Shetty,
Anish Sevekari,
Kazuhito Koishida,
Virginia Smith
Abstract:
We study the problem of progressive ensemble distillation: given a large, pretrained teacher model $g$, we seek to decompose the model into smaller, low-inference-cost student models $f_i$, such that progressively evaluating additional models in this ensemble leads to improved predictions. The resulting ensemble allows for flexibly tuning accuracy vs. inference cost at runtime, which is useful for a number of applications in on-device inference. The method we propose, B-DISTIL, relies on an algorithmic procedure that uses function composition over intermediate activations to construct expressive ensembles with similar performance as $g$, but with smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across standard image, speech, and sensor datasets. We also provide theoretical guarantees in terms of convergence and generalization.
Submitted 9 November, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
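The runtime accuracy-vs-cost knob described in the B-DISTIL abstract can be illustrated with a simple progressive evaluation loop. This is a sketch under a simplifying assumption: the students' class scores are combined by summation, whereas B-DISTIL itself builds students through function composition over intermediate activations, which is not reproduced here. The `progressive_predict` function, the `Student` interface, and the cost and budget values are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a student maps an input to a list of class scores.
Student = Callable[[object], List[float]]


def progressive_predict(
    x: object, students: Sequence[Student], costs: Sequence[float], budget: float
) -> Tuple[int, int, float]:
    """Evaluate f_1, f_2, ... in order, summing their scores, until the next
    student would exceed the runtime budget. Returns (label, n_used, cost)."""
    total: List[float] = []
    spent, used = 0.0, 0
    for f, c in zip(students, costs):
        if spent + c > budget:
            break
        scores = f(x)
        # Additive combination of student scores (an assumption of this sketch).
        total = list(scores) if used == 0 else [a + b for a, b in zip(total, scores)]
        spent += c
        used += 1
    if used == 0:
        raise ValueError("budget too small for the first student")
    label = max(range(len(total)), key=total.__getitem__)
    return label, used, spent


# Toy usage: the second student corrects the first one's prediction.
students = [lambda x: [0.6, 0.4], lambda x: [-0.5, 0.5]]
costs = [1.0, 1.0]
print(progressive_predict(None, students, costs, budget=1.0))  # (0, 1, 1.0)
print(progressive_predict(None, students, costs, budget=2.0))  # (1, 2, 2.0)
```

The same trained ensemble serves both budgets; only the number of evaluated students changes at runtime.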
-
Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts
Authors:
Amrith Setlur,
Don Dennis,
Benjamin Eysenbach,
Aditi Raghunathan,
Chelsea Finn,
Virginia Smith,
Sergey Levine
Abstract:
Training machine learning models that are robust to distribution shifts is critical for real-world applications. Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points. Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative, since they naively upweight high-loss points, which may form a contrived set that does not correspond to any meaningful group in the real world (e.g., when the high-loss points are randomly mislabeled training points). In this work, we address limitations in prior approaches by assuming a more nuanced form of group shift: conditioned on the label, we assume that the true group function (indicator over group) is simple. For example, we may expect that group shifts occur along low-bitrate features (e.g., image background, lighting). Thus, we aim to learn a model that maintains high accuracy on simple group functions realized by these low-bitrate features, and that need not spend valuable model capacity achieving high accuracy on contrived groups of examples. Based on this, we consider the two-player game formulation of DRO where the adversary's capacity is bitrate-constrained. Our resulting practical algorithm, Bitrate-Constrained DRO (BR-DRO), does not require group information on training samples, yet matches the performance of Group DRO on datasets that have training group annotations and that of CVaR DRO on long-tailed distributions. Our theoretical analysis reveals that in some settings the BR-DRO objective can provably yield statistically efficient and less conservative solutions than unconstrained CVaR DRO.
Submitted 11 October, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
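The contrast drawn in the BR-DRO abstract, between an unconstrained CVaR adversary and one restricted to simple group functions, can be made concrete on toy losses. This is not BR-DRO: the "simple" adversary here is a hypothetical stand-in that may only choose groups defined by a single categorical feature (such as image background), and the loss values are invented for illustration. The functions `cvar_loss` and `worst_simple_group` are our own names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cvar_loss(losses: Sequence[float], alpha: float) -> float:
    """Discrete CVaR at level alpha: mean of the worst ceil(alpha * n) losses.
    The adversary may upweight ANY subset of points, however contrived."""
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def worst_simple_group(losses: Sequence[float], feature: Sequence[str]) -> Tuple[str, float]:
    """Worst average loss when the adversary may only pick groups defined by one
    categorical feature (a hypothetical stand-in for a low-bitrate group function)."""
    groups: Dict[str, List[float]] = {}
    for loss, value in zip(losses, feature):
        groups.setdefault(value, []).append(loss)
    return max(((v, sum(g) / len(g)) for v, g in groups.items()), key=lambda t: t[1])


# Toy data: one mislabeled "sky" point (loss 3.0); the model is uniformly worse on "water".
losses = [0.1, 0.1, 0.1, 3.0, 1.0, 1.0, 1.0, 1.0]
background = ["sky"] * 4 + ["water"] * 4
print(cvar_loss(losses, alpha=0.125))          # 3.0, driven by the single noisy point
print(worst_simple_group(losses, background))  # ('water', 1.0)
```

The CVaR adversary concentrates on the single mislabeled point, while the feature-restricted adversary singles out the "water" group, where the model is systematically worse.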
-
Coherent Organizational States in Turbulent Pipe Flow at moderate Reynolds numbers
Authors:
R. Jäckel,
B. Magacho,
B. E. Owolabi,
L. Moriconi,
D. J. C. Dennis,
J. B. R. Loureiro
Abstract:
Turbulent pipe flow is still an essentially open area of research, boosted in the last two decades by considerable progress on both the experimental and numerical fronts, mainly related to the identification and characterization of coherent structures as the basic building blocks of turbulence. Detecting and visualizing these coherent states, however, has been a challenging task. We address this issue by means of stereoscopic particle image velocimetry in a large-diameter (6-inch) pipe loop, which allowed us to probe for coherent states at various moderate Reynolds numbers (5300 < Re < 29000). Although these states have been observed in flow regimes around the laminar-turbulent transition (Re $\approx$ 2300) and in high-Reynolds-number pipe flow (Re $\approx$ 35000), their existence at moderate Reynolds numbers had not yet been observed experimentally. By conditionally averaging the flow fields with respect to the dominant azimuthal wavenumber of the streamwise velocity streaks, we have been able to uncover ten well-defined coherent flow patterns. Remarkably, their occurrence probabilities and the total number of dominant modes do not change appreciably as the Reynolds number is varied. Their occurrence probabilities are reasonably well described by a Poisson distribution, which suggests that low-speed streaks are created as a Poisson process on the circular geometry of the pipe.
Submitted 26 January, 2023;
originally announced January 2023.
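The Poisson description in the abstract above can be checked on a sequence of per-snapshot dominant mode numbers. If streaks are created independently at a constant rate around the pipe circumference (a Poisson process), the number of streaks in the full circle is Poisson-distributed. The sketch below fits the rate by maximum likelihood (the sample mean) and tabulates empirical frequencies against the Poisson probabilities. The function names and the input data are hypothetical, and whether the mode numbers should be offset before fitting is not addressed by the abstract.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compare_to_poisson(mode_numbers: Sequence[int]) -> Dict[int, Tuple[float, float]]:
    """Map each observed dominant mode number m to (empirical frequency, Poisson pmf),
    with the rate fitted by maximum likelihood, i.e. the sample mean."""
    n = len(mode_numbers)
    lam = sum(mode_numbers) / n
    freq = Counter(mode_numbers)
    return {m: (freq[m] / n, poisson_pmf(m, lam)) for m in sorted(freq)}


# Hypothetical per-snapshot dominant mode numbers (not data from the paper).
table = compare_to_poisson([2, 3, 3, 4])
for m, (empirical, model) in table.items():
    print(m, empirical, round(model, 4))
```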
-
Stochastic Model of Organizational State Transitions in a Turbulent Pipe Flow
Authors:
Robert Jäckel,
Bruno Magacho,
Bayode Owolabi,
Luca Moriconi,
David J. C. Dennis,
Juliana B. R. Loureiro
Abstract:
Turbulent pipe flows exhibit organizational states (OSs) that are labelled by discrete azimuthal wavenumber modes and are reminiscent of the traveling wave solutions of low Reynolds number regimes. The discretized time evolution of the OSs, obtained through stereoscopic particle image velocimetry, is shown to be non-Markovian for data acquisition carried out at a structure-resolved sampling rate. In particular, properly defined time-correlation functions for the OS transitions are observed to decay as intriguing power laws, up to a large-eddy time horizon, beyond which they decorrelate at much faster rates. We are able to establish, relying upon a probabilistic description of the creation and annihilation of streamwise streaks, a lower-level Markovian model for the OS transitions, which reproduces their time-correlated behavior with meaningful accuracy. These findings indicate that the OSs are distributed along the pipe as statistically correlated packets of quasi-streamwise vortical structures.
Submitted 12 January, 2023;
originally announced January 2023.
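One way to probe whether a discretized OS sequence is first-order Markovian, as discussed in the abstract above, is a Chapman-Kolmogorov check: for a Markov chain, the empirical lag-n transition matrix should match the n-th power of the lag-1 matrix. The abstract itself relies on time-correlation functions, so this is a swapped-in standard diagnostic rather than the authors' procedure. The function names and the toy mode sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Matrix = Dict[Tuple[int, int], float]


def transition_matrix(seq: Sequence[int], lag: int = 1) -> Matrix:
    """Empirical P(state at t + lag = b | state at t = a)."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for a, b in zip(seq, seq[lag:]):
        counts[(a, b)] += 1
        totals[a] += 1
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}


def matmul(p: Matrix, q: Matrix, states: List[int]) -> Matrix:
    return {
        (a, c): sum(p.get((a, b), 0.0) * q.get((b, c), 0.0) for b in states)
        for a in states
        for c in states
    }


def chapman_kolmogorov_gap(seq: Sequence[int], lag: int) -> float:
    """Max |P_lag - (P_1)^lag| over state pairs; values near zero are consistent
    with a first-order Markov chain, large values indicate memory."""
    states = sorted(set(seq))
    p1 = transition_matrix(seq, 1)
    power = p1
    for _ in range(lag - 1):
        power = matmul(power, p1, states)
    p_lag = transition_matrix(seq, lag)
    return max(
        abs(p_lag.get((a, c), 0.0) - power.get((a, c), 0.0))
        for a in states
        for c in states
    )


# Toy mode sequences (hypothetical, not measured data).
markov_like = [1, 2, 3] * 10     # next mode depends only on the current one
with_memory = [1, 1, 2, 2] * 10  # next mode depends on the previous two
print(chapman_kolmogorov_gap(markov_like, 2))  # 0.0
print(chapman_kolmogorov_gap(with_memory, 2))  # about 0.51
```

On real data, finite sample size makes the gap nonzero even for a Markov process, so it should be compared against its sampling variability rather than against zero.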
-
Adversarial Policies Beat Superhuman Go AIs
Authors:
Tony T. Wang,
Adam Gleave,
Tom Tseng,
Kellin Pelrine,
Nora Belrose,
Joseph Miller,
Michael D. Dennis,
Yawen Duan,
Viktor Pogrebniak,
Sergey Levine,
Stuart Russell
Abstract:
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings. Our adversaries do not win by playing Go well. Instead, they trick KataGo into making serious blunders. Our attack transfers zero-shot to other superhuman Go-playing AIs, and is comprehensible to the extent that human experts can implement it without algorithmic assistance to consistently beat superhuman AIs. The core vulnerability uncovered by our attack persists even in KataGo agents adversarially trained to defend against our attack. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at https://goattack.far.ai/.
Submitted 13 July, 2023; v1 submitted 31 October, 2022;
originally announced November 2022.
-
Heterogeneity for the Win: One-Shot Federated Clustering
Authors:
Don Kurian Dennis,
Tian Li,
Virginia Smith
Abstract:
In this work, we explore the unique challenges, and opportunities, of unsupervised federated learning (FL). We develop and analyze a one-shot federated clustering scheme, $k$-FED, based on the widely used Lloyd's method for $k$-means clustering. In contrast to many supervised problems, we show that the issue of statistical heterogeneity in federated networks can in fact benefit our analysis. We analyse $k$-FED under a center separation assumption and compare it to the best known requirements of its centralized counterpart. Our analysis shows that in heterogeneous regimes where the number of clusters per device $k'$ is smaller than the total number of clusters $k$ over the network (specifically, $k'\le \sqrt{k}$), we can use heterogeneity to our advantage, significantly weakening the cluster separation requirements for $k$-FED. From a practical viewpoint, $k$-FED also has many desirable properties: it requires only one round of communication, can run asynchronously, and can handle partial participation or node/network failures. We motivate our analysis with experiments on common FL benchmarks, and highlight the practical utility of one-shot clustering through use-cases in personalized FL and device sampling.
Submitted 5 October, 2021; v1 submitted 28 February, 2021;
originally announced March 2021.
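The one-shot structure of the $k$-FED abstract above can be sketched as follows: each device runs Lloyd's method on its own data and uploads its $k'$ local centers once, and the server clusters the pooled centers into $k$ global centers. The abstract only states that $k$-FED builds on Lloyd's method and needs a single round of communication; the deterministic farthest-first seeding, the exact server step, and all function names here are assumptions of this sketch. The toy network has $k' = 2 > \sqrt{3}$, so it illustrates the mechanics rather than the $k'\le \sqrt{k}$ regime covered by the analysis.

```python
import numpy as np


def farthest_first(points: np.ndarray, k: int) -> np.ndarray:
    """Deterministic max-min seeding: start at points[0], then repeatedly add the
    point farthest from the centers chosen so far."""
    centers = [points[0]]
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d2.argmax()])
    return np.array(centers, dtype=float)


def lloyd(points: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Lloyd's method for k-means, seeded with farthest_first."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return centers


def one_shot_federated_kmeans(device_data, k_local: int, k: int) -> np.ndarray:
    """Each device runs k-means locally and uploads its k_local centers once;
    the server then clusters the pooled centers into k global centers."""
    uploaded = [lloyd(x, k_local) for x in device_data]  # the single round
    return lloyd(np.vstack(uploaded), k)


def blob(cx: float, cy: float) -> np.ndarray:
    offsets = np.array([[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1]])
    return offsets + np.array([cx, cy])


# Heterogeneous toy network: 3 global clusters, each device sees only 2 of them.
devices = [
    np.vstack([blob(0, 0), blob(10, 0)]),
    np.vstack([blob(10, 0), blob(0, 10)]),
    np.vstack([blob(0, 0), blob(0, 10)]),
]
centers = one_shot_federated_kmeans(devices, k_local=2, k=3)
print(np.round(centers[np.lexsort((centers[:, 1], centers[:, 0]))], 3))
```

Raw data never leaves a device; only the local centers are communicated, which is what makes a single asynchronous round possible.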