-
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
Authors:
Natasha Butt,
Blazej Manczak,
Auke Wiggers,
Corrado Rainone,
David W. Zhang,
Michaël Defferrard,
Taco Cohen
Abstract:
Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines. Our code is available at https://github.com/Qualcomm-AI-research/codeit .
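The sampling-and-relabeling phase of the loop described above can be sketched as follows. This is a minimal illustration, not the CodeIt implementation. It assumes a hypothetical three-primitive grid DSL (`PRIMITIVES`), replaces the language-model policy with uniform random program sampling, and uses a made-up priority rule. Only the relabeling idea, replacing the target output with the output the sampled program actually produced, comes from the abstract.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy DSL: a program is a list of grid primitives applied left to right.
PRIMITIVES = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(program, grid):
    """Execute a program on an input grid and return the realized output grid."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


@dataclass
class ReplayBuffer:
    # Each entry is (task, program, priority), where task = (input_grid, output_grid).
    items: list = field(default_factory=list)

    def add(self, task, program, priority):
        self.items.append((task, program, priority))

    def sample(self, k, rng):
        # Prioritized replay: draw entries with probability proportional to priority.
        weights = [priority for _, _, priority in self.items]
        return rng.choices(self.items, weights=weights, k=k)


def sample_and_relabel(task, buffer, rng, n_samples=8):
    inp, target = task
    for _ in range(n_samples):
        # Stand-in for the learned policy: a random program of length 1 to 3.
        program = [rng.choice(sorted(PRIMITIVES)) for _ in range(rng.randint(1, 3))]
        realized = run(program, inp)
        # Hindsight relabeling: the program is, by construction, a correct solution
        # to the task whose target is the output it actually produced.
        relabeled = (inp, realized)
        # Hypothetical priority: experiences that also solve the original task weigh more.
        priority = 10.0 if realized == target else 1.0
        buffer.add(relabeled, program, priority)


rng = random.Random(0)
buffer = ReplayBuffer()
task = ([[1, 2], [3, 4]], [[2, 1], [4, 3]])  # target: mirror each row
sample_and_relabel(task, buffer, rng, n_samples=20)
batch = buffer.sample(4, rng)  # training batch for the next learning phase
```

Because every relabeled entry is a correct (input, output, program) triple, the learner always receives positive examples, even when no sample solves the original task.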
Submitted 1 July, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
MobileNVC: Real-time 1080p Neural Video Compression on a Mobile Device
Authors:
Ties van Rozendaal,
Tushar Singhal,
Hoang Le,
Guillaume Sautiere,
Amir Said,
Krishna Buska,
Anjuman Raha,
Dimitris Kalatzis,
Hitarth Mehta,
Frank Mayer,
Liang Zhang,
Markus Nagel,
Auke Wiggers
Abstract:
Neural video codecs have recently become competitive with standard codecs such as HEVC in the low-delay setting. However, most neural codecs are large floating-point networks that use pixel-dense warping operations for temporal modeling, making them too computationally expensive for deployment on mobile devices. Recent work has demonstrated that running a neural decoder in real time on mobile is feasible, but shows this only for 720p RGB video. This work presents the first neural video codec that decodes 1080p YUV420 video in real time on a mobile device. Our codec relies on two major contributions. First, we design an efficient codec that uses a block-based motion compensation algorithm available on the warping core of the mobile accelerator, and we show how to quantize this model to integer precision. Second, we implement a fast decoder pipeline that concurrently runs neural network components on the neural signal processor, parallel entropy coding on the mobile GPU, and warping on the warping core. Our codec outperforms the previous on-device codec by a large margin with up to 48% BD-rate savings, while reducing the MAC count on the receiver side by 10×. We perform a careful ablation to demonstrate the effect of the introduced motion compensation scheme, and ablate the effect of model quantization.
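Block-based motion compensation predicts each block of the current frame by copying a displaced block from a reference frame. The sketch below is a generic integer-pel version with edge clamping, in plain Python for clarity. The block size, the clamping rule, and the function name `block_motion_compensate` are assumptions, not details of MobileNVC or of the accelerator's warping core.

```python
def block_motion_compensate(reference, motion_vectors, block=2):
    """Predict a frame by copying a displaced reference block for every block.

    reference: H x W list of lists of pixel values (H, W multiples of `block`).
    motion_vectors: dict mapping block index (by, bx) -> integer offset (dy, dx);
        blocks without an entry use zero motion.
    Samples that fall outside the reference are clamped to the nearest edge pixel.
    """
    h, w = len(reference), len(reference[0])
    pred = [[0] * w for _ in range(h)]
    for by in range(h // block):
        for bx in range(w // block):
            dy, dx = motion_vectors.get((by, bx), (0, 0))
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    sy = min(max(y + dy, 0), h - 1)
                    sx = min(max(x + dx, 0), w - 1)
                    pred[y][x] = reference[sy][sx]
    return pred


# Usage: the scene moved one pixel to the right, so every block points one pixel left.
ref = [[4 * y + x for x in range(4)] for y in range(4)]
mvs = {(by, bx): (0, -1) for by in range(2) for bx in range(2)}
pred = block_motion_compensate(ref, mvs)
```

Since all pixels in a block share one vector, the decoder needs one offset per block rather than a dense per-pixel flow field, which is what makes schemes of this kind cheaper than pixel-dense warping.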
Submitted 15 November, 2023; v1 submitted 2 October, 2023;
originally announced October 2023.
-
A Residual Diffusion Model for High Perceptual Quality Codec Augmentation
Authors:
Noor Fathima Ghouse,
Jens Petersen,
Auke Wiggers,
Tianlin Xu,
Guillaume Sautière
Abstract:
Diffusion probabilistic models have recently achieved remarkable success in generating high-quality image and video data. In this work, we build on this class of generative models and introduce a method for lossy compression of high-resolution images. The resulting codec, which we call DIffusion-based Residual Augmentation Codec (DIRAC), is the first neural codec to allow smooth traversal of the rate-distortion-perception tradeoff at test time, while obtaining competitive performance with GAN-based methods in perceptual quality. Furthermore, while sampling from diffusion probabilistic models is notoriously expensive, we show that in the compression setting the number of steps can be drastically reduced.
Submitted 29 March, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Boosting neural video codecs by exploiting hierarchical redundancy
Authors:
Reza Pourreza,
Hoang Le,
Amir Said,
Guillaume Sautiere,
Auke Wiggers
Abstract:
In video compression, coding efficiency is improved by reusing pixels from previously decoded frames via motion and residual compensation. We define two levels of hierarchical redundancy in video frames: 1) first-order: redundancy in pixel space, i.e., similarities in pixel values across neighboring frames, which is effectively captured using motion and residual compensation, 2) second-order: redundancy in motion and residual maps due to smooth motion in natural videos. While most of the existing neural video coding literature addresses first-order redundancy, we tackle the problem of capturing second-order redundancy in neural video codecs via predictors. We introduce generic motion and residual predictors that learn to extrapolate from previously decoded data. These predictors are lightweight and can be employed with most neural video codecs in order to improve their rate-distortion performance. Moreover, while RGB is the dominant colorspace in the neural video coding literature, we introduce general modifications for neural video codecs to embrace the YUV420 colorspace and report YUV420 results. Our experiments show that using our predictors with a well-known neural video codec leads to 38% and 34% bitrate savings in the RGB and YUV420 colorspaces, respectively, measured on the UVG dataset.
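Second-order redundancy can be illustrated with the simplest possible motion predictor: assume motion is constant over time, predict the current motion field from the previous one, and transmit only the prediction residual. The paper's predictors are learned networks; the fixed constant-motion predictor, the single-channel integer motion fields, and the function names below are illustrative assumptions.

```python
def predict_motion(prev_flow):
    # Hypothetical fixed predictor: reuse the previous frame's motion field.
    return [row[:] for row in prev_flow]


def encode_motion(cur_flow, prev_flow):
    """Return the residual between the current motion and its prediction."""
    pred = predict_motion(prev_flow)
    return [[c - p for c, p in zip(c_row, p_row)] for c_row, p_row in zip(cur_flow, pred)]


def decode_motion(residual, prev_flow):
    """Reconstruct the current motion from the decoder-side prediction plus the residual."""
    pred = predict_motion(prev_flow)
    return [[p + r for p, r in zip(p_row, r_row)] for p_row, r_row in zip(pred, residual)]


def magnitude(field):
    return sum(abs(v) for row in field for v in row)


# Smooth motion: the field barely changes between consecutive frames.
prev_flow = [[2, 2], [2, 3]]
cur_flow = [[2, 3], [2, 3]]
residual = encode_motion(cur_flow, prev_flow)  # [[0, 1], [0, 0]]
```

The decoder already holds the previous motion field, so it can form the same prediction; only the small residual needs to be entropy-coded. The same pattern applies to residual maps.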
Submitted 16 September, 2022; v1 submitted 8 August, 2022;
originally announced August 2022.
-
MobileCodec: Neural Inter-frame Video Compression on Mobile Devices
Authors:
Hoang Le,
Liang Zhang,
Amir Said,
Guillaume Sautiere,
Yang Yang,
Pranav Shrestha,
Fei Yin,
Reza Pourreza,
Auke Wiggers
Abstract:
Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.
Submitted 17 July, 2022;
originally announced July 2022.
-
Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set
Authors:
Ties van Rozendaal,
Johann Brehmer,
Yunfan Zhang,
Reza Pourreza,
Auke Wiggers,
Taco S. Cohen
Abstract:
We introduce a video compression algorithm based on instance-adaptive learning. On each video sequence to be transmitted, we finetune a pretrained compression model. The optimal parameters are transmitted to the receiver along with the latent code. By entropy-coding the parameter updates under a suitable mixture model prior, we ensure that the network parameters can be encoded efficiently. This instance-adaptive compression algorithm is agnostic to the choice of base model and has the potential to improve any neural video codec. On the UVG, HEVC, and Xiph datasets, our codec improves the performance of a scale-space flow model by 21% to 27% BD-rate savings, and that of a state-of-the-art B-frame model by 17% to 20% BD-rate savings. We also demonstrate that instance-adaptive finetuning improves the robustness to domain shift. Finally, our approach reduces the capacity requirements of compression models. We show that it enables competitive performance even after reducing the network size by 70%.
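Transmitting finetuned weights only pays off if the updates are cheap to code. The sketch below estimates the bit cost of a quantized parameter update under a two-component spike-and-slab mixture: a point mass at zero (the parameter is left unchanged) plus a discretized Gaussian. The abstract only says "a suitable mixture model prior", so the spike-and-slab form, the quantization step, the mixture weight, and the slab width are all assumptions chosen for illustration.

```python
import math


def gaussian_bin_mass(center, width, sigma):
    """Probability mass of N(0, sigma^2) on [center - width/2, center + width/2]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))
    return cdf(center + width / 2) - cdf(center - width / 2)


def update_bits(delta, step=0.01, spike_weight=0.9, slab_sigma=0.05):
    """Quantize one parameter update and return (level, ideal code length in bits)."""
    level = round(delta / step)
    slab = gaussian_bin_mass(level * step, step, slab_sigma)
    spike = 1.0 if level == 0 else 0.0  # all spike mass sits on the zero bin
    p = spike_weight * spike + (1.0 - spike_weight) * slab
    # Guard against log(0) when the slab mass of a far-out bin underflows in floating point.
    return level, -math.log2(max(p, 1e-300))


def total_update_bits(deltas, **prior):
    return sum(update_bits(d, **prior)[1] for d in deltas)


# Most parameters barely move during finetuning; a few move noticeably.
deltas = [0.0, 0.001, -0.002, 0.0, 0.03, 0.0, -0.04, 0.0]
```

Under such a prior, unchanged parameters cost a fraction of a bit each while large updates cost many bits, so adding this rate to the finetuning objective favors sparse, small updates.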
Submitted 23 June, 2023; v1 submitted 19 November, 2021;
originally announced November 2021.
-
Parallelized Rate-Distortion Optimized Quantization Using Deep Learning
Authors:
Dana Kianfar,
Auke Wiggers,
Amir Said,
Reza Pourreza,
Taco Cohen
Abstract:
Rate-Distortion Optimized Quantization (RDOQ) has played an important role in the coding performance of recent video compression standards such as H.264/AVC, H.265/HEVC, VP9 and AV1. This scheme yields significant reductions in bit-rate at the expense of relatively small increases in distortion. Typically, RDOQ algorithms are prohibitively expensive to implement on real-time hardware encoders due to their sequential nature and their need to frequently obtain entropy coding costs. This work addresses this limitation using a neural network-based approach, which learns to trade off rate and distortion during offline supervised training. As these networks are based solely on standard arithmetic operations that can be executed on existing neural network hardware, no additional chip area needs to be reserved for dedicated RDOQ circuitry. We train two classes of neural networks, a fully-convolutional network and an auto-regressive network, and evaluate each as a post-quantization step designed to refine cheap quantization schemes such as scalar quantization (SQ). Both network architectures are designed to have a low computational overhead. After training, they are integrated into the HM 16.20 implementation of HEVC, and their video coding performance is evaluated on a subset of the H.266/VVC SDR common test sequences. Comparisons are made to RDOQ and SQ implementations in HM 16.20. Our method achieves 1.64% BD-rate savings on luma compared to the HM SQ anchor, and on average reaches 45% of the performance of the iterative HM RDOQ algorithm.
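For intuition about what such networks learn to approximate, the sketch below performs a simplified rate-distortion optimized quantization: for each transform coefficient it compares zero, rounding down, and rounding up, and keeps the level minimizing J = D + λR, with D the squared reconstruction error. Real RDOQ in HM derives R from the entropy coder's context state, which is what makes it sequential; here `rate_bits` is a hypothetical closed-form stand-in, so coefficients are treated independently.

```python
import math


def rate_bits(level):
    # Hypothetical rate model standing in for entropy-coder costs: bigger levels cost more.
    return 1.0 if level == 0 else 2.0 + 2.0 * math.log2(level + 1)


def scalar_quantize(coeffs, step):
    """Plain scalar quantization (SQ): round each coefficient to the nearest level."""
    return [round(c / step) for c in coeffs]


def rdoq(coeffs, step, lam):
    """Pick, per coefficient, the level minimizing distortion + lam * rate."""
    levels = []
    for c in coeffs:
        mag = abs(c) / step
        candidates = sorted({0, math.floor(mag), math.ceil(mag)})

        def cost(level):
            distortion = (abs(c) - level * step) ** 2
            return distortion + lam * rate_bits(level)

        best = min(candidates, key=cost)
        levels.append(best if c >= 0 else -best)
    return levels


coeffs = [3.7, -1.2, 0.4]
sq_levels = scalar_quantize(coeffs, 1.0)    # [4, -1, 0]
rd_levels = rdoq(coeffs, 1.0, lam=10.0)     # [0, 0, 0]: rate dominates at high lambda
```

With λ = 0 the choice reduces to nearest-level rounding (the SQ anchor); increasing λ trades extra distortion for fewer bits.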
Submitted 11 December, 2020;
originally announced December 2020.
-
Predictive Sampling with Forecasting Autoregressive Models
Authors:
Auke Wiggers,
Emiel Hoogeboom
Abstract:
Autoregressive models (ARMs) currently hold state-of-the-art performance in likelihood-based modeling of image and audio data. Generally, neural network based ARMs are designed to allow fast inference, but sampling from these models is impractically slow. In this paper, we introduce the predictive sampling algorithm: a procedure that exploits the fast inference property of ARMs in order to speed up sampling, while keeping the model intact. We propose two variations of predictive sampling, namely sampling with ARM fixed-point iteration and with learned forecasting modules. Their effectiveness is demonstrated in two settings: i) explicit likelihood modeling on binary MNIST, SVHN and CIFAR10, and ii) discrete latent modeling in an autoencoder trained on SVHN, CIFAR10 and Imagenet32. Empirically, we show considerable improvements over baselines in number of ARM inference calls and sampling speed.
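ARM fixed-point iteration can be sketched on a toy binary ARM. With the sampling noise fixed in advance (reparametrization), each dimension is a deterministic function of the preceding ones, so one can guess the whole sequence, recompute every dimension in parallel with a single ARM call, and repeat until nothing changes. After k calls the first k dimensions are guaranteed correct, so the result matches sequential sampling within d + 1 calls, and in fewer whenever the guesses for later dimensions already happen to be right. The first-order chain in `arm_conditionals`, its coefficients, and the all-zeros initial guess are assumptions for illustration; the learned forecasting modules are not shown.

```python
import math
import random


def arm_conditionals(x):
    """One parallel 'ARM call': P(x_i = 1 | x_<i) for every position at once.

    Hypothetical toy model: a first-order chain. A real ARM would be a masked network.
    """
    probs = []
    for i in range(len(x)):
        prev = x[i - 1] if i > 0 else 0
        probs.append(1.0 / (1.0 + math.exp(-(-1.0 + 3.0 * prev))))
    return probs


def sample_sequential(noise):
    """Baseline: one ARM call per dimension."""
    x = []
    for eps in noise:
        p = arm_conditionals(x + [0])[-1]  # conditional for the next position
        x.append(1 if eps < p else 0)
    return x, len(noise)


def sample_fixed_point(noise):
    """Iterate x <- f(x) with fixed noise until x stops changing."""
    x = [0] * len(noise)  # initial guess for the whole sequence
    calls = 0
    while True:
        probs = arm_conditionals(x)
        calls += 1
        new_x = [1 if eps < p else 0 for eps, p in zip(noise, probs)]
        if new_x == x:
            return x, calls
        x = new_x


rng = random.Random(0)
noise = [rng.random() for _ in range(16)]
x_seq, seq_calls = sample_sequential(noise)
x_fp, fp_calls = sample_fixed_point(noise)
```

Both samplers consume the same noise, so they must produce the same sample; only the number of ARM calls differs.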
Submitted 8 July, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
-
Simulating Execution Time of Tensor Programs using Graph Neural Networks
Authors:
Jakub M. Tomczak,
Romain Lepert,
Auke Wiggers
Abstract:
Optimizing the execution time of a tensor program, e.g., a convolution, involves finding its optimal configuration. Searching the configuration space exhaustively is typically infeasible in practice. In line with recent research using TVM, we propose to learn a surrogate model to overcome this issue. The model is trained on an acyclic graph called an abstract syntax tree, and utilizes a graph convolutional network to exploit structure in the graph. We claim that learnable graph-based data processing is a strong competitor to heuristic-based feature extraction. We present a new dataset of graphs corresponding to configurations and their execution time for various tensor programs. We provide baselines for a runtime prediction task.
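A minimal sketch of one graph-convolution layer over an abstract syntax tree, followed by a pooled linear readout that produces a runtime estimate, is shown below in plain Python. Node features are one-hot node types, edges are treated as undirected, aggregation is a mean over each node and its neighbors, and the weights are hand-set rather than trained on measured execution times. The feature encoding, aggregation rule, pooling, and function names are assumptions, not the paper's architecture.

```python
def gcn_layer(features, edges, weight):
    """One graph convolution: mean over self + neighbors, linear map, then ReLU.

    features: list of node feature vectors; edges: list of (parent, child) pairs;
    weight: in_dim x out_dim matrix as a list of lists.
    """
    n = len(features)
    neighbors = [[i] for i in range(n)]  # include a self-loop for every node
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    out = []
    for i in range(n):
        dim = len(features[0])
        agg = [sum(features[j][k] for j in neighbors[i]) / len(neighbors[i]) for k in range(dim)]
        h = [sum(agg[k] * weight[k][m] for k in range(dim)) for m in range(len(weight[0]))]
        out.append([max(0.0, v) for v in h])
    return out


def predict_runtime(features, edges, weight, readout):
    """Mean-pool node embeddings and apply a linear readout to get a scalar estimate."""
    h = gcn_layer(features, edges, weight)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    return sum(p * r for p, r in zip(pooled, readout))


# Toy AST: a root of type 0 with two children of type 1 (one-hot features).
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
edges = [(0, 1), (0, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
estimate = predict_runtime(features, edges, identity, readout=[1.0, 1.0])
```

Mean pooling makes the estimate independent of node ordering and lets one model handle trees of different sizes, which is why a readout of this kind is a natural fit for ASTs.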
Submitted 27 November, 2019; v1 submitted 26 April, 2019;
originally announced April 2019.
-
Risk-averse Behavior Planning for Autonomous Driving under Uncertainty
Authors:
Mohammad Naghshvar,
Ahmed K. Sadek,
Auke J. Wiggers
Abstract:
Autonomous vehicles have to navigate the surrounding environment with partial observability of other objects sharing the road. Sources of uncertainty in autonomous vehicle measurements include sensor fusion errors, limited sensor range due to weather or object detection latency, occlusion, and hidden parameters such as the intentions of other human drivers. Behavior planning must consider all sources of uncertainty in deciding future vehicle maneuvers. This paper presents a scalable framework for risk-averse behavior planning under uncertainty that incorporates QMDP, the unscented transform, and Monte Carlo tree search (MCTS). It is shown that using the upper confidence bound (UCB) to expand the tree results in noisy Q-value estimates from the MCTS and degraded QMDP performance. A modification to the action selection procedure in MCTS is proposed to achieve robust performance.
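QMDP, one ingredient of the framework above, approximates the value of acting under a belief b by assuming the hidden state becomes fully observable after one step: Q(b, a) = Σ_s b(s) Q_MDP(s, a). The sketch below applies that rule to a hypothetical two-intention scenario in which another driver either yields or goes. The states, actions, and Q-values are made up for illustration, and the unscented transform, MCTS, and the paper's modified action selection are not shown.

```python
def qmdp_value(belief, q_mdp, action):
    """Belief-weighted MDP value: sum over states of b(s) * Q_MDP(s, action)."""
    return sum(p * q_mdp[(state, action)] for state, p in belief.items())


def qmdp_action(belief, q_mdp):
    """Return the action with the highest QMDP value (ties broken alphabetically)."""
    actions = sorted({action for (_, action) in q_mdp})
    return max(actions, key=lambda a: qmdp_value(belief, q_mdp, a))


# Hypothetical Q-values: proceeding is good if the other driver yields, very bad if it goes.
Q_MDP = {
    ("yields", "proceed"): 10.0,
    ("goes", "proceed"): -100.0,
    ("yields", "wait"): 0.0,
    ("goes", "wait"): 0.0,
}

confident = qmdp_action({"yields": 0.95, "goes": 0.05}, Q_MDP)
uncertain = qmdp_action({"yields": 0.8, "goes": 0.2}, Q_MDP)
```

In this toy setup a 20% chance that the other driver goes is enough to flip the decision from proceeding to waiting. A known limitation of QMDP is that it never values information-gathering actions, because it assumes the uncertainty disappears after one step.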
Submitted 4 December, 2018;
originally announced December 2018.
-
Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information
Authors:
Auke J. Wiggers,
Frans A. Oliehoek,
Diederik M. Roijers
Abstract:
Zero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty as considered in the Partially Observable Stochastic Game (POSG), such decision making problems are still not very well understood. This paper makes a contribution to the theory of zero-sum POSGs by characterizing structure in their value function. In particular, we introduce a new formulation of the value function for zs-POSGs as a function of the "plan-time sufficient statistics" (roughly speaking, the information distribution in the POSG), which has the potential to enable generalization over such information distributions. We further delineate this generalization capability by proving a structural result on the shape of the value function: it exhibits concavity and convexity with respect to appropriately chosen marginals of the statistic space. This result is a key precursor for developing solution methods that may be able to exploit such structure. Finally, we show how these results allow us to reduce a zs-POSG to a "centralized" model with shared observations, thereby transferring results for the latter, narrower class, to games with individual (private) observations.
Submitted 22 June, 2016;
originally announced June 2016.