Showing 1–20 of 20 results for author: Faccio, F

  1. arXiv:2406.08404  [pdf, other]

    cs.LG cs.AI

    Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

    Authors: Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber

    Abstract: The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100\times 100$ maze -- a task which typically requires thousands of planning steps to solve. We observe that this deficiency is due…

    Submitted 12 June, 2024; originally announced June 2024.

    ACM Class: I.2.6

  2. arXiv:2406.03485  [pdf, other]

    cs.LG cs.AI

    Highway Value Iteration Networks

    Authors: Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber

    Abstract: Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  3. arXiv:2405.18289  [pdf, other]

    cs.LG cs.AI

    Highway Reinforcement Learning

    Authors: Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber

    Abstract: Learning from multi-step off-policy data collected by a set of policies is a core problem of reinforcement learning (RL). Approaches based on importance sampling (IS) often suffer from large variances due to products of IS ratios. Typical IS-free methods, such as $n$-step Q-learning, look ahead for $n$ time steps along the trajectory of actions (where $n$ is called the lookahead depth) and utilize…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2404.08093  [pdf, other]

    cs.RO cs.AI cs.LG

    Towards a Robust Soft Baby Robot With Rich Interaction Ability for Advanced Machine Learning Algorithms

    Authors: Mohannad Alhakami, Dylan R. Ashley, Joel Dunham, Francesco Faccio, Eric Feron, Jürgen Schmidhuber

    Abstract: Artificial intelligence has made great strides in many areas lately, yet it has had comparatively little success in general-use robotics. We believe one of the reasons for this is the disconnect between traditional robotic design and the properties needed for open-ended, creativity-based AI systems. To that end, we, taking selective inspiration from nature, build a robust, partially soft robotic l…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 5 pages in main text + 1 page of references, 7 figures in main text; source code available at https://github.com/dylanashley/robot-limb-testai

    ACM Class: I.2.9; I.2.6

  5. arXiv:2404.02747  [pdf, other]

    cs.CV

    Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

    Authors: Wentian Zhang, Haozhe Liu, Jinheng Xie, Francesco Faccio, Mike Zheng Shou, Jürgen Schmidhuber

    Abstract: This study explores the role of cross-attention during inference in text-conditional diffusion models. We find that cross-attention outputs converge to a fixed point after few inference steps. Accordingly, the time point of convergence naturally divides the entire inference process into two stages: an initial semantics-planning stage, during which, the model relies on cross-attention to plan text-…

    Submitted 3 April, 2024; originally announced April 2024.

  6. arXiv:2403.11998  [pdf, other]

    cs.LG

    Learning Useful Representations of Recurrent Neural Network Weight Matrices

    Authors: Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber

    Abstract: Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality-specific…

    Submitted 18 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    ACM Class: I.2.6

  7. arXiv:2402.16823  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    Language Agents as Optimizable Graphs

    Authors: Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber

    Abstract: Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs c…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Project Website: https://gptswarm.org ; Github Repo: https://github.com/metauto-ai/gptswarm ; Replace to fix typos

  8. arXiv:2309.11197  [pdf, other]

    cs.LG cs.CL

    The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

    Authors: Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag

    Abstract: The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the m…

    Submitted 20 September, 2023; originally announced September 2023.

  9. arXiv:2308.07795  [pdf, other]

    cs.CV cs.AI

    Learning to Identify Critical States for Reinforcement Learning from Videos

    Authors: Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

    Abstract: Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first lear…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: This paper was accepted to ICCV23

  10. arXiv:2305.17066  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.MA

    Mindstorms in Natural Language-Based Societies of Mind

    Authors: Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, et al. (1 additional author not shown)

    Abstract: Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overco…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 9 pages in main text + 7 pages of references + 38 pages of appendices, 14 figures in main text + 13 in appendices, 7 tables in appendices

    MSC Class: 68T07 ACM Class: I.2.6; I.2.11

  11. arXiv:2207.01570  [pdf, other]

    cs.LG stat.ML

    Goal-Conditioned Generators of Deep Policies

    Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Preprint. Under Review

  12. arXiv:2207.01566  [pdf, other]

    cs.LG stat.ML

    General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

    Authors: Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

    Abstract: Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Netw…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Preprint. Under review

  13. arXiv:2206.01649  [pdf, other]

    cs.LG

    Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

    Authors: Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber

    Abstract: Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed. Since the 1980s, ODEs have also been used to derive theoretical results for NN learning rules, e.g., the famous connection between Oja's rule and principal component analysis. Such rules are…

    Submitted 14 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022

  14. arXiv:2205.06595  [pdf, other]

    stat.ML cs.AI cs.LG

    Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

    Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava

    Abstract: Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching perfo…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: presented at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making; 5 pages in main text + 1 page of references + 3 pages of appendices, 1 figure in main text; source code available at https://github.com/struplm/UDRL-GCSL-counterexample.git

    MSC Class: 68T05 ACM Class: I.2.6

  15. arXiv:2107.09088  [pdf, other]

    stat.ML cs.AI cs.LG

    Reward-Weighted Regression Converges to a Global Optimum

    Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Rupesh Kumar Srivastava, Jürgen Schmidhuber

    Abstract: Reward-Weighted Regression (RWR) belongs to a family of widely known iterative Reinforcement Learning algorithms based on the Expectation-Maximization framework. In this family, learning at each iteration consists of sampling a batch of trajectories using the current policy and fitting a new policy to maximize a return-weighted log-likelihood of actions. Although RWR is known to yield monotonic im…

    Submitted 23 February, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

    Comments: 7 pages in main text + 2 pages of references + 6 pages of appendices, 1 figure in main text + 1 figure in appendices; source code available at https://github.com/dylanashley/reward-weighted-regression

    MSC Class: 68T05 ACM Class: I.2.6

  16. arXiv:2107.05438  [pdf, other]

    q-bio.NC cs.AI

    Bayesian brains and the Rényi divergence

    Authors: Noor Sajid, Francesco Faccio, Lancelot Da Costa, Thomas Parr, Jürgen Schmidhuber, Karl Friston

    Abstract: Under the Bayesian brain hypothesis, behavioural variations can be attributed to different priors over generative model parameters. This provides a formal explanation for why individuals exhibit inconsistent behavioural preferences when confronted with similar choices. For example, greedy preferences are a consequence of confident (or precise) beliefs over certain outcomes. Here, we offer an alter…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: 23 pages, 5 figures

  17. arXiv:2006.09226  [pdf, other]

    cs.LG cs.AI stat.ML

    Parameter-Based Value Functions

    Authors: Francesco Faccio, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Traditional off-policy actor-critic Reinforcement Learning (RL) algorithms learn value functions of a single target policy. However, when value functions are updated to track the learned policy, they forget potentially useful information about old policies. We introduce a class of value functions called Parameter-Based Value Functions (PBVFs) whose inputs include the policy parameters. They can ge…

    Submitted 13 August, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2021

  18. arXiv:1809.06098  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Optimization via Importance Sampling

    Authors: Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli

    Abstract: Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating online and offline optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial, as it requires to account for the variance of the objective function estima…

    Submitted 31 October, 2018; v1 submitted 17 September, 2018; originally announced September 2018.

    Journal ref: 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada

  19. arXiv:1706.00222  [pdf, other]

    physics.ins-det hep-ex

    Test Beam Performance Measurements for the Phase I Upgrade of the CMS Pixel Detector

    Authors: M. Dragicevic, M. Friedl, J. Hrubec, H. Steininger, A. Gädda, J. Härkönen, T. Lampén, P. Luukka, T. Peltola, E. Tuominen, E. Tuovinen, A. Winkler, P. Eerola, T. Tuuva, G. Baulieu, G. Boudoul, L. Caponetto, C. Combaret, D. Contardo, T. Dupasquier, G. Gallbit, N. Lumb, L. Mirabito, S. Perries, M. Vander Donckt, et al. (462 additional authors not shown)

    Abstract: A new pixel detector for the CMS experiment was built in order to cope with the instantaneous luminosities anticipated for the Phase I Upgrade of the LHC. The new CMS pixel detector provides four-hit tracking with a reduced material budget as well as new cooling and powering schemes. A new front-end readout chip mitigates buffering and bandwidth limitations, and allows operation at low comparator…

    Submitted 1 June, 2017; originally announced June 2017.

    Report number: CMS-NOTE-2017-002

  20. Trapping in irradiated p-on-n silicon sensors at fluences anticipated at the HL-LHC outer tracker

    Authors: W. Adam, T. Bergauer, M. Dragicevic, M. Friedl, R. Fruehwirth, M. Hoch, J. Hrubec, M. Krammer, W. Treberspurg, W. Waltenberger, S. Alderweireldt, W. Beaumont, X. Janssen, S. Luyckx, P. Van Mechelen, N. Van Remortel, A. Van Spilbeeck, P. Barria, C. Caillol, B. Clerbaux, G. De Lentdecker, D. Dobur, L. Favart, A. Grebenyuk, Th. Lenzi, et al. (663 additional authors not shown)

    Abstract: The degradation of signal in silicon sensors is studied under conditions expected at the CERN High-Luminosity LHC. 200 $μ$m thick n-type silicon sensors are irradiated with protons of different energies to fluences of up to $3 \cdot 10^{15}$ neq/cm$^2$. Pulsed red laser light with a wavelength of 672 nm is used to generate electron-hole pairs in the sensors. The induced signals are used to determi…

    Submitted 7 May, 2015; originally announced May 2015.

    Journal ref: 2016 JINST 11 P04023