Showing 1–14 of 14 results for author: Razin, N

  1. arXiv:2402.07875  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

    Authors: Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen

    Abstract: In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (re…

    Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2310.20703  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Vanishing Gradients in Reinforcement Finetuning of Language Models

    Authors: Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

    Abstract: Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms. This work identifies a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model…

    Submitted 14 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  3. arXiv:2310.16028  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    What Algorithms can Transformers Learn? A Study in Length Generalization

    Authors: Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

    Abstract: Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Preprint

  4. arXiv:2303.11249  [pdf, other]

    cs.LG cs.AI quant-ph

    What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement

    Authors: Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen

    Abstract: The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics. Our main theoretical result states th…

    Submitted 21 January, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023

  5. arXiv:2211.16494  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    On the Ability of Graph Neural Networks to Model Interactions Between Vertices

    Authors: Noam Razin, Tom Verbin, Nadav Cohen

    Abstract: Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure k…

    Submitted 23 October, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2023

  6. arXiv:2201.11729  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

    Authors: Noam Razin, Asaf Maman, Nadav Cohen

    Abstract: In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regulariza…

    Submitted 18 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Accepted to ICML 2022

  7. arXiv:2102.09972  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Implicit Regularization in Tensor Factorization

    Authors: Noam Razin, Asaf Maman, Nadav Cohen

    Abstract: Recent efforts to unravel the mystery of implicit regularization in deep learning have led to a theoretical focus on matrix factorization -- matrix completion via linear neural network. As a step further towards practical deep learning, we provide the first theoretical analysis of implicit regularization in tensor factorization -- tensor completion via certain type of non-linear neural network. We…

    Submitted 9 June, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: Accepted to ICML 2021

  8. arXiv:2009.13292  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    RecoBERT: A Catalog Language Model for Text-Based Recommendations

    Authors: Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein

    Abstract: Language models that utilize extensive self-supervised pre-training from unlabeled text, have recently shown to significantly advance the state-of-the-art performance in a variety of language understanding tasks. However, it is yet unclear if and how these recent models can be harnessed for conducting text-based recommendations. In this work, we introduce RecoBERT, a BERT-based approach for learni…

    Submitted 25 September, 2020; originally announced September 2020.

  9. arXiv:2008.08088  [pdf, other]

    cond-mat.stat-mech cond-mat.soft

    The entropy production of an active particle in a box

    Authors: Nitzan Razin

    Abstract: A run-and-tumble particle in a one dimensional box (infinite potential well) is studied. The steady state is analytically solved and analyzed, revealing the emergent length scale of the boundary layer where particles accumulate near the walls. The mesoscopic steady state entropy production rate of the system is derived from coupled Fokker-Planck equations with a linear reaction term, resulting in…

    Submitted 7 September, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: 8 pages, 5 figures

    Journal ref: Phys. Rev. E 102, 030103 (2020)

  10. arXiv:2005.06398  [pdf, other]

    cs.LG cs.NE stat.ML

    Implicit Regularization in Deep Learning May Not Be Explainable by Norms

    Authors: Noam Razin, Nadav Cohen

    Abstract: Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norm…

    Submitted 17 October, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

  11. arXiv:1908.05161  [pdf]

    cs.LG cs.CL stat.ML

    Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

    Authors: Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein

    Abstract: Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations - a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences, requires the propagation of all query-cand…

    Submitted 21 November, 2019; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: In Proceedings of AAAI 2020

  12. arXiv:1806.08921  [pdf, other]

    cond-mat.soft physics.bio-ph

    Signatures of motor susceptibility in the dynamics of a tracer particle in an active gel

    Authors: Nitzan Razin, Raphael Voituriez, Nir S. Gov

    Abstract: We study a model for the motion of a tracer particle inside an active gel, exposing the properties of the van Hove distribution of the particle displacements. Active events of a typical force magnitude give rise to non-Gaussian distributions, having exponential tails or side-peaks. The side-peaks appear when the local bulk elasticity of the gel is large enough and few active sources are dominant.…

    Submitted 23 June, 2018; originally announced June 2018.

    Comments: 4 pages, 4 figures and supplemental information (5 pages, 4 figures)

    Journal ref: Phys. Rev. E 99, 022419 (2019)

  13. arXiv:1708.05370  [pdf, other]

    cond-mat.stat-mech cond-mat.soft

    Forces in inhomogeneous open active-particle systems

    Authors: Nitzan Razin, Raphael Voituriez, Jens Elgeti, Nir S. Gov

    Abstract: We study the force that non-interacting point-like active particles apply to a symmetric inert object in the presence of a gradient of activity and particle sources and sinks. We consider two simple patterns of sources and sinks that are common in biological systems. We analytically solve a one dimensional model designed to emulate higher dimensional systems, and study a two dimensional model by n…

    Submitted 4 March, 2018; v1 submitted 17 August, 2017; originally announced August 2017.

    Comments: 14 pages, 8 figures

    Journal ref: Phys. Rev. E 96, 052409 (2017)

  14. arXiv:1703.07359  [pdf, other]

    cond-mat.stat-mech cond-mat.soft

    Generalized Archimedes' principle in active fluids

    Authors: Nitzan Razin, Raphael Voituriez, Jens Elgeti, Nir S. Gov

    Abstract: We show how a gradient in the motility properties of non-interacting point-like active particles can cause a pressure gradient that pushes a large inert object. We calculate the force on an object inside a system of active particles with position dependent motion parameters, in one and two dimensions, and show that a modified Archimedes' principle is satisfied. We characterize the system, both in…

    Submitted 29 September, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: 16 pages, 9 figures

    Journal ref: Phys. Rev. E 96, 032606 (2017)