Showing 1–12 of 12 results for author: Marzen, S E

  1. arXiv:2405.15943

    cs.LG cs.CL

    Transformers represent belief state geometry in their residual stream

    Authors: Adam S. Shai, Sarah E. Marzen, Lucas Teixeira, Alexander Gietelink Oldenziel, Paul M. Riechers

    Abstract: What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stre…

    Submitted 24 May, 2024; originally announced May 2024.
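
    The belief states in this abstract are Bayesian posteriors over the hidden states of the data-generating process, updated after each token. Below is a minimal Python sketch of that update. It assumes a hypothetical two-state hidden Markov model (the Even process) written as symbol-labeled transition matrices `T`. The names `update_belief` and `belief_trajectory` are illustrative and do not come from the paper.

```python
# Hypothetical example: the Even process as a 2-state HMM (states A=0, B=1).
# T[x][i][j] = P(emit symbol x and move to state j | current state i).
T = {
    "0": [[0.5, 0.0], [0.0, 0.0]],
    "1": [[0.0, 0.5], [1.0, 0.0]],
}

def update_belief(belief, T_x):
    """One Bayesian update after seeing a symbol: b'(j) proportional to sum_i b(i) T_x[i][j]."""
    n = len(belief)
    unnorm = [sum(belief[i] * T_x[i][j] for i in range(n)) for j in range(n)]
    z = sum(unnorm)  # probability of the observed symbol under the current belief
    if z == 0:
        raise ValueError("symbol has zero probability under the current belief")
    return [u / z for u in unnorm]

def belief_trajectory(symbols, prior):
    """List of belief states visited while reading `symbols`, starting from `prior`."""
    beliefs = [prior]
    for x in symbols:
        beliefs.append(update_belief(beliefs[-1], T[x]))
    return beliefs

# Start from this HMM's stationary state distribution (2/3, 1/3).
for b in belief_trajectory("0111", [2 / 3, 1 / 3]):
    print([round(p, 3) for p in b])
```

    Each belief is a point in the probability simplex over hidden states. The abstract claims that such points are linearly represented in the transformer's residual stream.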

  2. arXiv:2303.14553

    cs.LG stat.ML

    Complexity-calibrated Benchmarks for Machine Learning Reveal When Next-Generation Reservoir Computer Predictions Succeed and Mislead

    Authors: Sarah E. Marzen, Paul M. Riechers, James P. Crutchfield

    Abstract: Recurrent neural networks are used to forecast time series in finance, climate, language, and many other domains. Reservoir computers are a particularly easily trainable form of recurrent neural network. Recently, a "next-generation" reservoir computer was introduced in which the memory trace involves only a finite number of previous symbols. We explore the inherent limitations of finite-past…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 10 pages, 5 figures; https://csc.ucdavis.edu/~cmg/compmech/pubs/ngrc.htm
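
    A next-generation reservoir computer replaces the recurrent reservoir with a feature vector built from a finite window of past inputs. Below is a minimal Python sketch of one common feature construction: a constant, the last k inputs, and their quadratic monomials. It assumes a scalar input series. The function name and the quadratic choice are illustrative assumptions, not the configuration used in the paper.

```python
from itertools import combinations_with_replacement

def ngrc_features(history, k):
    """Hypothetical NG-RC feature vector for a scalar series.

    Uses only the last k inputs (the finite memory trace): a constant term,
    the k linear terms, and every unique quadratic monomial of those terms.
    """
    if len(history) < k:
        raise ValueError("need at least k past inputs")
    lin = [float(v) for v in history[-k:]]  # ordered oldest to newest
    quad = [a * b for a, b in combinations_with_replacement(lin, 2)]
    return [1.0] + lin + quad

# A trained readout would predict the next value as the dot product of a weight
# vector (fit by ridge regression, not shown) with these features.
print(ngrc_features([0.5, -1.0, 2.0], k=2))
```

    This predictor cannot see anything that happened more than k steps ago. That blind spot is the finite-past limitation the abstract refers to.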

  3. arXiv:2005.03750

    cond-mat.stat-mech cs.IT cs.LG nlin.CD stat.ML

    Inference, Prediction, and Entropy-Rate Estimation of Continuous-time, Discrete-event Processes

    Authors: S. E. Marzen, J. P. Crutchfield

    Abstract: Inferring models, predicting the future, and estimating the entropy rate of discrete-time, discrete-event processes is well-worn ground. However, a much broader class of discrete-event processes operates in continuous-time. Here, we provide new methods for inferring, predicting, and estimating them. The methods rely on an extension of Bayesian structural inference that takes advantage of neural ne…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: 11 pages, 5 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/ctbsi.htm

  4. arXiv:1910.07663

    cs.LG cond-mat.stat-mech cs.IT nlin.CD stat.ML

    Probabilistic Deterministic Finite Automata and Recurrent Networks, Revisited

    Authors: S. E. Marzen, J. P. Crutchfield

    Abstract: Reservoir computers (RCs) and recurrent neural networks (RNNs) can mimic any finite-state automaton in theory, and some workers demonstrated that this can hold in practice. We test the capability of generalized linear models, RCs, and Long Short-Term Memory (LSTM) RNN architectures to predict the stochastic processes generated by a large suite of probabilistic deterministic finite-state automata (…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: 15 pages, 4 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/pdfarnr.htm
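
    A probabilistic deterministic finite automaton emits each symbol with a state-dependent probability. It then moves to the unique next state determined by that symbol. Below is a minimal Python sketch of sampling from one. It uses the Even process as a hypothetical example machine, which may or may not be in the paper's suite. `EVEN_PDFA` and `sample_pdfa` are illustrative names.

```python
import random

# Hypothetical PDFA for the Even process: state -> {symbol: (probability, next_state)}.
# Deterministic: each (state, symbol) pair leads to exactly one next state.
EVEN_PDFA = {
    "A": {"0": (0.5, "A"), "1": (0.5, "B")},
    "B": {"1": (1.0, "A")},
}

def sample_pdfa(pdfa, start, n, rng):
    """Emit n symbols from `pdfa`, starting in state `start`, using random.Random `rng`."""
    state, out = start, []
    for _ in range(n):
        symbols = list(pdfa[state])
        weights = [pdfa[state][s][0] for s in symbols]
        sym = rng.choices(symbols, weights=weights)[0]
        out.append(sym)
        state = pdfa[state][sym][1]  # the symbol alone fixes the next state
    return "".join(out)

print(sample_pdfa(EVEN_PDFA, "A", 40, random.Random(1)))
```

    State B must emit a 1, so every block of 1s that is followed by a 0 has even length. No fixed-length window of past symbols captures this dependency in general.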

  5. arXiv:1802.03105

    q-bio.PE cond-mat.stat-mech cs.IT nlin.AO

    Optimized Bacteria are Environmental Prediction Engines

    Authors: Sarah E. Marzen, James P. Crutchfield

    Abstract: Experimentalists have observed phenotypic variability in isogenic bacteria populations. We explore the hypothesis that in fluctuating environments this variability is tuned to maximize a bacterium's expected log growth rate, potentially aided by epigenetic markers that store information about past environments. We show that, in a complex, memoryful environment, the maximal expected log growth rate…

    Submitted 8 February, 2018; originally announced February 2018.

    Comments: 7 pages, 1 figure; http://csc.ucdavis.edu/~cmg/compmech/pubs/obepe.htm

    Journal ref: Phys. Rev. E 98, 012408 (2018)
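
    For intuition, consider the simplest memoryless special case. This is an assumption made here, not the memoryful environment the paper studies. Environments are drawn independently with probabilities p(e). A bacterium of phenotype e multiplies by f(e) in environment e and dies otherwise. For phenotype distribution q, the expected log growth rate is then Σ_e p(e) log2(q(e) f(e)). Proportional (Kelly) bet hedging, q = p, maximizes it and gives Σ_e p(e) log2 f(e) - H[p]. A Python sketch:

```python
import math

def expected_log_growth(p_env, q_pheno, fitness):
    """Expected log2 growth rate per generation in a hypothetical all-or-nothing model:
    phenotype e multiplies by fitness[e] in environment e and leaves no offspring otherwise."""
    total = 0.0
    for p, q, f in zip(p_env, q_pheno, fitness):
        if p == 0:
            continue  # environments that never occur do not contribute
        if q == 0:
            return float("-inf")  # lineage is wiped out whenever this environment occurs
        total += p * math.log2(q * f)
    return total

p = [0.7, 0.3]  # hypothetical environment probabilities
f = [2.0, 3.0]  # hypothetical growth factors
for q in ([0.7, 0.3], [0.5, 0.5], [0.9, 0.1]):
    print(q, round(expected_log_growth(p, q, f), 4))
```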

  6. arXiv:1707.03962

    cond-mat.stat-mech cs.IT q-bio.BM

    Prediction and Power in Molecular Sensors: Uncertainty and Dissipation When Conditionally Markovian Channels Are Driven by Semi-Markov Environments

    Authors: Sarah E. Marzen, James P. Crutchfield

    Abstract: Sensors often serve at least two purposes: predicting their input and minimizing dissipated heat. However, determining whether or not a particular sensor is evolved or designed to be accurate and efficient is difficult. This arises partly from the functional constraints being at cross purposes and partly since quantifying the predictive performance of even in silico sensors can require prohibitive…

    Submitted 12 July, 2017; originally announced July 2017.

    Comments: 21 pages, 4 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/piness.htm

  7. arXiv:1704.04707

    cond-mat.stat-mech cs.IT math.ST nlin.CD

    Structure and Randomness of Continuous-Time Discrete-Event Processes

    Authors: S. E. Marzen, J. P. Crutchfield

    Abstract: Loosely speaking, the Shannon entropy rate is used to gauge a stochastic process' intrinsic randomness; the statistical complexity gives the cost of predicting the process. We calculate, for the first time, the entropy rate and statistical complexity of stochastic processes generated by finite unifilar hidden semi-Markov models---memoryful, state-dependent versions of renewal processes. Calculatin…

    Submitted 15 April, 2017; originally announced April 2017.

    Comments: 10 pages, 2 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/ctdep.htm
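
    For intuition about the two quantities, here is the discrete-time analog for a finite unifilar hidden Markov model. The entropy rate is h = Σ_s π(s) H[X | S = s], where π is the stationary state distribution. When the machine is the process's ε-machine, the statistical complexity is C = H[π]. The paper addresses the harder continuous-time, semi-Markov case; this Python sketch covers only the discrete-time one. The names are hypothetical, and the power iteration assumes an irreducible, aperiodic state chain.

```python
import math

def entropy(ps):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes irreducible, aperiodic P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def unifilar_measures(machine):
    """machine[s] = {symbol: (probability, next_state)} for states 0..n-1.

    Returns (entropy rate h, statistical complexity C), both in bits.
    """
    n = len(machine)
    P = [[0.0] * n for _ in range(n)]  # state-to-state transition matrix
    for s, arcs in enumerate(machine):
        for prob, nxt in arcs.values():
            P[s][nxt] += prob
    pi = stationary(P)
    # In a unifilar machine the past determines the current state, so the
    # entropy rate is the stationary average of per-state symbol entropies.
    h = sum(pi[s] * entropy([prob for prob, _ in machine[s].values()]) for s in range(n))
    return h, entropy(pi)

EVEN = [{"0": (0.5, 0), "1": (0.5, 1)}, {"1": (1.0, 0)}]  # hypothetical example
print(unifilar_measures(EVEN))
```

    For the Even process this gives h = 2/3 bit per symbol and C = H[2/3, 1/3] ≈ 0.918 bits.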

  8. arXiv:1702.08565

    cond-mat.stat-mech cs.IT nlin.CD stat.ML

    Nearly Maximally Predictive Features and Their Dimensions

    Authors: Sarah E. Marzen, James P. Crutchfield

    Abstract: Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of n…

    Submitted 27 February, 2017; originally announced February 2017.

    Comments: 6 pages, 2 figures; Supplementary materials, 5 pages, 1 figure; http://csc.ucdavis.edu/~cmg/compmech/pubs/nmpf.htm

    Journal ref: Phys. Rev. E 95, 051301 (2017)

  9. arXiv:1611.01099

    cond-mat.stat-mech cs.IT math.ST nlin.CD

    Informational and Causal Architecture of Continuous-time Renewal and Hidden Semi-Markov Processes

    Authors: Sarah E. Marzen, James P. Crutchfield

    Abstract: We introduce the minimal maximally predictive models (ε-machines) of processes generated by certain hidden semi-Markov models. Their causal states are either hybrid discrete-continuous or continuous random variables and causal-state transitions are described by partial differential equations. Closed-form expressions are given for statistical complexities, excess entropies, and differential informa…

    Submitted 3 November, 2016; originally announced November 2016.

    Comments: 16 pages, 7 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/ctrp.htm

  10. arXiv:1512.01859

    cond-mat.stat-mech cs.IT math.DS math.ST physics.data-an

    Statistical Signatures of Structural Organization: The case of long memory in renewal processes

    Authors: Sarah E. Marzen, James P. Crutchfield

    Abstract: Identifying and quantifying memory are often critical steps in developing a mechanistic understanding of stochastic processes. These are particularly challenging and necessary when exploring processes that exhibit long-range correlations. The most common signatures employed rely on second-order temporal statistics and lead, for example, to identifying long memory in processes with power-law autoco…

    Submitted 6 December, 2015; originally announced December 2015.

    Comments: 13 pages, 2 figures, 3 appendixes; http://csc.ucdavis.edu/~cmg/compmech/pubs/lrmrp.htm
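
    The usual second-order temporal statistic is the autocorrelation function. Long memory is flagged when r(k) decays as a power law in the lag k rather than exponentially. Below is a minimal Python sketch of the standard sample estimator, applied to a hypothetical event-indicator sequence from a renewal process. It illustrates the conventional signature the abstract mentions, not the paper's own method.

```python
def autocorrelation(xs, max_lag):
    """Sample autocorrelation r(k), k = 0..max_lag, normalized so that r(0) = 1."""
    n = len(xs)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(xs))")
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / (n * var)
            for k in range(max_lag + 1)]

# Hypothetical 0/1 event-indicator sequence (1 = event in that time bin).
events = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0]
print([round(r, 3) for r in autocorrelation(events, 4)])
```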

  11. arXiv:1506.06138

    q-bio.NC cs.IT nlin.AO physics.soc-ph q-bio.PE

    The evolution of lossy compression

    Authors: Sarah E. Marzen, Simon DeDeo

    Abstract: In complex environments, there are costs to both ignorance and perception. An organism needs to track fitness-relevant information about its world, but the more information it tracks, the more resources it must devote to memory and processing. Rate-distortion theory shows that, when errors are allowed, remarkably efficient internal representations can be found by biologically-plausible hill-climbi…

    Submitted 19 June, 2015; originally announced June 2015.

    Comments: 14 pages, 4 figures

    Journal ref: Journal of the Royal Society Interface 14: 20170166 (2017)
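
    Rate-distortion theory characterizes the minimal information rate an internal representation needs to achieve a given expected distortion. The abstract describes the paper's search procedure as biologically plausible hill-climbing. The sketch below instead swaps in the standard Blahut-Arimoto algorithm to compute points on the rate-distortion curve. It is written in Python, with the source distribution and distortion matrix as hypothetical inputs.

```python
import math

def blahut_arimoto(p_x, dist, beta, iters=200):
    """Rate-distortion point at trade-off parameter beta (Blahut-Arimoto iteration).

    p_x: source distribution; dist[x][y]: distortion of reproducing x as y.
    Returns (rate in bits, expected distortion).
    """
    nx, ny = len(p_x), len(dist[0])
    q = [1.0 / ny] * ny  # marginal distribution over reproductions
    for _ in range(iters):
        cond = []  # cond[x][y] = p(y | x), proportional to q(y) exp(-beta d(x, y))
        for x in range(nx):
            w = [q[y] * math.exp(-beta * dist[x][y]) for y in range(ny)]
            z = sum(w)
            cond.append([v / z for v in w])
        q = [sum(p_x[x] * cond[x][y] for x in range(nx)) for y in range(ny)]
    rate = sum(p_x[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
               for x in range(nx) for y in range(ny) if p_x[x] > 0 and cond[x][y] > 0)
    D = sum(p_x[x] * cond[x][y] * dist[x][y] for x in range(nx) for y in range(ny))
    return rate, D

# Hypothetical example: fair binary source with Hamming distortion.
for beta in (0.5, 2.0, 5.0):
    R, D = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], beta)
    print(f"beta={beta}: D={D:.4f}, R={R:.4f} bits")
```

    A larger beta penalizes distortion more, moving along the curve toward higher rate and lower distortion. For this fair binary source the points land on the known curve R(D) = 1 - H_b(D).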

  12. arXiv:1504.04756

    q-bio.NC cond-mat.dis-nn cs.NE math.PR nlin.CD

    Time Resolution Dependence of Information Measures for Spiking Neurons: Atoms, Scaling, and Universality

    Authors: Sarah E. Marzen, Michael R. DeWeese, James P. Crutchfield

    Abstract: The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step towards that larger goal is to develop information measures for individual output processes, including information generation (entropy r…

    Submitted 18 April, 2015; originally announced April 2015.

    Comments: 20 pages, 6 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/trdctim.htm
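
    Information measures for spike trains depend on the time resolution used to discretize them. Below is a minimal Python sketch of that pipeline. It assumes a simple binning scheme and the naive plug-in mutual-information estimator. Both are illustrative choices, and the helper names are hypothetical. Plug-in estimates are biased upward for small samples.

```python
import math
from collections import Counter

def binarize_spikes(spike_times, dt, duration):
    """Hypothetical helper: 0/1 spike word at resolution dt (1 if any spike falls in the bin)."""
    n_bins = int(math.ceil(duration / dt))
    bins = [0] * n_bins
    for t in spike_times:
        idx = int(t // dt)
        if 0 <= idx < n_bins:
            bins[idx] = 1
    return bins

def plugin_mutual_information(pairs):
    """Plug-in estimate of I(S; R) in bits from a list of (stimulus, response) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    ps = Counter(s for s, _ in pairs)
    pr = Counter(r for _, r in pairs)
    return sum(c / n * math.log2(c * n / (ps[s] * pr[r])) for (s, r), c in joint.items())

# Hypothetical spike times in ms, binned at 10 ms resolution.
stim = binarize_spikes([3, 21, 44], 10, 50)
resp = binarize_spikes([4, 23, 47], 10, 50)
print(plugin_mutual_information(list(zip(stim, resp))))
```

    Rerunning with a different `dt` changes the binary words and hence the estimate. This is the kind of time-resolution dependence the abstract describes.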