
doi 10.1167/jov.22.6.8

Contrast Sensitivity Functions in Autoencoders

Authors: Qiang Li, Alex Gomez-Villa, Marcelo Bertalmio, Jesus Malo

Abstract: Three decades ago, Atick et al. suggested that human frequency sensitivity may emerge from the enhancement required for a more efficient analysis of retinal images. Here we reassess the relevance of low-level vision tasks in the explanation of the Contrast Sensitivity Functions (CSFs) in light of (1) the current trend of using artificial neural networks for studying vision, and (2) the current knowledge of retinal image representations. As a first contribution, we show that a very popular type of convolutional neural network (CNN), the autoencoder, may develop human-like CSFs in the spatio-temporal and chromatic dimensions when trained to perform some basic low-level vision tasks (like retinal noise and optical blur removal), but not others (like chromatic adaptation or pure reconstruction after simple bottlenecks). As an illustrative example, the best CNN (in the considered set of simple architectures for enhancement of the retinal signal) reproduces the CSFs with an RMSE of 11% of the maximum sensitivity. As a second contribution, we provide experimental evidence that, for some functional goals (at a low abstraction level), deeper CNNs that are better at reaching the quantitative goal are actually worse at replicating human-like phenomena (such as the CSFs). This low-level result (for the explored networks) is not necessarily in contradiction with other works that report advantages of deeper nets in modeling higher-level vision goals.
However, in line with a growing body of literature, our results suggest another word of caution about CNNs in vision science, since the use of simplified units or unrealistic architectures in goal optimization may be a limitation for the modeling and understanding of human vision.
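The headline figure above is an RMSE expressed as a percentage of the maximum sensitivity. A minimal sketch of that normalization follows, assuming the model and human CSFs are sampled at the same frequencies and that the error is divided by the peak of the human curve. The function name and the sensitivity values are hypothetical, and the paper may normalize differently.

```python
import math


def rmse_percent_of_max(model, human):
    """RMSE between two sensitivity curves, as a percentage of the peak human sensitivity.

    Assumes both curves are sampled at the same spatial frequencies.
    """
    if not human or len(model) != len(human):
        raise ValueError("curves must be non-empty and of equal length")
    mse = sum((m - h) ** 2 for m, h in zip(model, human)) / len(human)
    return 100.0 * math.sqrt(mse) / max(human)


# Hypothetical CSF samples (not data from the paper).
human_csf = [40.0, 120.0, 200.0, 90.0, 10.0]
model_csf = [50.0, 110.0, 180.0, 100.0, 20.0]
print(f"{rmse_percent_of_max(model_csf, human_csf):.1f}%")  # prints 6.3%
```

Dividing by the peak rather than by each sample keeps the metric from exploding at high frequencies, where sensitivities approach zero.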

Submitted 4 March, 2022; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: Accepted in the Journal of Vision

Journal ref: Journal of Vision 2022;22(6):8

arXiv:1907.13046

Visual Information flow in Wilson-Cowan networks

Authors: Alexander Gomez-Villa, Marcelo Bertalmío, Jesús Malo

Abstract: In this work we study the communication efficiency of a psychophysically-tuned cascade of Wilson-Cowan and Divisive Normalization layers that simulate the retina-V1 pathway. This is the first analysis of Wilson-Cowan networks in terms of multivariate total correlation. The parameters of the cortical model have been derived through the relation between the steady state of the Wilson-Cowan model and the Divisive Normalization model. The communication efficiency has been analyzed in two ways. First, we provide an analytical expression for the reduction of the total correlation among the responses of a V1-like population after the application of the Wilson-Cowan interaction. Second, we empirically study the efficiency with visual stimuli and statistical tools that were not available before: (1) we use a recent, radiometrically calibrated set of natural scenes, and (2) we use a recent technique to estimate the multivariate total correlation in bits from sets of visual responses that involves only univariate operations, thus giving better estimates of the redundancy. The theoretical and the empirical results show that although this cascade of layers was not optimized for statistical independence in any way, the redundancy between the responses is substantially reduced along the neural pathway.
Specifically, we show that (1) the efficiency of a Wilson-Cowan network is similar to that of its equivalent Divisive Normalization model, (2) while the initial layers (von Kries adaptation and Weber-like brightness) contribute to univariate equalization, the largest contributions to the reduction in total correlation come from the computation of nonlinear local contrast and the application of local oriented filters, and (3) psychophysically-tuned models are more efficient (they reduce total correlation more) in the more populated regions of the luminance-contrast plane.
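Total correlation is the sum of the marginal entropies minus the joint entropy, T = sum_i H(x_i) - H(x). The work above uses an estimator built from univariate operations; as a much simpler stand-in, the sketch below uses the closed form that holds only for Gaussian variables, T = -0.5 * log2(det R), with R the correlation matrix. The function name and covariance values are hypothetical and only show that weaker correlations mean fewer redundant bits.

```python
import numpy as np


def gaussian_total_correlation_bits(cov):
    """Total correlation (bits) of a multivariate Gaussian with covariance `cov`.

    Valid only under the Gaussian assumption: T = -0.5 * log2(det(R)),
    where R is the correlation matrix derived from `cov`.
    """
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    sign, logdet = np.linalg.slogdet(corr)  # natural-log determinant, numerically stable
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return -0.5 * logdet / np.log(2.0)


# Hypothetical responses of two sensors before and after a decorrelating stage.
before = np.array([[1.0, 0.8], [0.8, 1.0]])
after = np.array([[1.0, 0.2], [0.2, 1.0]])
print(gaussian_total_correlation_bits(before))  # equals -0.5 * log2(1 - 0.8**2)
print(gaussian_total_correlation_bits(after))   # equals -0.5 * log2(1 - 0.2**2)
```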

Submitted 30 July, 2019; originally announced July 2019.

arXiv:1907.13004

Visual illusions via neural dynamics: Wilson-Cowan-type models and the efficient representation principle

Authors: Marcelo Bertalmío, Luca Calatroni, Valentina Franceschi, Benedetta Franceschiello, Alexander Gomez-Villa, Dario Prandi

Abstract: In this work we have aimed to reproduce supra-threshold perception phenomena, specifically visual illusions, with Wilson-Cowan-type models of neuronal dynamics. We have found that it is indeed possible to do so, but that the ability to replicate visual illusions is related to how well the neural activity equations comply with the efficient representation principle. Our first contribution is to show that the Wilson-Cowan equations can reproduce a number of brightness and orientation-dependent illusions, and that the latter type of illusion requires the neuronal dynamics equations to consider orientation explicitly, as expected. Then, we formally prove that there cannot be an energy functional that the Wilson-Cowan equations are minimizing, but that a slight modification makes them variational and yields a model that is consistent with the efficient representation principle. Finally, we show that this new model reproduces visual illusions better than the original Wilson-Cowan formulation.
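One standard way to see why a dynamical model may fail to be variational: if dx/dt = F(x) descends an energy (F = -grad E) with smooth E, then the Jacobian of F is a negated Hessian and must be symmetric everywhere. The sketch below checks this condition for an assumed Wilson-Cowan-type field F(x) = -alpha*x + W f(x) + e, whose Jacobian is -alpha*I + W diag(f'(x)). With a symmetric kernel W, a nonlinear f already breaks the symmetry, while a linear f preserves it. The kernel, the sigmoid and the activity values are hypothetical; this illustrates the criterion and is not the authors' proof.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def linear_prime(z):
    return np.ones_like(z)


def wc_jacobian(x, W, alpha, f_prime):
    """Jacobian of F(x) = -alpha*x + W @ f(x) + e (the constant input e drops out)."""
    return -alpha * np.eye(len(x)) + W @ np.diag(f_prime(x))


def is_symmetric(J):
    return np.allclose(J, J.T)


idx = np.arange(4)
# Hypothetical symmetric inhibitory Gaussian kernel.
W = -0.2 * np.exp(-((idx[:, None] - idx[None, :]) ** 2) / 2.0)
x = np.array([0.1, 0.5, 1.0, 2.0])  # hypothetical activity state

J_sigmoid = wc_jacobian(x, W, 1.0, sigmoid_prime)
J_linear = wc_jacobian(x, W, 1.0, linear_prime)
print(is_symmetric(J_sigmoid), is_symmetric(J_linear))  # prints False True
```

The asymmetry comes from the entries W_ij * f'(x_j) versus W_ji * f'(x_i): they agree only if f' takes the same value at every unit.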

Submitted 27 December, 2019; v1 submitted 30 July, 2019; originally announced July 2019.

arXiv:1906.08246

Cortical Divisive Normalization from Wilson-Cowan Neural Dynamics

Authors: J. Malo, J. J. Esteve-Taboada, M. Bertalmío

Abstract: Divisive Normalization and the Wilson-Cowan equations are influential models of neural interaction and saturation [Carandini and Heeger Nat.Rev.Neurosci. 2012; Wilson and Cowan Kybernetik 1973]. However, they have not yet been related analytically. In this work we show that Divisive Normalization can be obtained from the Wilson-Cowan model. Specifically, assuming that Divisive Normalization is the steady-state solution of the Wilson-Cowan differential equation, we find that the kernel that controls neural interactions in Divisive Normalization depends on the Wilson-Cowan kernel but also has a signal-dependent contribution. A standard stability analysis of a Wilson-Cowan model with the parameters obtained from our relation shows that the Divisive Normalization solution is a stable node. This stability demonstrates the consistency of our steady-state assumption. The proposed theory provides a physiological foundation (a relation to a dynamical network with fixed wiring among neurons) for the functional suggestions that have been made about the need for signal-dependent Divisive Normalization [e.g. in Coen-Cagli et al., PLoS Comp.Biol. 2012]. Moreover, this theory explains the modifications that had to be introduced ad hoc in the Gaussian kernels of Divisive Normalization in [Martinez et al. Front. Neurosci. 2019] to reproduce contrast responses. The proposed relation implies that the Wilson-Cowan dynamics also reproduce visual masking and subjective image distortion metrics, which until now had been explained mainly via Divisive Normalization.
Finally, this relation allows methods that until now had been developed for dynamical systems such as Wilson-Cowan networks to be applied to Divisive Normalization.
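A toy version of the steady-state argument can be checked numerically. The sketch below assumes a Wilson-Cowan-type equation dx/dt = e - alpha*x - W f(x), with a hypothetical symmetric Gaussian kernel W, f(x) = sqrt(x) and a positive input e. At the fixed point, alpha*x + W f(x) = e, which can be rewritten as x = e / (alpha + W f(x) / x): a divisive form whose denominator depends on the signal itself. The eigenvalues of the Jacobian then indicate whether that fixed point is a stable node (real, negative eigenvalues). This is not the paper's kernel relation, only an illustration of the idea.

```python
import numpy as np


def f(x):
    # Hypothetical saturating nonlinearity: square root of the rectified activity.
    return np.sqrt(np.maximum(x, 0.0))


def f_prime(x):
    return 0.5 / np.sqrt(np.maximum(x, 1e-12))


def wc_steady_state(e, W, alpha, dt=0.05, steps=4000):
    """Forward-Euler integration of dx/dt = e - alpha*x - W @ f(x), starting from x = 0."""
    x = np.zeros_like(e)
    for _ in range(steps):
        x = x + dt * (e - alpha * x - W @ f(x))
    return x


n = 8
idx = np.arange(n)
W = 0.1 * np.exp(-((idx[:, None] - idx[None, :]) ** 2) / 4.0)  # hypothetical inhibitory kernel
alpha = 1.0
e = np.linspace(0.5, 2.0, n)  # hypothetical positive input drive

x_star = wc_steady_state(e, W, alpha)

# Divisive rewriting of the fixed-point condition alpha*x + W f(x) = e.
x_divisive = e / (alpha + (W @ f(x_star)) / x_star)

# Linearization at the fixed point: J = -alpha*I - W diag(f'(x*)).
J = -alpha * np.eye(n) - W @ np.diag(f_prime(x_star))
eigvals = np.linalg.eigvals(J)
print(np.allclose(x_star, x_divisive), np.all(eigvals.real < 0))  # prints True True
```

Because W is symmetric positive definite and diag(f') is positive, W diag(f') is similar to a symmetric positive definite matrix, so the eigenvalues of J are real and negative, which is the stable-node case.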

Submitted 28 December, 2023; v1 submitted 19 June, 2019; originally announced June 2019.

Comments: In press at the Journal of Nonlinear Science (to appear in 2024)

arXiv:1804.05964

Appropriate kernels for Divisive Normalization explained by Wilson-Cowan equations

Authors: Jesus Malo, Marcelo Bertalmio

Abstract: The interaction between wavelet-like sensors in Divisive Normalization is classically described through Gaussian kernels that decay with spatial distance, angular distance and frequency distance. However, simultaneous explanation of (a) distortion perception in natural image databases and (b) contrast perception of artificial stimuli requires very specific modifications in classical Divisive Normalization. First, the wavelet response has to be high-pass filtered before the Gaussian interaction is applied. Then, distinct weights per subband are also required after the Gaussian interaction. In summary, the classical Gaussian kernel has to be left- and right-multiplied by two extra diagonal matrices. In this paper we provide a lower-level justification for this specific empirical modification required in the Gaussian kernel of Divisive Normalization. Here we assume that the psychophysical behavior described by Divisive Normalization comes from neural interactions following the Wilson-Cowan equations. In particular, we identify the Divisive Normalization response with the stationary regime of a Wilson-Cowan model. From this identification we derive an expression for the Divisive Normalization kernel in terms of the interaction kernel of the Wilson-Cowan equations. It turns out that the Wilson-Cowan kernel is left- and right-multiplied by diagonal matrices with high-pass structure.
In conclusion, symmetric Gaussian inhibitory relations between wavelet-like sensors wired in the lower-level Wilson-Cowan model lead to the appropriate non-symmetric kernel that has to be empirically included in Divisive Normalization to explain a wider range of phenomena.
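The structural point can be checked with a few lines of linear algebra: left- and right-multiplying a symmetric kernel W by diagonal matrices gives H = D_l W D_r, with entries H_ij = l_i * W_ij * r_j. For a kernel with no zero entries, H is symmetric only when the vectors l and r are proportional. The sketch below uses a hypothetical Gaussian W and arbitrary diagonal weights; it does not reproduce the paper's high-pass expressions for the diagonal matrices.

```python
import numpy as np

n = 5
idx = np.arange(n)
# Hypothetical symmetric Gaussian interaction kernel between neighboring sensors.
W = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / 2.0)

# Hypothetical diagonal weights with different profiles (not derived from the paper).
d_left = np.array([0.2, 0.4, 0.8, 1.6, 3.2])
d_right = np.array([1.0, 0.9, 0.7, 0.5, 0.3])

H = np.diag(d_left) @ W @ np.diag(d_right)
print(np.allclose(W, W.T), np.allclose(H, H.T))  # prints True False

# Proportional diagonals keep the product symmetric.
H_prop = np.diag(2.0 * d_right) @ W @ np.diag(d_right)
print(np.allclose(H_prop, H_prop.T))  # prints True
```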

Submitted 16 April, 2018; originally announced April 2018.

Comments: MODVIS-18 and Celebration of Cowan's 50th anniv. at Univ. Chicago

arXiv:1801.09632

In Praise of Artifice Reloaded: Caution with subjective image quality databases

Authors: Marina Martinez-Garcia, Marcelo Bertalmío, Jesús Malo

Abstract: Subjective image quality databases are a major source of raw data on how the visual system works in naturalistic environments. These databases describe the sensitivity of many observers to a wide range of distortions (of different nature and with different suprathreshold intensities) seen on top of a variety of natural images. They seem like a dream come true for the vision scientist who wants to check models in realistic scenarios. However, while these natural databases are great benchmarks for models developed in some other way (e.g. by using the well-controlled artificial stimuli of traditional psychophysics), they should be used carefully when trying to fit vision models. Given the high dimensionality of the image space, it is very likely that some basic phenomena (e.g. sensitivity to distortions in certain environments) are under-represented in the database. Therefore, a model fitted on these large-scale natural databases will not reproduce these under-represented basic phenomena, which could otherwise be easily illustrated with well-selected artificial stimuli. In this work we study a specific example of the above statement: a wavelet+divisive normalization layer of a sensible cascade of linear+nonlinear layers, fitted to maximize the correlation with the subjective opinion of observers on a large image quality database, fails to reproduce basic crossmasking. Here we outline a solution for this problem using artificial stimuli. Then, we show that the resulting model is also a competitive solution for the large-scale database.
In line with Rust and Movshon (2005), our report (the misrepresentation of basic visual phenomena in subjectively-rated natural image databases) is an additional argument in praise of artifice.

Submitted 29 January, 2018; originally announced January 2018.

arXiv:1711.00526

doi 10.1371/journal.pone.0201326

Derivatives and Inverse of Cascaded Linear+Nonlinear Neural Models

Authors: Marina Martinez-Garcia, Praveen Cyriac, Thomas Batard, Marcelo Bertalmio, Jesus Malo

Abstract: In vision science, cascades of Linear+Nonlinear transforms are very successful in modeling a number of perceptual experiences [Carandini&Heeger12]. However, the conventional literature usually focuses only on describing the input->output transform. Instead, here we present the mathematics of such cascades beyond the forward transform, namely the Jacobians and the inverse. The fundamental reason for this analytical treatment is that it offers useful insight into the psychophysics, the physiology, and the function of the visual system. For instance, we show how the trends of the sensitivity (discrimination regions) and the adaptation of the receptive fields can be seen in the expression of the Jacobian with respect to the stimulus. This matrix also tells us which regions of the stimulus space are encoded more efficiently in multi-information terms. The Jacobian with respect to the parameters shows which aspects of the model have a bigger impact on the response, and hence greater relevance. The analytic inverse implies conditions on the response and the model that ensure decoding. From an applied perspective, (a) the Jacobian with respect to the stimulus is necessary in new experimental methods based on the synthesis of visual stimuli with interesting geometry, (b) the Jacobian matrices with respect to the parameters are convenient for learning the model from classical experiments or alternative optimization goals, and (c) the inverse is a model-based alternative to blind machine-learning neural decoding that does not include meaningful biological information.
The theory is checked by building a differentiable and invertible vision model that actually follows the modular program suggested by Carandini&Heeger. To stress the generality of this modular setting, we show examples where some of the canonical Divisive Normalization layers are replaced by equivalent layers such as the Wilson-Cowan model at V1, or a tone-mapping model at the retina.
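As an illustration of what an analytic inverse can look like, consider a divisive normalization layer y = z / (b + H z) acting on non-negative inputs z, with b > 0 and H >= 0. Multiplying out gives (I - diag(y) H) z = y * b, so z is recovered with one linear solve. This closed form holds for this simple parameterization and is not necessarily the exact layer used in the paper; b, H and z below are hypothetical. For strictly positive z, each entry of diag(y) H z equals z_i * (Hz)_i / (b_i + (Hz)_i) < z_i, so the spectral radius of diag(y) H is below 1 and the linear system is always solvable.

```python
import numpy as np


def dn_forward(z, b, H):
    """Divisive normalization y = z / (b + H z) for non-negative inputs z."""
    return z / (b + H @ z)


def dn_inverse(y, b, H):
    """Closed-form inverse: y * (b + H z) = z  =>  (I - diag(y) H) z = y * b."""
    A = np.eye(len(y)) - np.diag(y) @ H
    return np.linalg.solve(A, y * b)


n = 6
idx = np.arange(n)
H = 0.05 * np.exp(-((idx[:, None] - idx[None, :]) ** 2) / 2.0)  # hypothetical interaction kernel
b = np.full(n, 1.0)                                              # hypothetical semisaturation
z = np.array([0.02, 0.3, 1.2, 2.5, 0.7, 0.05])                   # hypothetical positive energies

y = dn_forward(z, b, H)
z_rec = dn_inverse(y, b, H)
print(np.allclose(z, z_rec))  # prints True
```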

Submitted 20 May, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: Reproducible results: associated Matlab toolbox available at http://isp.uv.es/docs/BioMultiLayer_L_NL.zip

arXiv:1606.00144

Derivatives and inverse of a linear-nonlinear multi-layer spatial vision model

Authors: Borja Galan, Marina Martinez-Garcia, Praveen Cyriac, Thomas Batard, Marcelo Bertalmio, Jesus Malo

Abstract: Linear-nonlinear transforms are interesting in vision science because they are key in modeling a number of perceptual experiences such as color, motion or spatial texture. Here we first show that a number of issues in vision may be addressed through an analytic expression of the Jacobian of these linear-nonlinear transforms. The particular model analyzed afterwards (an extension of [Malo & Simoncelli SPIE 2015]) is illustrative because it consists of a cascade of standard linear-nonlinear modules. Each module roughly corresponds to a known psychophysical mechanism: (1) linear spectral integration and nonlinear brightness-from-luminance computation, (2) linear pooling of local brightness and nonlinear normalization for local contrast computation, (3) linear frequency selectivity and nonlinear normalization for spatial contrast masking, and (4) linear wavelet-like decomposition and nonlinear normalization for frequency-dependent masking. Besides providing the details missing from [Malo & Simoncelli SPIE 2015], the presented analytic results and numerical methods are of interest beyond that particular model because of the ubiquity of the linear-nonlinear structure. Part of this material was presented at MODVIS 2016 (see the slides of the conference talk in the appendix at the end of this document).
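The Jacobian of a cascade of linear-nonlinear modules follows from the chain rule: each module x -> g(L x) contributes a factor diag(g'(L x)) L, and the factors multiply from the last module back to the first. The sketch below checks that product against central finite differences. It assumes random linear stages and tanh as a generic smooth nonlinearity, standing in for the brightness and normalization nonlinearities listed above; the function names are hypothetical.

```python
import numpy as np


def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2


def cascade(x, layers):
    """Forward pass through modules x -> g(L @ x)."""
    for L, g, _ in layers:
        x = g(L @ x)
    return x


def cascade_jacobian(x, layers):
    """Chain rule: J = diag(g_K'(z_K)) L_K ... diag(g_1'(z_1)) L_1."""
    J = np.eye(len(x))
    for L, g, g_prime in layers:
        z = L @ x
        J = np.diag(g_prime(z)) @ L @ J  # left-multiply by this module's Jacobian
        x = g(z)
    return J


def numerical_jacobian(fun, x, h=1e-6):
    """Central finite differences, one input coordinate at a time."""
    columns = []
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = h
        columns.append((fun(x + dx) - fun(x - dx)) / (2.0 * h))
    return np.stack(columns, axis=1)


rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(5, 4)), np.tanh, tanh_prime),  # module 1: 4 -> 5
    (rng.normal(size=(3, 5)), np.tanh, tanh_prime),  # module 2: 5 -> 3
]
x0 = rng.normal(size=4)

J_analytic = cascade_jacobian(x0, layers)
J_numeric = numerical_jacobian(lambda v: cascade(v, layers), x0)
print(J_analytic.shape, np.allclose(J_analytic, J_numeric, atol=1e-6))  # prints (3, 4) True
```

Accumulating the product module by module means the same loop also yields every intermediate Jacobian, which is convenient when a later module is swapped for an equivalent one.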

Submitted 3 June, 2016; v1 submitted 1 June, 2016; originally announced June 2016.

Comments: Paper: 26 pages, 49 references, plus 32 slides from MODVIS-16 presentation

MSC Class: 92B20

Showing 1–8 of 8 results for author: Bertalmío, M