
Learning Object Semantic Similarity with Self-Supervision

Authors: Arthur Aubret, Timothy Schaumlöffel, Gemma Roig, Jochen Triesch

Abstract: Humans judge the similarity of two objects not just based on their visual appearance but also based on their semantic relatedness. However, it remains unclear how humans learn about semantic relationships between objects and categories. One important source of semantic knowledge is that semantically related objects frequently co-occur in the same context. For instance, forks and plates are perceived as similar, at least in part, because they are often experienced together in a "kitchen" or "eating" context. Here, we investigate whether a bio-inspired learning principle exploiting such co-occurrence statistics suffices to learn a semantically structured object representation de novo from raw visual or combined visual and linguistic input. To this end, we simulate temporal sequences of visual experience by binding together short video clips of real-world scenes showing objects in different contexts. A bio-inspired neural network model aligns close-in-time visual representations while also aligning visual and category label representations to simulate visuo-language alignment. Our results show that our model clusters object representations based on their context, e.g. kitchen or bedroom, in particular in high-level layers of the network, akin to humans. In contrast, lower-level layers tend to better reflect object identity or category. To achieve this, the model exploits two distinct strategies: the visuo-language alignment ensures that different objects of the same category are represented similarly, whereas the temporal alignment leverages that objects from the same context are frequently seen in succession to make their representations more similar. Overall, our work suggests temporal and visuo-language alignment as plausible computational principles for explaining the origins of certain forms of semantic knowledge in humans.

Submitted 19 April, 2024; originally announced May 2024.

arXiv:2404.08127 [pdf, other]

Self-Supervised Learning of Color Constancy

Authors: Markus R. Ernst, Francisco M. López, Arthur Aubret, Roland W. Fleming, Jochen Triesch

Abstract: Color constancy (CC) describes the ability of the visual system to perceive an object as having a relatively constant color despite changes in lighting conditions. While CC and its limitations have been carefully characterized in humans, it is still unclear how the visual system acquires this ability during development. Here, we present a first study showing that CC develops in a neural network trained in a self-supervised manner through an invariance learning objective. During learning, objects are presented under changing illuminations, while the network aims to map subsequent views of the same object onto close-by latent representations. This gives rise to representations that are largely invariant to the illumination conditions, offering a plausible example of how CC could emerge during human cognitive development via a form of self-supervised learning.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 7 pages, 5 figures, submitted to the IEEE International Conference on Development and Learning (ICDL 2024)

arXiv:2312.04318 [pdf, other]

MIMo: A Multi-Modal Infant Model for Studying Cognitive Development

Authors: Dominik Mattern, Pierre Schumacher, Francisco M. López, Marcel C. Raabe, Markus R. Ernst, Arthur Aubret, Jochen Triesch

Abstract: Human intelligence and human consciousness emerge gradually during the process of cognitive development. Understanding this development is an essential aspect of understanding the human mind and may facilitate the construction of artificial minds with similar properties. Importantly, human cognitive development relies on embodied interactions with the physical and social environment, which is perceived via complementary sensory modalities. These interactions allow the developing mind to probe the causal structure of the world. This is in stark contrast to common machine learning approaches, e.g., for large language models, which are merely passively "digesting" large amounts of training data, but are not in control of their sensory inputs. However, computational modeling of the kind of self-determined embodied interactions that lead to human intelligence and consciousness is a formidable challenge. Here we present MIMo, an open-source multi-modal infant model for studying early cognitive development through computer simulations. MIMo's body is modeled after an 18-month-old child with detailed five-fingered hands. MIMo perceives its surroundings via binocular vision, a vestibular system, proprioception, and touch perception through a full-body virtual skin, while two different actuation models allow control of its body. We describe the design and interfaces of MIMo and provide examples illustrating its use. All code is available at https://github.com/trieschlab/MIMo .

Submitted 7 December, 2023; originally announced December 2023.

Comments: 11 pages, 8 figures. Submitted to IEEE Transactions on Cognitive and Developmental Systems (TCDS)

arXiv:2312.04118 [pdf, other]

doi 10.1109/ICDL55364.2023.10364409

Caregiver Talk Shapes Toddler Vision: A Computational Study of Dyadic Play

Authors: Timothy Schaumlöffel, Arthur Aubret, Gemma Roig, Jochen Triesch

Abstract: Infants' ability to recognize and categorize objects develops gradually. The second year of life is marked by both the emergence of more semantic visual representations and a better understanding of word meaning. This suggests that language input may play an important role in shaping visual representations. However, even in suitable contexts for word learning like dyadic play sessions, caregivers' utterances are sparse and ambiguous, often referring to objects that are different from the one to which the child attends. Here, we systematically investigate to what extent caregivers' utterances can nevertheless enhance visual representations. For this we propose a computational model of visual representation learning during dyadic play. We introduce a synthetic dataset of ego-centric images perceived by a toddler-agent that moves and rotates toy objects in different parts of its home environment while hearing caregivers' utterances, modeled as captions. We propose to model toddlers' learning as simultaneously aligning representations for 1) close-in-time images and 2) co-occurring images and utterances. We show that utterances with statistics matching those of real caregivers give rise to representations supporting improved category recognition. Our analysis reveals that a small decrease/increase in object-relevant naming frequencies can drastically impact the learned representations. This affects the attention on object names within an utterance, which is required for efficient visuo-linguistic alignment. Overall, our results support the hypothesis that caregivers' naming utterances can improve toddlers' visual representations.

Submitted 17 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: Proceedings of the 2023 IEEE International Conference on Development and Learning (ICDL)

Journal ref: "Caregiver Talk Shapes Toddler Vision: A Computational Study of Dyadic Play," 2023 IEEE International Conference on Development and Learning (ICDL), Macau, China, 2023, pp. 67-72

arXiv:2302.02330 [pdf, other]

CIPER: Combining Invariant and Equivariant Representations Using Contrastive and Predictive Learning

Authors: Xia Xu, Jochen Triesch

Abstract: Self-supervised representation learning (SSRL) methods have shown great success in computer vision. In recent studies, augmentation-based contrastive learning methods have been proposed for learning representations that are invariant or equivariant to pre-defined data augmentation operations. However, invariant or equivariant features favor only specific downstream tasks depending on the augmentations chosen. They may result in poor performance when the learned representation does not match task requirements. Here, we consider an active observer that can manipulate views of an object and has knowledge of the action(s) that generated each view. We introduce Contrastive Invariant and Predictive Equivariant Representation learning (CIPER). CIPER comprises both invariant and equivariant learning objectives using one shared encoder and two different output heads on top of the encoder. One output head is a projection head with a state-of-the-art contrastive objective to encourage invariance to augmentations. The other is a prediction head estimating the augmentation parameters, capturing equivariant features. Both heads are discarded after training and only the encoder is used for downstream tasks. We evaluate our method on static image tasks and time-augmented image datasets. Our results show that CIPER outperforms a baseline contrastive method on various tasks. Interestingly, CIPER encourages the formation of hierarchically structured representations where different views of an object become systematically organized in the latent representation space.

Submitted 18 July, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: 12 pages, 4 figures, 3 tables

MSC Class: I.2; I.4

arXiv:2210.09871 [pdf, other]

Sequence and Circle: Exploring the Relationship Between Patches

Authors: Zhengyang Yu, Jochen Triesch

Abstract: The vision transformer (ViT) has achieved state-of-the-art results in various vision tasks. It utilizes a learnable position embedding (PE) mechanism to encode the location of each image patch. However, it is presently unclear if this learnable PE is really necessary and what its benefits are. This paper explores two alternative ways of encoding the location of individual patches that exploit prior knowledge about their spatial arrangement. One is called the sequence relationship embedding (SRE), and the other is called the circle relationship embedding (CRE). Among them, the SRE considers all patches to be in order, and adjacent patches have the same interval distance. The CRE considers the central patch as the center of the circle and measures the distance of the remaining patches from the center based on the four neighborhoods principle. Multiple concentric circles with different radii combine different patches. Finally, we implemented these two relations on three classic ViTs and tested them on four popular datasets. Experiments show that SRE and CRE can replace PE to reduce the random learnable parameters while achieving the same performance. Combining SRE or CRE with PE gets better performance than only using PE.

Submitted 19 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: 7 pages, 1 figure

arXiv:2207.13492 [pdf, other]

Time to augment self-supervised visual representation learning

Authors: Arthur Aubret, Markus Ernst, Céline Teulière, Jochen Triesch

Abstract: Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, self-supervised learning (SSL) has led to major advances in forming object representations in an unsupervised fashion. Such systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience during natural interactions with objects. This gives access to "augmentations" not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations during natural interactions for learning object categories. Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object manipulations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is important for learning to discard background-related information from the latent representation. Overall, we conclude that time-based augmentations during natural interactions with objects can substantially improve self-supervised learning, narrowing the gap between artificial and biological vision systems.

Submitted 21 December, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 20 pages

arXiv:2206.09621 [pdf, other]

Degeneracy in epilepsy: Multiple Routes to Hyperexcitable Brain Circuits and their Repair

Authors: Tristan Manfred Stöber, Danylo Batulin, Jochen Triesch, Rishikesh Narayanan, Peter Jedlicka

Abstract: Developing effective therapies against epilepsy remains a challenge. The complex and multifaceted nature of this disease still fuels controversies about its origin. In this perspective article, we argue that conflicting hypotheses can be reconciled by taking into account the degeneracy of the brain, which manifests in multiple routes leading to similar function or dysfunction. We exemplify degeneracy at three different levels, ranging from the cellular to the network and systems level. First, at the cellular level, we describe the relevance of ion channel degeneracy for epilepsy and discuss its interplay with dendritic morphology. Second, at the network level, we provide examples for the degeneracy of synaptic and intrinsic neuronal properties that supports the robustness of neuronal networks but also leads to diverse responses to ictogenic and epileptogenic perturbations. Third, at the system level, we provide examples for degeneracy in the intricate interactions between the immune and nervous system. Finally, we show that computational approaches including multiscale and so-called population neural circuit models help disentangle the complex web of physiological and pathological adaptations. Such models may contribute to identifying the best personalized multitarget strategies for directing the system towards a physiological state.

Submitted 20 June, 2022; originally announced June 2022.

Comments: 66 pages, 4 figures

arXiv:2205.06198 [pdf, other]

Embodied vision for learning object representations

Authors: Arthur Aubret, Céline Teulière, Jochen Triesch

Abstract: Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this learning approach as a model of the development of human object recognition, it is important to consider what visual input a toddler would typically observe while interacting with objects. First, human vision is highly foveated, with high resolution only available in the central region of the field of view. Second, objects may be seen against a blurry background due to infants' limited depth of field. Third, during object manipulation a toddler mostly observes close objects filling a large part of the field of view due to their rather short arms. Here, we study how these effects impact the quality of visual representations learnt through time-contrastive learning. To this end, we let a visually embodied agent "play" with objects in different locations of a near photo-realistic flat. During each play session the agent views an object in multiple orientations before turning its body to view another object. The resulting sequence of views feeds a time-contrastive learning algorithm. Our results show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments. We argue that this effect is caused by the reduction of features extracted in the background, a neural network bias for large features in the image and a greater similarity between novel and familiar background regions. We conclude that the embodied nature of visual learning may be crucial for understanding the development of human object perception.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 6 pages
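
The abstract above describes time-contrastive learning as mapping successive views of an object onto close-by representations. A minimal sketch of one way such an objective can be written follows. It assumes an InfoNCE-style loss with cosine similarity and a temperature of 0.1; the abstract does not specify the loss, and the function names `cosine` and `time_contrastive_loss` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def time_contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's stated objective).

    anchors[i] and positives[i] are embeddings of two successive views of
    the same object; every other positives[j] serves as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the temporally adjacent view as the target.
        total += log_denominator - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    views_t = [[1.0, 0.0], [0.0, 1.0]]
    views_t_plus_1 = [[0.9, 0.1], [0.1, 0.9]]
    print(time_contrastive_loss(views_t, views_t_plus_1))
```

The loss is small when each embedding is closest to the embedding of the next view of the same object, and large when successive views land far apart.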

arXiv:2112.08845 [pdf, other]

Multiple Instance Learning for Brain Tumor Detection from Magnetic Resonance Spectroscopy Data

Authors: Diyuan Lu, Gerhard Kurz, Nenad Polomac, Iskra Gacheva, Elke Hattingen, Jochen Triesch

Abstract: We apply deep learning (DL) to magnetic resonance spectroscopy (MRS) data for the task of brain tumor detection. Medical applications often suffer from data scarcity and corruption by noise. Both of these problems are prominent in our data set. Furthermore, a varying number of spectra are available for the different patients. We address these issues by considering the task as a multiple instance learning (MIL) problem. Specifically, we aggregate multiple spectra from the same patient into a "bag" for classification and apply data augmentation techniques. To achieve permutation invariance during the process of bagging, we propose two approaches: (1) to apply min-, max-, and average-pooling on the features of all samples in one bag and (2) to apply an attention mechanism. We tested these two approaches on multiple neural network architectures. We demonstrate that classification performance is significantly improved when training on multiple instances rather than single spectra. We propose a simple oversampling data augmentation method and show that it could further improve the performance. Finally, we demonstrate that our proposed model outperforms manual classification by neuroradiologists according to most performance metrics.

Submitted 16 December, 2021; originally announced December 2021.
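
The first bagging approach named in the abstract above, min-, max-, and average-pooling over all samples in a bag, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `pool_bag`, the plain-list feature format, and the choice to concatenate the three pooled vectors are all hypothetical.

```python
from statistics import fmean


def pool_bag(features):
    """Pool a bag of per-spectrum feature vectors into one bag-level vector.

    features: non-empty list of equal-length lists of floats, one per spectrum.
    Returns the element-wise min, max, and mean concatenated in that order
    (concatenation is an assumption made for this sketch).
    """
    if not features:
        raise ValueError("a bag must contain at least one instance")
    # Transpose so each tuple holds one feature dimension across the bag.
    columns = list(zip(*features))
    mins = [min(column) for column in columns]
    maxs = [max(column) for column in columns]
    means = [fmean(column) for column in columns]
    return mins + maxs + means


if __name__ == "__main__":
    bag = [[1.0, 4.0], [3.0, 2.0]]
    print(pool_bag(bag))
    # Reordering the spectra in the bag leaves the pooled vector unchanged.
    print(pool_bag(list(reversed(bag))) == pool_bag(bag))
```

Because min, max, and mean ignore the order of their inputs, the pooled vector is the same however the spectra in a bag are ordered, and its length does not depend on how many spectra a patient has.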

arXiv:2104.10615 [pdf, ps, other]

Recurrent Feedback Improves Recognition of Partially Occluded Objects

Authors: Markus Roland Ernst, Jochen Triesch, Thomas Burwick

Abstract: Recurrent connectivity in the visual cortex is believed to aid object recognition for challenging conditions such as occlusion. Here we investigate if and how artificial neural networks also benefit from recurrence. We compare architectures composed of bottom-up, lateral and top-down connections and evaluate their performance using two novel stereoscopic occluded object datasets. We find that classification accuracy is significantly higher for recurrent models when compared to feedforward models of matched parametric complexity. Additionally, we show that for challenging stimuli, the recurrent feedback is able to correctly revise the initial feedforward guess.

Submitted 21 April, 2021; originally announced April 2021.

Comments: 6 pages, 2 figures, 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2020). arXiv admin note: substantial text overlap with arXiv:1909.06175

Journal ref: Proceedings of the 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2020) 327-332

arXiv:2103.05100 [pdf, other]

doi 10.1007/978-3-319-97628-0_7

Learning Hierarchical Integration of Foveal and Peripheral Vision for Vergence Control by Active Efficient Coding

Authors: Zhetuo Zhao, Jochen Triesch, Bertram E. Shi

Abstract: The active efficient coding (AEC) framework parsimoniously explains the joint development of visual processing and eye movements, e.g., the emergence of binocular disparity selective neurons and fusional vergence, the disjunctive eye movements that align left and right eye images. Vergence can be driven by information in both the fovea and periphery, which play complementary roles. The high resolution fovea can drive precise short range movements. The lower resolution periphery supports coarser long range movements. The fovea and periphery may also contain conflicting information, e.g. due to objects at different depths. While past AEC models did integrate peripheral and foveal information, they did not explicitly take into account these characteristics. We propose here a two-level hierarchical approach that does. The bottom level generates different vergence actions from foveal and peripheral regions. The top level selects one. We demonstrate that the hierarchical approach performs better than prior approaches in realistic environments, exhibiting better alignment and less oscillation.

Submitted 29 January, 2021; originally announced March 2021.

arXiv:2101.11391 [pdf, ps, other]

Self-Calibrating Active Binocular Vision via Active Efficient Coding with Deep Autoencoders

Authors: Charles Wilmot, Bertram E. Shi, Jochen Triesch

Abstract: We present a model of the self-calibration of active binocular vision comprising the simultaneous learning of visual representations, vergence, and pursuit eye movements. The model follows the principle of Active Efficient Coding (AEC), a recent extension of the classic Efficient Coding Hypothesis to active perception. In contrast to previous AEC models, the present model uses deep autoencoders to learn sensory representations. We also propose a new formulation of the intrinsic motivation signal that guides the learning of behavior. We demonstrate the performance of the model in simulations.

Submitted 27 January, 2021; originally announced January 2021.

arXiv:2101.11376 [pdf, other]

Learning Abstract Representations through Lossy Compression of Multi-Modal Signals

Authors: Charles Wilmot, Gianluca Baldassarre, Jochen Triesch

Abstract: A key competence for open-ended learning is the formation of increasingly abstract representations useful for driving complex behavior. Abstract representations ignore specific details and facilitate generalization. Here we consider the learning of abstract representations in a multi-modal setting with two or more input modalities. We treat the problem as a lossy compression problem and show that generic lossy compression of multimodal sensory input naturally extracts abstract representations that tend to strip away modality-specific details and preferentially retain information that is shared across the different modalities. Furthermore, we propose an architecture to learn abstract representations by identifying and retaining only the information that is shared across multiple modalities while discarding any modality-specific information.

Submitted 3 September, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

arXiv:2011.13880 [pdf, other]

REAL-X -- Robot open-Ended Autonomous Learning Architectures: Achieving Truly End-to-End Sensorimotor Autonomous Learning Systems

Authors: Emilio Cartoni, Davide Montella, Jochen Triesch, Gianluca Baldassarre

Abstract: Open-ended learning is a core research field of developmental robotics and AI aiming to build learning machines and robots that can autonomously acquire knowledge and skills incrementally as infants and children. The first contribution of this work is to study the challenges posed by the previously proposed benchmark 'REAL competition' aiming to foster the development of truly open-ended learning robot architectures. The competition involves a simulated camera-arm robot that: (a) in a first 'intrinsic phase' acquires sensorimotor competence by autonomously interacting with objects; (b) in a second 'extrinsic phase' is tested with tasks unknown in the intrinsic phase to measure the quality of knowledge previously acquired. This benchmark requires the solution of multiple challenges usually tackled in isolation, in particular exploration, sparse-rewards, object learning, generalisation, task/goal self-generation, and autonomous skill learning. As a second contribution, we present a set of 'REAL-X' robot architectures that are able to solve different versions of the benchmark, where we progressively release initial simplifications. The architectures are based on a planning approach that dynamically increases abstraction, and intrinsic motivations to foster exploration. REAL-X achieves a good performance level in very demanding conditions. We argue that the REAL benchmark represents a valuable tool for studying open-ended learning in its hardest form.

Submitted 2 March, 2022; v1 submitted 27 November, 2020; originally announced November 2020.

Comments: 14 pages, 13 figures. Improved version of the REAL baseline including better exploration

ACM Class: I.2.9

arXiv:2006.12285 [pdf, other]

Human-Expert-Level Brain Tumor Detection Using Deep Learning with Data Distillation and Augmentation

Authors: Diyuan Lu, Nenad Polomac, Iskra Gacheva, Elke Hattingen, Jochen Triesch

Abstract: The application of Deep Learning (DL) for medical diagnosis is often hampered by two problems. First, the amount of training data may be scarce, as it is limited by the number of patients who have acquired the condition to be diagnosed. Second, the training data may be corrupted by various types of noise. Here, we study the problem of brain tumor detection from magnetic resonance spectroscopy (MRS) data, where both types of problems are prominent. To overcome these challenges, we propose a new method for training a deep neural network that distills particularly representative training examples and augments the training data by mixing these samples from one class with those from the same and other classes to create additional training samples. We demonstrate that this technique substantially improves performance, allowing our method to reach human-expert-level accuracy with just a few thousand training examples. Interestingly, the network learns to rely on features of the data that are usually ignored by human experts, suggesting new directions for future research.

Submitted 16 July, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Submitted to IEEE Transactions on Neural Networks and Learning Systems

arXiv:2006.09885

Staging Epileptogenesis with Deep Neural Networks

Authors: Diyuan Lu, Sebastian Bauer, Valentin Neubert, Lara Sophie Costard, Felix Rosenow, Jochen Triesch

Abstract: Epilepsy is a common neurological disorder characterized by recurrent seizures accompanied by excessive synchronous brain activity. The process of structural and functional brain alterations leading to increased seizure susceptibility and eventually spontaneous seizures is called epileptogenesis (EPG) and can span months or even years. Detecting and monitoring the progression of EPG could allow for targeted early interventions that could slow down disease progression or even halt its development. Here, we propose an approach for staging EPG using deep neural networks and identify potential electroencephalography (EEG) biomarkers to distinguish different phases of EPG. Specifically, continuous intracranial EEG recordings were collected from a rodent model where epilepsy is induced by electrical perforant pathway stimulation (PPS). A deep neural network (DNN) is trained to distinguish EEG signals from three phases: before stimulation (baseline), shortly after the PPS, and long after the PPS but before the first spontaneous seizure (FSS). Experimental results show that our proposed method can classify EEG signals from the three phases with an average area under the curve (AUC) of 0.93, 0.89, and 0.86, respectively. To the best of our knowledge, this represents the first successful attempt to stage EPG prior to the FSS using DNNs.

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2006.06675

Towards Early Diagnosis of Epilepsy from EEG Data

Authors: Diyuan Lu, Sebastian Bauer, Valentin Neubert, Lara Sophie Costard, Felix Rosenow, Jochen Triesch

Abstract: Epilepsy is one of the most common neurological disorders, affecting about 1% of the population at all ages. Detecting the development of epilepsy, i.e., epileptogenesis (EPG), before any seizures occur could allow for early interventions and potentially more effective treatments. Here, we investigate whether modern machine learning (ML) techniques can detect EPG from intracranial electroencephalography (EEG) recordings prior to the occurrence of any seizures. For this we use a rodent model of epilepsy where EPG is triggered by electrical stimulation of the brain. We propose an ML framework for EPG identification, which combines a deep convolutional neural network (CNN) with a prediction aggregation method to obtain the final classification decision. Specifically, the neural network is trained to distinguish five-second segments of EEG recordings taken from either the pre-stimulation period or the post-stimulation period. Due to the gradual development of epilepsy, there is enormous overlap of the EEG patterns before and after the stimulation. Hence, a prediction aggregation process is introduced, which pools predictions over a longer period. By aggregating predictions over one hour, our approach achieves an area under the curve (AUC) of 0.99 on the EPG detection task. This demonstrates the feasibility of EPG prediction from EEG recordings.

Submitted 17 June, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Machine Learning for Healthcare conference 2020

arXiv:1909.06175

Recurrent Connectivity Aids Recognition of Partly Occluded Objects

Authors: Markus Roland Ernst, Jochen Triesch, Thomas Burwick

Abstract: Feedforward convolutional neural networks are the prevalent model of core object recognition. For challenging conditions, such as occlusion, neuroscientists believe that the recurrent connectivity in the visual cortex aids object recognition. In this work we investigate if and how artificial neural networks can also benefit from recurrent connectivity. For this we systematically compare architectures comprised of bottom-up (B), lateral (L) and top-down (T) connections. To evaluate performance, we introduce two novel stereoscopic occluded object datasets, which bridge the gap from classifying digits to recognizing 3D objects. The task consists of recognizing one target object occluded by multiple occluder objects. We find that recurrent models perform significantly better than their feedforward counterparts, which were matched in parametric complexity. We show that for challenging stimuli, the recurrent feedback is able to correctly revise the initial feedforward guess of the network. Overall, our results suggest that both artificial and biological neural networks can exploit recurrence for improved object recognition.

Submitted 12 September, 2019; originally announced September 2019.

Comments: 9 pages, 3 figures. arXiv admin note: text overlap with arXiv:1907.08831

arXiv:1907.08831

doi 10.1007/978-3-030-30508-6_24

Recurrent Connections Aid Occluded Object Recognition by Discounting Occluders

Authors: Markus Roland Ernst, Jochen Triesch, Thomas Burwick

Abstract: Recurrent connections in the visual cortex are thought to aid object recognition when part of the stimulus is occluded. Here we investigate if and how recurrent connections in artificial neural networks similarly aid object recognition. We systematically test and compare architectures comprised of bottom-up (B), lateral (L) and top-down (T) connections. Performance is evaluated on a novel stereoscopic occluded object recognition dataset. The task consists of recognizing one target digit occluded by multiple occluder digits in a pseudo-3D environment. We find that recurrent models perform significantly better than their feedforward counterparts, which were matched in parametric complexity. Furthermore, we analyze how the network's representation of the stimuli evolves over time due to recurrent connections. We show that the recurrent connections tend to move the network's representation of an occluded digit towards its un-occluded version. Our results suggest that both the brain and artificial neural networks can exploit recurrent connectivity to aid occluded object recognition.

Submitted 11 September, 2019; v1 submitted 20 July, 2019; originally announced July 2019.

Comments: 13 pages, 5 figures, accepted at the 28th International Conference on Artificial Neural Networks, published in Springer Lecture Notes in Computer Science vol 11729

Journal ref: In: Tetko, I. V. et al. (eds.) ICANN 2019. LNCS, vol 11729. Springer, Cham, pp 294-305

arXiv:1903.08100

Residual Deep Convolutional Neural Network for EEG Signal Classification in Epilepsy

Authors: Diyuan Lu, Jochen Triesch

Abstract: Epilepsy is the fourth most common neurological disorder, affecting about 1% of the population at all ages. As many as 60% of people with epilepsy experience focal seizures, which originate in a certain brain area and are limited to part of one cerebral hemisphere. In focal epilepsy patients, a precise surgical removal of the seizure onset zone can lead to effective seizure control or even a seizure-free outcome. Thus, correct identification of the seizure onset zone is essential. For clinical evaluation purposes, electroencephalography (EEG) recordings are commonly used. However, their interpretation is usually done manually by physicians and is time-consuming and error-prone. In this work, we propose an automated epileptic signal classification method based on modern deep learning methods. In contrast to previous approaches, the network is trained directly on the EEG recordings, avoiding hand-crafted feature extraction and selection procedures. This exploits the ability of deep neural networks to automatically detect and extract relevant features that may be too complex or subtle to be noticed by humans. The proposed network structure is based on a convolutional neural network with residual connections. We demonstrate that our network produces state-of-the-art performance on two benchmark data sets, a data set from Bonn University and the Bern-Barcelona data set.
We conclude that modern deep learning approaches can reach state-of-the-art performance on epileptic EEG classification and automated seizure onset zone identification tasks when trained on raw EEG data. This suggests that such approaches have potential for improving clinical practice.

Submitted 19 March, 2019; originally announced March 2019.

arXiv:1609.04245

Non-random network connectivity comes in pairs

Authors: Felix Z. Hoffmann, Jochen Triesch

Abstract: Overrepresentation of bidirectional connections in local cortical networks has been repeatedly reported and is a focus of the ongoing discussion of non-random connectivity. Here we show in a brief mathematical analysis that in a network in which connection probabilities are symmetric in pairs, $P_{ij} = P_{ji}$, the occurrence of bidirectional connections and the occurrence of non-random structures are inherently linked; an overabundance of reciprocally connected pairs emerges necessarily when the network structure deviates from a random network in any form.

Submitted 13 December, 2016; v1 submitted 14 September, 2016; originally announced September 2016.

Comments: 16 pages, 3 figures

arXiv:1606.06443

An active efficient coding model of the optokinetic nystagmus

Authors: Chong Zhang, Jochen Triesch, Bertram E. Shi

Abstract: Optokinetic nystagmus (OKN) is an involuntary eye movement responsible for stabilizing retinal images in the presence of relative motion between an observer and the environment. Fully understanding the development of optokinetic nystagmus requires a neurally plausible computational model that accounts for the neural development and the behavior. To date, work in this area has been limited. We propose a neurally plausible framework for the joint development of disparity and motion tuning in the visual cortex and of the optokinetic and vergence eye movements. This framework models the joint emergence of both perception and behavior, and accounts for the importance of the development of normal vergence control and binocular vision in achieving normal monocular OKN (mOKN) behaviors. Because the model includes behavior, we can simulate the same perturbations as performed in past experiments, such as artificially induced strabismus. The proposed model agrees both qualitatively and quantitatively with a number of findings from the literature on both binocular vision and the optokinetic reflex. Finally, our model also makes quantitative predictions about the OKN behavior using the same methods used to characterize the OKN in the experimental literature.

Submitted 11 October, 2016; v1 submitted 21 June, 2016; originally announced June 2016.

arXiv:1402.3344

Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit

Authors: Chong Zhang, Yu Zhao, Jochen Triesch, Bertram E. Shi

Abstract: We extend the framework of efficient coding, which has been used to model the development of sensory processing in isolation, to model the development of the perception/action cycle. Our extension combines sparse coding and reinforcement learning so that sensory processing and behavior co-develop to optimize a shared intrinsic motivational signal: the fidelity of the neural encoding of the sensory input under resource constraints. Applying this framework to a model system consisting of an active eye behaving in a time-varying environment, we find that this generic principle leads to the simultaneous development of both smooth pursuit behavior and model neurons whose properties are similar to those of primary visual cortical neurons selective for different directions of visual motion. We suggest that this general principle may form the basis for a unified and integrated explanation of many perception/action loops.

Submitted 24 February, 2014; v1 submitted 13 February, 2014; originally announced February 2014.

Comments: 6 pages, 5 figures
