
Showing 1–28 of 28 results for author: Sudhakaran, S

  1. Generative Design through Quality-Diversity Data Synthesis and Language Models

    Authors: Adam Gaier, James Stoddart, Lorenzo Villaggi, Shyam Sudhakaran

    Abstract: Two fundamental challenges face generative models in engineering applications: the acquisition of high-performing, diverse datasets, and the adherence to precise constraints in generated designs. We propose a novel approach combining optimization, constraint satisfaction, and language models to tackle these challenges in architectural design. Our method uses Quality-Diversity (QD) to generate a di…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures, GECCO 2024

  2. arXiv:2307.08197  [pdf, other]

    cs.NE cs.AI

    Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs

    Authors: Elias Najarro, Shyam Sudhakaran, Sebastian Risi

    Abstract: Biological nervous systems are created in a fundamentally different way than current artificial neural networks. Despite its impressive results in a variety of different domains, deep learning often requires considerable engineering effort to design high-performing neural architectures. By contrast, biological nervous systems are grown through a dynamic self-organizing process. In this paper, we t…

    Submitted 16 July, 2023; originally announced July 2023.

  3. arXiv:2302.05981  [pdf, other]

    cs.AI cs.CL cs.LG

    MarioGPT: Open-Ended Text2Level Generation through Large Language Models

    Authors: Shyam Sudhakaran, Miguel González-Duque, Claire Glanois, Matthias Freiberger, Elias Najarro, Sebastian Risi

    Abstract: Procedural Content Generation (PCG) is a technique to generate complex and diverse environments in an automated way. However, while generating content with PCG methods is often straightforward, generating meaningful content that reflects specific intentions and constraints remains challenging. Furthermore, many PCG algorithms lack the ability to generate content in an open-ended manner. Recently,…

    Submitted 8 November, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

  4. arXiv:2301.13573  [pdf, other]

    cs.LG

    Skill Decision Transformer

    Authors: Shyam Sudhakaran, Sebastian Risi

    Abstract: Recent work has shown that Large Language Models (LLMs) can be incredibly effective for offline reinforcement learning (RL) by representing the traditional RL problem as a sequence modelling problem (Chen et al., 2021; Janner et al., 2021). However, many of these methods only optimize for high returns, and may not extract much information from a diverse dataset of trajectories. Generalized Decision…

    Submitted 31 January, 2023; originally announced January 2023.

  5. arXiv:2206.06674  [pdf, other]

    cs.NE cs.LG q-bio.PE q-bio.TO

    Severe Damage Recovery in Evolving Soft Robots through Differentiable Programming

    Authors: Kazuya Horibe, Kathryn Walker, Rasmus Berg Palm, Shyam Sudhakaran, Sebastian Risi

    Abstract: Biological systems are very robust to morphological damage, but artificial systems (robots) are currently not. In this paper we present a system based on neural cellular automata, in which locomoting robots are evolved and then given the ability to regenerate their morphology from damage through gradient-based training. Our approach thus combines the benefits of evolution to discover a wide range…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Genetic Programming and Evolvable Machines (GENP). arXiv admin note: substantial text overlap with arXiv:2102.02579

  6. arXiv:2205.06806  [pdf, other]

    cs.NE cs.LG

    Goal-Guided Neural Cellular Automata: Learning to Control Self-Organising Systems

    Authors: Shyam Sudhakaran, Elias Najarro, Sebastian Risi

    Abstract: Inspired by cellular growth and self-organization, Neural Cellular Automata (NCAs) have been capable of "growing" artificial cells into images, 3D structures, and even functional machines. NCAs are flexible and robust computational systems but -- similarly to many other self-organizing systems -- inherently uncontrollable during and after their growth process. We present an approach to control the…

    Submitted 25 April, 2022; originally announced May 2022.

  7. Relevance-based Margin for Contrastively-trained Video Retrieval Models

    Authors: Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, Oswald Lanz

    Abstract: Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access in private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space b…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: Accepted for presentation at International Conference on Multimedia Retrieval (ICMR '22)

  8. arXiv:2204.11674  [pdf, other]

    cs.NE cs.AI cs.LG

    HyperNCA: Growing Developmental Networks with Neural Cellular Automata

    Authors: Elias Najarro, Shyam Sudhakaran, Claire Glanois, Sebastian Risi

    Abstract: In contrast to deep reinforcement learning agents, biological neural networks are grown through a self-organized developmental process. Here we propose a new hypernetwork approach to grow artificial neural networks based on neural cellular automata (NCA). Inspired by self-organising systems and information-theoretic approaches to developmental biology, we show that our HyperNCA method can grow neu…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: Paper accepted as a conference paper at ICLR 'From Cells to Societies' workshop 2022

  9. arXiv:2203.08897  [pdf, other]

    cs.CV

    Gate-Shift-Fuse for Video Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them in…

    Submitted 15 April, 2023; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to TPAMI. arXiv admin note: text overlap with arXiv:1912.00381

  10. arXiv:2201.12360  [pdf, other]

    cs.NE

    Variational Neural Cellular Automata

    Authors: Rasmus Berg Palm, Miguel González-Duque, Shyam Sudhakaran, Sebastian Risi

    Abstract: In nature, the process of cellular growth and differentiation has led to an amazing diversity of organisms -- algae, starfish, giant sequoia, tardigrades, and orcas are all created by the same generative process. Inspired by the incredible diversity of this biological generative process, we propose a generative model, the Variational Neural Cellular Automata (VNCA), which is loosely inspired by t…

    Submitted 2 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  11. arXiv:2110.02902  [pdf, ps, other]

    cs.CV

    SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021

    Authors: Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature extracting module that can be plugged into 2D CNNs for video action recognition. XViT is a…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: Ranked third in the EPIC-Kitchens-100 Action Recognition Challenge @ CVPR 2021

  12. arXiv:2106.05968  [pdf, other]

    cs.CV cs.AI cs.LG

    Space-time Mixing Attention for Video Transformer

    Authors: Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos

    Abstract: This paper is on video recognition using Transformers. Very recent attempts in this area have demonstrated promising results in terms of recognition accuracy, yet they have been also shown to induce, in many cases, significant computational overheads due to the additional modelling of the temporal information. In this work, we propose a Video Transformer model the complexity of which scales linear…

    Submitted 11 June, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Updated results on SSv2

  13. arXiv:2104.02309  [pdf, other]

    cs.SD cs.LG eess.AS

    MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms

    Authors: Kai Middlebrook, Shyam Sudhakaran, David Guy Brizan

    Abstract: In this work, we aim to improve the expressive capacity of waveform-based discriminative music networks by modeling both sequential (temporal) and hierarchical information in an efficient end-to-end architecture. We present MuSLCAT, or Multi-scale and Multi-level Convolutional Attention Transformer, a novel architecture for learning robust representations of complex music tags directly from raw wa…

    Submitted 6 April, 2021; originally announced April 2021.

  14. arXiv:2103.08737  [pdf, other]

    cs.LG

    Growing 3D Artefacts and Functional Machines with Neural Cellular Automata

    Authors: Shyam Sudhakaran, Djordje Grbic, Siyan Li, Adam Katona, Elias Najarro, Claire Glanois, Sebastian Risi

    Abstract: Neural Cellular Automata (NCAs) have been proven effective in simulating morphogenetic processes, the continuous construction of complex structures from very few starting cells. Recent developments in NCAs lie in the 2D domain, namely reconstructing target images from a single pixel or infinitely growing 2D textures. In this work, we propose an extension of NCAs to 3D, utilizing 3D convolutions in…

    Submitted 4 June, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Journal ref: Proceedings of the 2021 Conference on Artificial Life (ALIFE 2021)

  15. Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component of EgoACO is class activation pooling (CAP), a differentiable pooling operation that combines ideas from bilinear pooling for fine-grained re…

    Submitted 16 February, 2021; originally announced February 2021.

    Comments: Accepted to TPAMI

  16. arXiv:2006.13725  [pdf, other]

    cs.CV

    FBK-HUPBA Submission to the EPIC-Kitchens Action Recognition 2020 Challenge

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: In this report we describe the technical details of our submission to the EPIC-Kitchens Action Recognition 2020 Challenge. To participate in the challenge we deployed spatio-temporal feature extraction and aggregation models we have developed recently: Gate-Shift Module (GSM) [1] and EgoACO, an extension of Long Short-Term Attention (LSTA) [2]. We design an ensemble of GSM and EgoACO model familie…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Ranked 3rd in the EPIC-Kitchens action recognition challenge @ CVPR 2020

  17. arXiv:1912.00381  [pdf, other]

    cs.CV

    Gate-Shift Networks for Video Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Deep 3D CNNs for video action recognition are designed to learn powerful representations in the joint spatio-temporal feature space. In practice, however, because of the large number of parameters and computations involved, they may underperform in the absence of sufficiently large datasets for training them at scale. In this paper we introduce spatial gating in spatial-temporal decomposition of 3D k…

    Submitted 21 March, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

    Comments: CVPR20 camera ready version. Code and models available at https://github.com/swathikirans/GSM

  18. arXiv:1907.01273  [pdf, other]

    cs.CV

    An Analysis of Deep Neural Networks with Attention for Action Recognition from a Neurophysiological Perspective

    Authors: Swathikiran Sudhakaran, Oswald Lanz

    Abstract: We review three recent deep learning based methods for action recognition and present a brief comparative analysis of the methods from a neurophysiological point of view. We posit that there is some analogy between the three presented deep learning based methods and some of the existing hypotheses regarding the functioning of the human brain.

    Submitted 2 July, 2019; originally announced July 2019.

    Comments: Presented as an extended abstract in the Mutual benefits of cognitive and computer vision (MBCCV) workshop, CVPR 2019

  19. arXiv:1906.08960  [pdf, other]

    cs.CV

    FBK-HUPBA Submission to the EPIC-Kitchens 2019 Action Recognition Challenge

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: In this report we describe the technical details of our submission to the EPIC-Kitchens 2019 action recognition challenge. To participate in the challenge we have developed a number of CNN-LSTA [3] and HF-TSN [2] variants, and submitted predictions from an ensemble compiled out of these two model families. Our submission, visible on the public leaderboard with team name FBK-HUPBA, achieved a top-1…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: Ranked 3rd in the EPIC-Kitchens 2019 action recognition challenge, held as part of CVPR 2019

  20. arXiv:1905.12462  [pdf, other]

    cs.CV

    Hierarchical Feature Aggregation Networks for Video Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Most action recognition methods are based on a) a late aggregation of frame level CNN features using average pooling, max pooling, or RNN, among others, or b) spatio-temporal aggregation via 3D convolutions. The first assumes independence among frame features up to a certain level of abstraction and then performs higher-level aggregation, while the second extracts spatio-temporal features from grouped fr…

    Submitted 29 May, 2019; originally announced May 2019.

  21. arXiv:1812.01431  [pdf, other]

    cs.CL cs.AI

    Modeling natural language emergence with integral transform theory and reinforcement learning

    Authors: Bohdan Khomtchouk, Shyam Sudhakaran

    Abstract: Zipf's law predicts a power-law relationship between word rank and frequency in language communication systems and has been widely reported in a variety of natural language processing applications. However, the emergence of natural language is often modeled as a function of bias between speaker and listener interests, which lacks a direct way of relating information-theoretic bias to Zipfian rank.…

    Submitted 30 November, 2018; originally announced December 2018.

    Comments: 9 pages, 4 figures, 2 tables. arXiv admin note: text overlap with arXiv:1603.03153

  22. arXiv:1811.10698  [pdf, other]

    cs.CV

    LSTA: Long Short-Term Attention for Egocentric Action Recognition

    Authors: Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz

    Abstract: Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires a fine-grained discrimination of small objects and their manipulation. While some methods are based on strong supervision and attention mechanisms, they are either annotation consuming or do not take spatio-temporal patterns into account. In this paper we propose LSTA as a mechanism to focus on features…

    Submitted 12 April, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

    Comments: Accepted to CVPR 2019

  23. arXiv:1808.09892  [pdf, other]

    cs.CV

    Top-down Attention Recurrent VLAD Encoding for Action Recognition in Videos

    Authors: Swathikiran Sudhakaran, Oswald Lanz

    Abstract: Most recent approaches for action recognition from video leverage deep architectures to encode the video clip into a fixed length representation vector that is then used for classification. For this to be successful, the network must be capable of suppressing irrelevant scene background and extract the representation from the most discriminative part of the video. Our contribution builds on the ob…

    Submitted 29 August, 2018; originally announced August 2018.

    Comments: Accepted to the 17th International Conference of the Italian Association for Artificial Intelligence

  24. arXiv:1807.11794  [pdf, ps, other]

    cs.CV

    Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition

    Authors: Swathikiran Sudhakaran, Oswald Lanz

    Abstract: In this paper we propose an end-to-end trainable deep neural network model for egocentric activity recognition. Our model is built on the observation that egocentric activities are highly characterized by the objects and their locations in the video. Based on this, we develop a spatial attention mechanism that enables the network to attend to regions containing objects that are correlated with the…

    Submitted 31 July, 2018; originally announced July 2018.

    Comments: Accepted to BMVC 2018

  25. arXiv:1709.06531  [pdf, ps, other]

    cs.CV

    Learning to Detect Violent Videos using Convolutional Long Short-Term Memory

    Authors: Swathikiran Sudhakaran, Oswald Lanz

    Abstract: Developing a technique for the automatic analysis of surveillance videos in order to identify the presence of violence is of broad interest. In this work, we propose a deep neural network for the purpose of recognizing violent videos. A convolutional neural network is used to extract frame level features from a video. The frame level features are then aggregated using a variant of the long short t…

    Submitted 19 September, 2017; originally announced September 2017.

    Comments: Accepted in International Conference on Advanced Video and Signal based Surveillance (AVSS 2017)

  26. arXiv:1709.06495  [pdf, ps, other]

    cs.CV

    Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions

    Authors: Swathikiran Sudhakaran, Oswald Lanz

    Abstract: In this paper, we present a novel deep learning based approach for addressing the problem of interaction recognition from a first person perspective. The proposed approach uses a pair of convolutional neural networks, whose parameters are shared, for extracting frame level features from successive frames of the video. The frame level features are then aggregated using a convolutional long short-te…

    Submitted 19 September, 2017; originally announced September 2017.

    Comments: Accepted at the second International Workshop on Egocentric Perception, Interaction and Computing (EPIC) at International Conference on Computer Vision (ICCV-17)

  27. arXiv:cond-mat/0610558  [pdf, ps, other]

    cond-mat.mtrl-sci

    Experimental study of the sub-wavelength imaging by a wire medium slab

    Authors: Pavel A. Belov, Yan Zhao, Sunil Sudhakaran, Akram Alomainy, Yang Hao

    Abstract: An experimental investigation of sub-wavelength imaging by a wire medium slab is performed. A complex-shaped near field source is used in order to test imaging performance of the device. It is demonstrated that the ultimate bandwidth of operation of the constructed imaging device is 4.5%, which coincides with theoretical predictions [Phys. Rev. E 73, 056607 (2006)]. Within this band the wire mediu…

    Submitted 19 October, 2006; originally announced October 2006.

    Comments: 3 pages, 3 figures, submitted to APL

  28. Sub-wavelength imaging by wire media

    Authors: Pavel A. Belov, Yang Hao, Sunil Sudhakaran

    Abstract: An original realization of a lens capable of transmitting images with sub-wavelength resolution is proposed. The lens is formed by parallel conducting wires and effectively operates as a telegraph: it captures an image at the front interface and then transmits it to the back interface without distortion. This regime of operation is called canalization and is inherent in flat lenses formed by electromagnetic…

    Submitted 1 September, 2005; v1 submitted 29 August, 2005; originally announced August 2005.

    Comments: 4 pages, 4 figures, submitted to PRL

    Journal ref: Physical Review B, vol. 73, 033108 (1-4), 2006