-
Planning with Adaptive World Models for Autonomous Driving
Authors:
Arun Balajee Vasudevan,
Neehar Peri,
Jeff Schneider,
Deva Ramanan
Abstract:
Motion planning is crucial for safe navigation in complex urban environments. Historically, motion planners (MPs) have been evaluated with procedurally-generated simulators like CARLA. However, such synthetic benchmarks do not capture real-world multi-agent interactions. nuPlan, a recently released MP benchmark, addresses this limitation by augmenting real-world driving logs with closed-loop simulation logic, effectively turning the fixed dataset into a reactive simulator. We analyze the characteristics of nuPlan's recorded logs and find that each city has its own unique driving behaviors, suggesting that robust planners must adapt to different environments. We learn to model such unique behaviors with BehaviorNet, a graph convolutional neural network (GCNN) that predicts reactive agent behaviors using features derived from recently-observed agent histories; intuitively, some aggressive agents may tailgate lead vehicles, while others may not. To model such phenomena, BehaviorNet predicts parameters of an agent's motion controller rather than predicting its spacetime trajectory (as most forecasters do). Finally, we present AdaptiveDriver, a model-predictive control (MPC) based planner that unrolls different world models conditioned on BehaviorNet's predictions. Our extensive experiments demonstrate that AdaptiveDriver achieves state-of-the-art results on the nuPlan closed-loop planning benchmark, reducing test error from 6.4% to 4.6%, even when applied to never-before-seen cities.
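Predicting "parameters of an agent's motion controller" can be made concrete with a short sketch. It assumes the controller is the Intelligent Driver Model (IDM), the car-following model behind nuPlan's reactive agents. The class `IDMParams`, the function `idm_acceleration` and all parameter values are hypothetical illustrations. They are not BehaviorNet's actual outputs or interface.

```python
import math
from dataclasses import dataclass


@dataclass
class IDMParams:
    """Hypothetical per-agent IDM parameters (the kind of values a behavior model could predict)."""
    v0: float           # desired speed [m/s]
    T: float            # desired time headway [s]
    s0: float           # minimum standstill gap [m]
    a_max: float        # maximum acceleration [m/s^2]
    b: float            # comfortable deceleration [m/s^2]
    delta: float = 4.0  # acceleration exponent


def idm_acceleration(p: IDMParams, v: float, gap: float, v_lead: float) -> float:
    """IDM acceleration of a follower at speed v, `gap` metres behind a leader driving at v_lead."""
    dv = v - v_lead  # closing speed (positive when approaching the leader)
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b)))
    return p.a_max * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)


# Assumption: two illustrative parameter sets, not values from the paper.
aggressive = IDMParams(v0=15.0, T=0.8, s0=1.0, a_max=2.0, b=3.0)
cautious = IDMParams(v0=15.0, T=2.0, s0=3.0, a_max=1.0, b=1.5)

# Same situation for both: 10 m/s, 15 m behind a leader also at 10 m/s.
a_agg = idm_acceleration(aggressive, v=10.0, gap=15.0, v_lead=10.0)
a_cau = idm_acceleration(cautious, v=10.0, gap=15.0, v_lead=10.0)
print(f"aggressive: {a_agg:+.2f} m/s^2, cautious: {a_cau:+.2f} m/s^2")
```

Both agents face the same situation. The aggressive parameter set keeps accelerating toward the leader, which is tailgating. The cautious set brakes. A few predicted numbers therefore change how an entire world-model rollout unfolds.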
Submitted 15 June, 2024;
originally announced June 2024.
-
Leveraging AES Padding: dBs for Nothing and FEC for Free in IoT Systems
Authors:
Jongchan Woo,
Vipindev Adat Vasudevan,
Benjamin D. Kim,
Rafael G. L. D'Oliveira,
Alejandro Cohen,
Thomas Stahlbuhk,
Ken R. Duffy,
Muriel Médard
Abstract:
The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. To address these security concerns, the Advanced Encryption Standard (AES) is commonly employed for secure encryption in IoT systems. Our study explores an innovative use of AES by repurposing AES padding bits for error correction, thus introducing a dual-function method that seamlessly integrates error-correcting capabilities into the standard encryption process. Integrating the state-of-the-art Guessing Random Additive Noise Decoder (GRAND) into the receiver's architecture enables joint decoding and decryption. This approach not only preserves the existing structure of the transmitter but also significantly enhances communication reliability in noisy environments, achieving a gain of over 3 dB in Block Error Rate (BLER). Remarkably, this enhanced performance comes with minimal power overhead at the receiver (less than 15% compared to the traditional decryption-only process), underscoring the efficiency of our hardware design for IoT applications. This paper presents a comprehensive analysis of our approach, particularly in terms of energy efficiency and system performance, offering a novel and practical solution for reliable IoT communications.
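The core GRAND idea fits in a few lines. Guess noise patterns from most to least likely, remove each one from the received word, and stop at the first candidate that passes a codebook-membership test. In the paper, the redundancy comes from repurposed AES padding bits. In this hypothetical sketch, a (7,4) Hamming parity check stands in for that membership test. The names `grand_decode` and `hamming_syndrome_ok` are illustrative only.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence


def grand_decode(received: Sequence[int], is_codeword: Callable[[list[int]], bool],
                 max_weight: int) -> Optional[list[int]]:
    """Hard-detection GRAND: try noise patterns in order of increasing Hamming
    weight (most likely first on a binary symmetric channel with p < 0.5)."""
    n = len(received)
    for w in range(max_weight + 1):
        for positions in combinations(range(n), w):
            candidate = list(received)
            for i in positions:
                candidate[i] ^= 1  # remove the guessed noise bit
            if is_codeword(candidate):
                return candidate
    return None  # abandon: no codeword within max_weight bit flips


# Stand-in membership test (assumption): parity-check matrix of the (7,4) Hamming code.
H = [[1, 0, 1, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]


def hamming_syndrome_ok(word: list[int]) -> bool:
    """True if every parity check is satisfied (zero syndrome)."""
    return all(sum(h * x for h, x in zip(row, word)) % 2 == 0 for row in H)


codeword = [1, 1, 1, 0, 0, 0, 0]  # a valid (7,4) Hamming codeword
received = codeword.copy()
received[4] ^= 1                   # the channel flips one bit
print(grand_decode(received, hamming_syndrome_ok, max_weight=2))
```

The decoder only needs a yes/no membership test. This is why GRAND can reuse whatever redundancy is already present, without requiring a dedicated code structure.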
Submitted 8 May, 2024;
originally announced May 2024.
-
On the Benefits of Coding for Network Slicing
Authors:
Homa Esfahanizadeh,
Vipindev Adat Vasudevan,
Benjamin D. Kim,
Shruti Siva,
Jennifer Kim,
Alejandro Cohen,
Muriel Médard
Abstract:
Network slicing has emerged as an integral concept in 5G, aiming to partition the physical network infrastructure into isolated slices, customized for specific applications. We theoretically formulate the key performance metrics of an application, in terms of goodput and delivery delay, at a cost of network resources in terms of bandwidth. We explore an un-coded communication protocol that uses feedback-based repetitions, and a coded protocol, implementing random linear network coding and using coding-aware acknowledgments. We find that coding reduces the resource demands of a slice to meet the requirements for an application, thereby serving more applications efficiently. Coded slices thus free up resources for other slices, be they coded or not. Based on these results, we propose a hybrid approach, wherein coding is introduced selectively in certain network slices. This approach not only facilitates a smoother transition from un-coded systems to coded systems but also reduces costs across all slices. Theoretical findings in this paper are validated and expanded upon through real-time simulations of the network.
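The coded protocol relies on random linear network coding (RLNC). A minimal sketch of RLNC over GF(2) follows. Each coded packet carries a random coefficient vector and the XOR of the selected source packets. The receiver decodes by Gaussian elimination once it holds enough linearly independent packets. The binary field, the integer payloads and the function names are simplifying assumptions, not the paper's protocol. The key intuition is that any set of independent coded packets suffices, regardless of which ones were lost. This is what makes acknowledging degrees of freedom, rather than specific packets, possible.

```python
import random
from typing import Optional

CodedPacket = tuple[list[int], int]  # (GF(2) coefficient vector, XOR-combined payload)


def rlnc_encode(packets: list[int], rng: random.Random) -> CodedPacket:
    """Emit one coded packet: a random GF(2) combination (XOR) of the source packets."""
    coeffs = [rng.randint(0, 1) for _ in packets]
    payload = 0
    for c, p in zip(coeffs, packets):
        if c:
            payload ^= p
    return coeffs, payload


def rlnc_decode(coded: list[CodedPacket], k: int) -> Optional[list[int]]:
    """Gauss-Jordan elimination over GF(2); returns the k sources, or None if rank < k."""
    rows = [(list(c), p) for c, p in coded]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if pivot is None:
            return None  # not yet decodable: more innovative packets are needed
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pc, pp = rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                c, p = rows[r]
                rows[r] = ([a ^ b for a, b in zip(c, pc)], p ^ pp)
    return [rows[i][1] for i in range(k)]


# Usage: the receiver keeps collecting coded packets until it reaches full rank.
rng = random.Random(1)
source = [0x11, 0x22, 0x33, 0x44]
received: list[CodedPacket] = []
decoded = None
while decoded is None:
    received.append(rlnc_encode(source, rng))
    decoded = rlnc_decode(received, len(source))
print(len(received), decoded == source)
```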
Submitted 26 April, 2024;
originally announced April 2024.
-
Model-Based Reinforcement Learning Control of Reaction-Diffusion Problems
Authors:
Christina Schenk,
Aditya Vasudevan,
Maciej Haranczyk,
Ignacio Romero
Abstract:
Mathematical and computational tools have proven to be reliable in decision-making processes. In recent times, in particular, machine learning-based methods have become increasingly popular as advanced support tools. For control problems, reinforcement learning has been applied to decision-making in several applications, most notably in games. The success of these methods in finding solutions to complex problems motivates the exploration of new areas where they can be employed to overcome current difficulties. In this paper, we explore the use of automatic control strategies for initial boundary value problems in thermal and disease transport. Specifically, we adapt an existing reinforcement learning algorithm using a stochastic policy gradient method, and we introduce two novel reward functions to drive the flow of the transported field. The new model-based framework exploits the interactions between a reaction-diffusion model and the modified agent. The results show that certain controls can be implemented successfully in these applications, although model simplifications had to be assumed.
Submitted 22 February, 2024;
originally announced February 2024.
-
tda-segmentor: A tool to extract and analyze local structure and porosity features in porous materials
Authors:
Aditya Vasudevan,
Jorge Zorrilla Prieto,
Sergei Zorkaltsev,
Maciej Haranczyk
Abstract:
Local geometrical features of a porous material, such as the shape and size of a pore or the curvature of a solid ligament, often affect the macroscopic properties of the material, and their characterization is necessary to fully understand structure-property relationships. In this contribution, we present an approach to automatically segment large porous structures into such local features. Our work takes inspiration from techniques available in Topological Data Analysis (TDA). In particular, using Morse theory, we generate Morse-Smale Complexes (MSC) of our structures that segment the structure and/or its porosity into individual features that can then be compared. We develop a tool built on the Topology ToolKit (TTK) library, an open-source platform for the topological analysis of scalar data, with which we can perform segmentation of these structures. Our tool takes as input a volumetric grid representation, which can be generated from atomistic or mesh structure models, and any function defined on such a grid, e.g., the distance to the surface or the interaction energy with a probe. We demonstrate the applicability of the tool with two examples: the analysis of porosity in zeolite materials and the analysis of ligaments in a porous metal structure. Specifically, by segmenting the pores in the structure, we demonstrate applications to zeolites such as assessing pore similarity between structures or evaluating the volume accessible to a target molecule, such as methane, that can be adsorbed on its surface. Moreover, once the MSCs are generated, we can construct graph representations of the void space, replacing the entire pore structure by a simply connected graph. Similarly, the same tool is used to segment and generate graphs representing the solid structure, and we show how they can be used to correlate the structure and mechanical properties of the material.
Submitted 27 December, 2023;
originally announced December 2023.
-
The Un-Kidnappable Robot: Acoustic Localization of Sneaking People
Authors:
Mengyu Yang,
Patrick Grady,
Samarth Brahmbhatt,
Arun Balajee Vasudevan,
Charles C. Kemp,
James Hays
Abstract:
How easy is it to sneak up on a robot? We examine whether we can detect people using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict if there is a moving person nearby and their location using only audio. We implement our method on a robot, allowing it to track a single person moving quietly with only passive audio sensing. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot
Submitted 9 May, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
-
CRYPTO-MINE: Cryptanalysis via Mutual Information Neural Estimation
Authors:
Benjamin D. Kim,
Vipindev Adat Vasudevan,
Jongchan Woo,
Alejandro Cohen,
Rafael G. L. D'Oliveira,
Thomas Stahlbuhk,
Muriel Médard
Abstract:
The use of Mutual Information (MI) as a measure to evaluate the efficiency of cryptosystems has an extensive history. However, estimating MI between unknown random variables in a high-dimensional space is challenging. Recent advances in machine learning have enabled progress in estimating MI using neural networks. This work presents a novel application of MI estimation in the field of cryptography. We propose applying this methodology directly to estimate the MI between plaintext and ciphertext in a chosen plaintext attack. The leaked information, if any, from the encryption could potentially be exploited by adversaries to compromise the computational security of the cryptosystem. We evaluate the efficiency of our approach by empirically analyzing multiple encryption schemes and baseline approaches. Furthermore, we extend the analysis to novel network coding-based cryptosystems that provide individual secrecy and study the relationship between information leakage and input distribution.
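A much simpler estimator helps show what "MI between plaintext and ciphertext" measures. The sketch below uses the plug-in (histogram) estimator, not MINE, on toy 4-bit ciphers. This estimator only works for tiny discrete alphabets, which is why neural estimators are needed for realistic, high-dimensional plaintext/ciphertext pairs. The toy ciphers, the key value and `plugin_mutual_information` are hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs: list[tuple[int, int]]) -> float:
    """Plug-in (histogram) estimate of I(X;Y) in bits from (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy 4-bit "ciphers", evaluated over every plaintext (and every key for the pad).
KEY = 0b1011  # hypothetical fixed key
fixed_key_xor = [(p, p ^ KEY) for p in range(16)]                  # deterministic: leaks everything
one_time_pad = [(p, p ^ k) for p in range(16) for k in range(16)]  # fresh uniform key: leaks nothing
print(plugin_mutual_information(fixed_key_xor), plugin_mutual_information(one_time_pad))
```

With a fixed key, the ciphertext determines the plaintext, so the estimate equals the full 4 bits of plaintext entropy. With a fresh uniform key, the joint distribution factorizes and the estimate is 0.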
Submitted 18 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
CERMET: Coding for Energy Reduction with Multiple Encryption Techniques -- It's easy being green
Authors:
Jongchan Woo,
Vipindev Adat Vasudevan,
Benjamin Kim,
Alejandro Cohen,
Rafael G. L. D'Oliveira,
Thomas Stahlbuhk,
Muriel Médard
Abstract:
This paper presents CERMET, an energy-efficient hardware architecture designed for hardware-constrained cryptosystems. CERMET employs a base cryptosystem in conjunction with network coding to provide both information-theoretic and computational security while reducing energy consumption per bit. This paper introduces the hardware architecture for the system and explores various optimizations to enhance its performance. The universality of the approach is demonstrated by designing the architecture to accommodate both asymmetric and symmetric cryptosystems. The analysis reveals that the benefits of this proposed approach are multifold, reducing energy per bit and area without compromising security or throughput. The optimized hardware architectures can achieve below 1 pJ/bit operations for AES-256. Furthermore, for a public key cryptosystem based on Elliptic Curve Cryptography (ECC), a remarkable 14.6X reduction in energy per bit and a 9.3X reduction in area are observed, bringing it to less than 1 nJ/bit.
Submitted 9 August, 2023;
originally announced August 2023.
-
Practical Sliding Window Recoder: Design, Analysis, and Usecases
Authors:
Vipindev Adat Vasudevan,
Tarun Soni,
Muriel Médard
Abstract:
Network coding has been widely used as a technology to ensure efficient and reliable communication. The ability to recode packets at the intermediate nodes is a major benefit of network coding implementations. This allows the intermediate nodes to choose a different code rate and fine-tune the outgoing transmission to the channel conditions, removing the need for the source node to compensate for cumulative losses over a multi-hop network. Block network coding solutions already have practical recoders, but an on-the-fly recoder for sliding window network coding has not been studied in detail. In this paper, we present, for the first time, the implementation details of a practical recoder for sliding window network coding, along with a comprehensive performance analysis of a multi-hop network using the recoder. The sliding window recoder ensures that the network performs as close as possible to its capacity and that each node can use its outgoing links efficiently.
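Recoding without decoding works because coefficients and payloads combine the same way. The sketch below shows this for sliding-window coding over GF(2). Coefficient vectors are stored as bitmasks over absolute source sequence numbers, so packets from overlapping windows can be mixed directly. The binary field, the bitmask representation and the function names are assumptions for illustration. The abstract does not describe the paper's window management or coefficient signalling.

```python
import random

# Coded packet = (coefficient bitmask over absolute source sequence numbers, payload).
# Assumption: GF(2) coefficients and integer payloads, for illustration only.


def encode_window(sources: list[int], start: int, end: int, rng: random.Random) -> tuple[int, int]:
    """Source node: random GF(2) combination of the packets in the window [start, end)."""
    mask, payload = 0, 0
    for seq in range(start, end):
        if rng.random() < 0.5:
            mask |= 1 << seq
            payload ^= sources[seq]
    return mask, payload


def recode(buffer: list[tuple[int, int]], rng: random.Random) -> tuple[int, int]:
    """Intermediate node: XOR a random subset of buffered coded packets, without decoding.
    Coefficients combine exactly like payloads, so the result is still a valid coded packet."""
    mask, payload = 0, 0
    for m, p in buffer:
        if rng.random() < 0.5:
            mask ^= m
            payload ^= p
    return mask, payload


def combine_sources(sources: list[int], mask: int) -> int:
    """Reference check: XOR of the source packets selected by `mask`."""
    out = 0
    for seq, s in enumerate(sources):
        if mask >> seq & 1:
            out ^= s
    return out


rng = random.Random(7)
sources = [rng.getrandbits(32) for _ in range(10)]
# The source slides a window of 4 packets forward one step at a time.
buffer = [encode_window(sources, s, s + 4, rng) for s in range(6)]
mask, payload = recode(buffer, rng)
print(payload == combine_sources(sources, mask))
```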
Submitted 16 June, 2023;
originally announced June 2023.
-
Analysis and design of bistable and thermally reversible metamaterials inspired by shape-memory alloys
Authors:
Aditya Vasudevan,
José A. Rodríguez-Martínez,
Ignacio Romero
Abstract:
In this work, we study lattice structures that exhibit bistable behavior, i.e., they can snap from one stable state to another, and that are also completely reversible, capable of reverting to their original state through a heat treatment. We design this behavior by constructing lattice structures from networks of nonlinear springs that display tension-compression asymmetry and have different thermal expansion coefficients. The mismatch in the thermal expansion coefficients induces residual stresses in the springs, which results in the lattice structure exhibiting bistability at low temperatures and monostability at high temperatures. This behavior mimics the crystallographic phase transformations of shape-memory alloys, but here it is artificially introduced in a structural lattice. By analyzing a representative unit cell, we quantify the effect that the stiffness and the thermal expansion coefficient of the springs have on the stability of the structural lattice. In addition, for simple 2D lattices, using the concept of universal unfoldings from singularity theory, we perform a perturbation analysis to identify the key variables of the structure where controlling defects is important, as such defects lead to drastic changes in the bifurcation behavior of the lattice. Finally, we numerically verify our analytical predictions in both 2D and 3D simulations using continuation techniques. The examples proposed confirm that the bistable and reversible features of the unit cell carry over to the macroscale, opening the route to the design of lattice structures for energy absorption applications that can heal with a heat treatment.
Submitted 11 July, 2022;
originally announced July 2022.
-
An Integrated Approach for Energy Efficient Handover and Key Distribution Protocol for Secure NC-enabled Small Cells
Authors:
Vipindev Adat Vasudevan,
Muhammad Tayyab,
George P. Koudouridis,
Xavier Gelabert,
Ilias Politis
Abstract:
Future wireless networks must serve dense mobile networks with high data rates while keeping energy requirements as low as possible. The small cell-based network architecture and device-to-device (D2D) communication are already being considered part of 5G networks and beyond. In such environments, network coding (NC) can be employed to achieve both higher throughput and energy efficiency. However, NC-enabled systems need to address security challenges specific to NC, such as pollution attacks. Integrity schemes against pollution attacks generally require proper key distribution and management to ensure security in a mobile environment. Additionally, the mobility requirements in small cell environments are more challenging and demanding in terms of signaling overhead. This paper proposes a blockchain-assisted key distribution protocol tailored for MAC-based integrity schemes, which, combined with an uplink reference signal (UL RS) handover mechanism, enables energy-efficient secure NC. The performance analysis of the protocol during handover scenarios indicates its suitability for ensuring a high level of security against pollution attacks in dense small cell environments with multiple adversaries present. Furthermore, the proposed scheme achieves lower bandwidth and signaling overhead during handover compared to legacy schemes, and the signaling cost reduces significantly as the communication progresses, thus enhancing the network's cumulative energy efficiency.
Submitted 17 May, 2022;
originally announced May 2022.
-
Sound and Visual Representation Learning with Multiple Pretraining Tasks
Authors:
Arun Balajee Vasudevan,
Dengxin Dai,
Luc Van Gool
Abstract:
Different self-supervised learning (SSL) tasks reveal different features from the data, and the learned feature representations can perform differently on each downstream task. In this light, this work aims to combine multiple SSL tasks (Multi-SSL) so that the learned representations generalize well to all downstream tasks. Specifically, for this study, we investigate binaural sounds and image data in isolation. For binaural sounds, we propose three SSL tasks, namely spatial alignment, temporal synchronization of foreground objects and binaural audio, and temporal gap prediction. We investigate several approaches to Multi-SSL and give insights into downstream task performance on video retrieval, spatial sound super-resolution, and semantic prediction on the OmniAudio dataset. Our experiments on binaural sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single-SSL-task models and fully supervised models in downstream task performance. To check applicability to another modality, we also formulate our Multi-SSL models for image representation learning, using the recently proposed SSL tasks MoCov2 and DenseCL. Here, Multi-SSL surpasses recent methods such as MoCov2, DenseCL and DetCo by 2.06%, 3.27% and 1.19% on VOC07 classification and by +2.83, +1.56 and +1.61 AP on COCO detection. Code will be made publicly available.
Submitted 4 January, 2022;
originally announced January 2022.
-
Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds
Authors:
Dengxin Dai,
Arun Balajee Vasudevan,
Jiri Matas,
Luc Van Gool
Abstract:
Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, and the depth map of the scene. To this aim, we propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of multiple vision teacher methods and a sound student method -- the student method is trained to generate the same results as the teacher methods do. This way, the auditory system can be trained without using human annotations. To further boost the performance, we propose another novel auxiliary task, coined Spatial Sound Super-Resolution, to increase the directional resolution of sounds. We then formulate the four tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results show that 1) our method achieves good results for all four tasks, 2) the four tasks are mutually beneficial -- training them together achieves the best performance, 3) the number and orientation of microphones are both important, and 4) features learned from the standard spectrogram and features obtained by the classic signal processing pipeline are complementary for auditory perception tasks. The data and code are released.
Submitted 27 February, 2022; v1 submitted 6 September, 2021;
originally announced September 2021.
-
Adaptation of the tapered double cantilever beam test for the measurement of fracture energy and its variations with crack speed
Authors:
Aditya Vasudevan,
Thiago Melo Grabois,
Guilherme Chagas Cordeiro,
Stéphane Morel,
Romildo Dias Toledo Filho,
Laurent Ponson
Abstract:
In this work we present the design of a new test geometry inspired by the Tapered Double Cantilever Beam (TDCB) specimen that is shown to provide an improved characterization of the fracture properties of brittle solids. First, we show that our new design results in an exponential increase of the specimen compliance with crack length, leading to extremely stable crack growth during the test. We derive an analytical description of this behavior, which provides a simple procedure to extract the fracture energy without relying on finite element calculations. Validation tests are performed on polymethylmethacrylate (PMMA) specimens. We use both finite element simulations and our analytical model to interpret the data and find very good agreement between the toughness determined by both methods. The stable nature of crack growth in our improved TDCB specimens results in precise control of the crack speed. This feature is employed to go one step further and characterize the variations of toughness with crack speed. We propose an original optimization procedure for the determination of the material parameters characterizing the kinetic law that describes the rate dependency of toughness. Overall, the proposed approach, together with the newly designed test geometry, offers unprecedented possibilities for the full and accurate characterization of the fracture behavior of brittle materials such as rocks, sandstone and mortar.
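The standard linear-elastic compliance relation shows why an exponential compliance keeps the analysis simple and the crack stable. It is sketched below under stated assumptions. The exponential form $C(a) = C_0 e^{a/\lambda}$, the specimen thickness $B$ and the characteristic length $\lambda$ are illustrative choices. They are not the paper's notation or its exact analytical model. $P$ is the applied load and $\delta = P\,C$ the load-point opening.

$$ G = \frac{P^2}{2B}\,\frac{dC}{da}, \qquad C(a) = C_0\, e^{a/\lambda} \;\Rightarrow\; \frac{dC}{da} = \frac{C}{\lambda} $$

$$ G = \frac{P^2 C}{2B\lambda} = \frac{P\,\delta}{2B\lambda} $$

$$ \text{fixed } \delta: \qquad G = \frac{\delta^2}{2B\lambda\,C(a)} = \frac{\delta^2}{2B\lambda C_0}\, e^{-a/\lambda}, \qquad \frac{dG}{da} = -\frac{G}{\lambda} < 0 $$

At fixed opening, $G$ drops as the crack grows. For a rate-independent fracture energy, each increment of crack extension therefore requires a further increase in $\delta$, which is the stable growth described above. Setting $G$ equal to the fracture energy during propagation then gives that energy directly from the measured $P$ and $\delta$, under these assumptions.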
Submitted 12 January, 2021;
originally announced January 2021.
-
Oscillatory and tip-splitting instabilities in 2D dynamic fracture: The roles of intrinsic material length and time scales
Authors:
Aditya Vasudevan,
Yuri Lubomirsky,
Chih-Hung Chen,
Eran Bouchbinder,
Alain Karma
Abstract:
Recent theoretical and computational progress has led to unprecedented understanding of symmetry-breaking instabilities in 2D dynamic fracture. At the heart of this progress resides the identification of two intrinsic, near crack tip length scales -- a nonlinear elastic length scale $\ell$ and a dissipation length scale $\xi$ -- that do not exist in the classical theory of cracks. In particular, it has been shown that at a high propagation velocity $v$, cracks in 2D brittle materials undergo an oscillatory instability whose wavelength varies linearly with $\ell$, and at yet higher propagation velocities and larger loading levels, a tip-splitting instability emerges, both in agreement with experiments. In this paper, using phase-field models of brittle fracture, we demonstrate the following properties of the oscillatory instability: (i) It exists also in the absence of near-tip elastic nonlinearity, i.e. in the limit $\ell\!\to\!0$, with a wavelength determined by the dissipation length scale $\xi$. This result shows that the instability crucially depends on the existence of an intrinsic length scale associated with the breakdown of linear elasticity near crack tips, independently of whether the latter is related to nonlinear elasticity or to dissipation. (ii) It is a supercritical Hopf bifurcation, featuring a vanishing oscillation amplitude at onset. (iii) It is largely independent of the fracture energy $\Gamma(v)$ that is controlled by a dissipation time scale. These results substantiate the universal nature of the oscillatory instability of ultra-high speed cracks in 2D. In addition, we provide evidence indicating that the ultra-high velocity tip-splitting instability is controlled by the limiting rate of elastic energy transport inside the crack tip region. Finally, we describe in detail the numerical implementation scheme of the employed phase-field fracture approach.
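Property (ii) can be illustrated with the textbook normal form of a supercritical Hopf bifurcation, written in polar coordinates as dr/dt = μr - r³. For μ > 0 its stable limit cycle has amplitude √μ, which vanishes continuously at onset (μ → 0). The sketch integrates that generic normal form, not the paper's phase-field model. Here μ is a hypothetical stand-in for the distance above the critical crack velocity, and `hopf_amplitude` is an illustrative name.

```python
import math


def hopf_amplitude(mu: float, r0: float = 0.05, dt: float = 0.01, steps: int = 20_000) -> float:
    """Forward-Euler integration of the radial supercritical Hopf normal form
    dr/dt = mu*r - r**3; returns the amplitude after `steps` steps (t = steps*dt)."""
    r = r0
    for _ in range(steps):
        r += dt * (mu * r - r ** 3)
    return r


# Below onset the oscillation decays; above onset it saturates at sqrt(mu).
for mu in (-0.04, 0.04, 0.09, 0.16):
    print(f"mu={mu:+.2f}  amplitude={hopf_amplitude(mu):.4f}  sqrt(max(mu, 0))={math.sqrt(max(mu, 0.0)):.4f}")
```

The square-root growth is the signature that separates a supercritical Hopf bifurcation from a subcritical one. In the subcritical case, a finite-amplitude oscillation appears abruptly at onset.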
Submitted 19 February, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
Authors:
Arun Balajee Vasudevan,
Dengxin Dai,
Luc Van Gool
Abstract:
Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are now able to do the same with images, less work has been done with sounds. This work develops an approach for dense semantic labelling of sound-making objects, purely based on binaural sounds. We propose a novel sensor setup and record a new audio-visual dataset of street scenes with eight professional binaural microphones and a 360-degree camera. The co-existence of visual and audio cues is leveraged for supervision transfer. In particular, we employ a cross-modal distillation framework that consists of a vision 'teacher' method and a sound 'student' method -- the student method is trained to generate the same results as the teacher method. This way, the auditory system can be trained without using human annotations. We also propose two auxiliary tasks, namely a) a novel task on Spatial Sound Super-resolution to increase the spatial resolution of sounds, and b) dense depth prediction of the scene. We then formulate the three tasks into one end-to-end trainable multi-tasking network aiming to boost the overall performance. Experimental results on the dataset show that 1) our method achieves promising results for semantic prediction and the two auxiliary tasks; 2) the three tasks are mutually beneficial -- training them together achieves the best performance; and 3) the number and orientations of microphones are both important. The data and code will be released to facilitate research in this new direction.
Submitted 9 March, 2020;
originally announced March 2020.
-
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
Authors:
Arun Balajee Vasudevan,
Dengxin Dai,
Luc Van Gool
Abstract:
The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor landmarks and local directions between them in order to help annotators formulate typical, human references to those. The annotation task was crowdsourced on the AMT platform to construct a new Talk2Nav dataset with 10,714 routes. Our second contribution is a new learning method. Inspired by spatial cognition research on the mental conceptualization of navigational instructions, we introduce a soft dual attention mechanism defined over the segmented language instructions to jointly extract two partial instructions: one for matching the next upcoming visual landmark and the other for matching the local directions to the next landmark. Along similar lines, we also introduce a spatial memory scheme to encode the local directional transitions. Our work takes advantage of advances in two lines of research: mental formalization of verbal navigational instructions and training neural network agents for automatic wayfinding. Extensive experiments show that our method significantly outperforms previous navigation methods. For the demo video, dataset and code, please refer to our project page: https://www.trace.ethz.ch/publications/2019/talk2nav/index.html
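A soft dual attention over instruction segments can be sketched as two independent softmax attentions over the same segment embeddings. One attention extracts the part of the instruction describing the next landmark, and the other extracts the part describing local directions. This is a minimal sketch that assumes dot-product scoring and plain vector embeddings. The query vectors (which in a real agent would come from its visual state) and all function names are hypothetical, not the paper's architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, segments):
    """Soft attention: weight each segment embedding by softmax(query . segment)."""
    weights = softmax([dot(query, seg) for seg in segments])
    dim = len(segments[0])
    # Weighted sum of segment embeddings = the extracted partial instruction.
    context = [sum(w * seg[d] for w, seg in zip(weights, segments)) for d in range(dim)]
    return weights, context


def dual_attention(landmark_query, direction_query, segments):
    """Two attentions over the same segments: (landmark part, direction part)."""
    return attend(landmark_query, segments), attend(direction_query, segments)
```

Because both attentions are soft (differentiable weights rather than hard segment choices), the two partial instructions can be learned jointly with the rest of the network.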
Submitted 22 October, 2020; v1 submitted 4 October, 2019;
originally announced October 2019.
-
Configurational stability of a crack propagating in a material with mode-dependent fracture energy -- Part II: Drift of fracture facets in mixed-mode I+II+III
Authors:
Aditya Vasudevan,
Laurent Ponson,
Jean-Baptiste Leblond,
Alain Karma
Abstract:
In earlier papers (Leblond et al., 2011, 2019), we presented linear stability analyses of the coplanar propagation of a crack loaded in mixed-mode I+III, based on a "double" propagation criterion combining Griffith (1920)'s energetic condition and Goldstein and Salganik (1974)'s principle of local symmetry. The difference between the two papers was that in the more recent one, the local value of the critical energy-release-rate was no longer considered as a constant, but heuristically allowed to depend upon the ratio of the local mode III to mode I stress intensity factors. This led to a much improved, qualitatively acceptable agreement between theory and experiments for the "threshold" value of the ratio of the unperturbed mode III to mode I stress intensity factors, above which coplanar propagation becomes unstable. In this paper, the analysis is extended to the case where a small additional mode II loading component is present in the initially planar configuration of the crack, generating a small, general kink of this crack from the moment it is applied. The main new effect resulting from the presence of such a loading component is that the instability modes present above the threshold must drift along the crack front during its propagation. This prediction may be useful for future theoretical interpretations of a number of experiments where such a drifting motion was indeed observed.
Submitted 20 September, 2019;
originally announced September 2019.
-
Time-sliced perturbation theory with primordial non-Gaussianity and effects of large bulk flows on inflationary oscillating features
Authors:
Anagha Vasudevan,
Mikhail M. Ivanov,
Sergey Sibiryakov,
Julien Lesgourgues
Abstract:
We extend the formalism of time-sliced perturbation theory (TSPT) for cosmological large-scale structure to include non-Gaussian initial conditions. We show that in such a case the TSPT interaction vertices acquire new contributions whose time-dependence factorizes for the Einstein-de Sitter cosmology. The new formulation is free from spurious infrared (IR) enhancements and reveals a clear IR structure of non-Gaussian vertices. We use the new technique to study the evolution of oscillating features in primordial statistics and show that they are damped due to non-linear effects of large bulk flows. We derive the damping factors for the oscillating primordial power spectrum and bispectrum by means of a systematic IR resummation of relevant Feynman diagrams.
Submitted 19 September, 2019; v1 submitted 20 June, 2019;
originally announced June 2019.
-
Configurational stability of a crack propagating in a material with mode-dependent fracture energy - Part I: Mixed-mode I+III
Authors:
Jean-Baptiste Leblond,
Alain Karma,
Laurent Ponson,
Aditya Vasudevan
Abstract:
In a previous paper (Leblond et al., 2011), we proposed a theoretical interpretation of the experimentally well-known instability of coplanar crack propagation in mode I+III. The interpretation relied on a stability analysis based on analytical expressions of the stress intensity factors for a crack slightly perturbed both within and out of its original plane, due to Gao and Rice (1986) and Movchan et al. (1998), coupled with a double propagation criterion combining Griffith's energetic condition and the principle of local symmetry. Under such assumptions, instability modes were indeed found for values of the mode mixity ratio (the ratio of the remotely applied mode III to mode I stress intensity factors) larger than some threshold depending only on Poisson's ratio. Unfortunately, the predicted thresholds were much larger than those generally observed for typical values of this material parameter. While the subcritical character of the nonlinear bifurcation from coplanar to fragmented fronts has been proposed as a possible explanation for this discrepancy (Chen et al., 2015), we propose here an alternative explanation based on the introduction of a constitutive relationship between the fracture energy and the mode mixity ratio, which is motivated by experimental observations. By reexamining the linear stability analysis of a planar propagating front, we show that such a relationship suffices, provided that it is strong enough, to significantly lower the threshold value of the mode mixity ratio for instability so as to bring it into a range more consistent with experiments. Interesting formulae are also derived for the distributions of the perturbed stress intensity factors and energy release rate, in the special case of perturbations of the crack surface and front obeying the principle of local symmetry and having reached a stationary state.
Submitted 14 December, 2018;
originally announced December 2018.
-
Dynamics and wetting behavior of soft particles at a fluid-fluid interface
Authors:
Siddarth A. Vasudevan,
Astrid Rauh,
Martin Kröger,
Matthias Karg,
Lucio Isa
Abstract:
We investigate the conformation, position, and dynamics of core-shell nanoparticles (CSNPs) composed of a silica core encapsulated in a cross-linked poly-N-isopropylacrylamide shell at a water-oil interface for a systematic range of core sizes and shell thicknesses. We first present a free-energy model that we use to predict the CSNP wetting behavior at the interface as a function of its geometrical and compositional properties in the bulk phases, which gives good agreement with our experimental data. Remarkably, provided the deformability of the polymer shell is known, the equilibrium particle position relative to the interface plane, an often elusive experimental quantity, can be extracted by measuring the particle's radial dimensions after adsorption. For all the systems studied here, the interfacial dimensions are always larger than in bulk, and the particle core resides in a configuration wherein it just touches the interface or is fully immersed in water. Moreover, the stretched shell induces a larger viscous drag at the interface, which appears to depend solely on the interfacial dimensions, irrespective of the portion of the CSNP surface exposed to the two fluids. Our findings indicate that tailoring the architecture of CSNPs can be used to control their properties at the interface, which is of interest for applications including emulsion stabilization and nanopatterning.
Submitted 6 September, 2018;
originally announced September 2018.
-
Equations of Motion for the Standard Model Effective Field Theory: Theory and Applications
Authors:
Abdurrahman Barzinji,
Michael Trott,
Anagha Vasudevan
Abstract:
The equations of motion for the Standard Model Effective Field Theory (SMEFT) differ from those in the Standard Model. Corrections due to local contact operators modify the equations of motion and impact matching results at sub-leading order in the operator expansion. As a consequence, a matching coefficient in $\mathcal{L}^{(n)}$ (for operators of dimension $n$) can be dependent on the basis choice for $\mathcal{L}^{(m<n)}$. We report the SMEFT equations of motion with corrections due to $\mathcal{L}^{(5,6)}$. We demonstrate the effect of these corrections when matching to sub-leading order by considering the interpretation of recently reported $B \to K^{(*)} \ell^+ \ell^-$ lepton universality anomalies in the SMEFT.
Submitted 7 December, 2018; v1 submitted 17 June, 2018;
originally announced June 2018.
-
Object Referring in Videos with Language and Human Gaze
Authors:
Arun Balajee Vasudevan,
Dengxin Dai,
Luc Van Gool
Abstract:
We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continuous video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works for OR mostly focus on static images only, which fall short in providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated for their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context. Our method outperforms previous OR methods. For dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html.
Submitted 4 April, 2018; v1 submitted 4 January, 2018;
originally announced January 2018.
-
Object Referring in Visual Scene with Spoken Language
Authors:
Arun Balajee Vasudevan,
Dengxin Dai,
Luc Van Gool
Abstract:
Object referring has important applications, especially for human-machine interaction. While it has received great attention, the task has mainly been addressed with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions and speech descriptions. This makes the datasets ideal for multi-modality learning. The approach is developed by carefully breaking the ORSpoken problem down into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method outperforms competing methods consistently and significantly. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise.
Submitted 5 December, 2017; v1 submitted 10 November, 2017;
originally announced November 2017.
-
Low-memory GEMM-based convolution algorithms for deep neural networks
Authors:
Andrew Anderson,
Aravind Vasudevan,
Cormac Keane,
David Gregg
Abstract:
Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as general matrix multiplication (GEMM). However, as we demonstrate in this paper, there are a great many different ways to express DNN convolution operations using GEMM. Although different approaches all perform the same number of operations, the size of temporary data structures differs significantly. Convolution of an input matrix with dimensions $C \times H \times W$ requires $O(K^2CHW)$ additional space using the classical im2col approach. More recently, memory-efficient approaches requiring just $O(KCHW)$ auxiliary space have been proposed.
We present two novel GEMM-based algorithms that require just $O(MHW)$ and $O(KW)$ additional space respectively, where $M$ is the number of channels in the result of the convolution. These algorithms dramatically reduce the space overhead of DNN convolution, making it much more suitable for memory-limited embedded systems. Experimental evaluation shows that our low-memory algorithms are just as fast as the best patch-building approaches despite requiring just a fraction of the additional memory. Our low-memory algorithms have excellent data locality, which gives them a further edge over patch-building algorithms when multiple cores are used. As a result, our low-memory algorithms often outperform the best patch-building algorithms using multiple threads.
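To make the memory argument concrete, here is a minimal pure-Python sketch of the classical im2col approach. It assumes stride 1, no padding ("valid" convolution), and K read as the kernel width. The function names are illustrative. The patch matrix has C*K*K rows and (H-K+1)*(W-K+1) columns, so it holds roughly K*K copies of every input value. That duplication is where the $O(K^2CHW)$ auxiliary space comes from.

```python
def im2col(x, k):
    """Unroll every k x k patch of x (C x H x W nested lists) into one column.

    Returns a (C*k*k) x (Ho*Wo) matrix for a stride-1 'valid' convolution,
    where Ho = H - k + 1 and Wo = W - k + 1. This is the temporary patch
    matrix whose size grows with k*k times the input size.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Ho, Wo = H - k + 1, W - k + 1
    rows = []
    for c in range(C):
        for i in range(k):
            for j in range(k):
                rows.append([x[c][oy + i][ox + j] for oy in range(Ho) for ox in range(Wo)])
    return rows


def gemm(a, b):
    """Plain matrix multiply: (m x n) @ (n x p) -> (m x p)."""
    n, p = len(b), len(b[0])
    return [[sum(row[t] * b[t][col] for t in range(n)) for col in range(p)] for row in a]


def conv_im2col(x, kernels):
    """Multi-channel, multi-kernel convolution as im2col followed by one GEMM.

    kernels has shape M x C x k x k; the result has shape M x Ho x Wo.
    """
    k = len(kernels[0][0])
    H, W = len(x[0]), len(x[0][0])
    Ho, Wo = H - k + 1, W - k + 1
    # Flatten each kernel into one row, in the same (c, i, j) order as im2col rows.
    kmat = [[kern[c][i][j] for c in range(len(kern)) for i in range(k) for j in range(k)]
            for kern in kernels]
    out = gemm(kmat, im2col(x, k))  # M x (Ho*Wo)
    # Reshape each output row back into an Ho x Wo feature map.
    return [[row[oy * Wo:(oy + 1) * Wo] for oy in range(Ho)] for row in out]
```

For example, a single 3x3 channel convolved with a 2x2 kernel already needs a 4x4 patch matrix, which is larger than the 9-value input.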
Submitted 8 September, 2017;
originally announced September 2017.
-
Query-adaptive Video Summarization via Quality-aware Relevance Estimation
Authors:
Arun Balajee Vasudevan,
Michael Gygli,
Anna Volokitin,
Luc Van Gool
Abstract:
Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem by posing query-relevant summarization as a video frame subset selection problem, which lets us optimise for summaries which are simultaneously diverse, representative of the entire video, and relevant to a text query. We quantify relevance by measuring the distance between frames and queries in a common textual-visual semantic embedding space induced by a neural network. In addition, we extend the model to capture query-independent properties, such as frame quality. We compare our method against previous state of the art on textual-visual embeddings for thumbnail selection and show that our model outperforms them on relevance prediction. Furthermore, we introduce a new dataset, annotated with diversity and query-specific relevance labels. On this dataset, we train and test our complete model for video summarization and show that it outperforms standard baselines such as Maximal Marginal Relevance.
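The Maximal Marginal Relevance baseline mentioned above can be sketched as greedy subset selection in a shared textual-visual embedding space. Each step picks the frame that is most relevant to the query while being least similar to frames already chosen. This is a minimal sketch that assumes cosine similarity and a trade-off weight `lam`. Both are illustrative choices, and frame-quality terms are not modeled. It is not the authors' full model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mmr_score(i, frames, query, selected, lam):
    """lam * relevance(frame, query) - (1 - lam) * max similarity to chosen frames."""
    relevance = cosine(frames[i], query)
    redundancy = max((cosine(frames[i], frames[j]) for j in selected), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr_select(frames, query, k, lam=0.7):
    """Greedily pick up to k frame indices, in pick order."""
    selected = []
    candidates = list(range(len(frames)))
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda i: mmr_score(i, frames, query, selected, lam))
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the procedure reduces to pure relevance ranking. Lowering `lam` trades relevance for diversity, so near-duplicate frames stop being picked back to back.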
Submitted 28 September, 2017; v1 submitted 1 May, 2017;
originally announced May 2017.
-
Parallel Multi Channel Convolution using General Matrix Multiplication
Authors:
Aravind Vasudevan,
Andrew Anderson,
David Gregg
Abstract:
Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality.
In this paper we propose a new approach to MCMK convolution that is based on General Matrix Multiplication (GEMM), but not on im2col. Our algorithm eliminates the need for data replication on the input, thereby enabling us to apply the convolution kernels to the input images directly. We have implemented several variants of our algorithm on a CPU and on an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
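One way to use GEMM without im2col can be sketched as follows. The sketch assumes stride 1 and no padding. The convolution is decomposed into K*K 1x1 convolutions. Each one is a single GEMM of an M x C kernel slice with the unreplicated C x (H*W) input, and its result is shifted and accumulated into the output. This shift-and-add decomposition and the function names are illustrative assumptions; the paper's own variants may be organized differently.

```python
def gemm(a, b):
    """Plain matrix multiply: (m x n) @ (n x p) -> (m x p)."""
    n, p = len(b), len(b[0])
    return [[sum(row[t] * b[t][col] for t in range(n)) for col in range(p)] for row in a]


def conv_shift_add(x, kernels):
    """Stride-1 'valid' MCMK convolution as k*k 1x1 GEMMs plus shifted accumulation.

    x: C x H x W input; kernels: M x C x k x k; returns M x Ho x Wo.
    No patch matrix is built: every input value appears once in the GEMM operand.
    """
    M, C, k = len(kernels), len(x), len(kernels[0][0])
    H, W = len(x[0]), len(x[0][0])
    Ho, Wo = H - k + 1, W - k + 1
    # C x (H*W) reshape of the input (a copy only because Python lists lack views).
    flat = [[v for r in ch for v in r] for ch in x]
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(M)]
    for i in range(k):
        for j in range(k):
            # 1x1 convolution for kernel offset (i, j): (M x C) @ (C x H*W).
            kslice = [[kernels[m][c][i][j] for c in range(C)] for m in range(M)]
            partial = gemm(kslice, flat)  # M x (H*W) temporary
            # Shift-add: output pixel (oy, ox) takes the partial result at (oy+i, ox+j).
            for m in range(M):
                for oy in range(Ho):
                    for ox in range(Wo):
                        out[m][oy][ox] += partial[m][(oy + i) * W + (ox + j)]
    return out
```

The only large temporary is the M x (H*W) partial result, so the input is never replicated. The cost is K*K smaller GEMM calls instead of one large call.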
Submitted 3 July, 2017; v1 submitted 6 April, 2017;
originally announced April 2017.
-
Mutual Inclusivity of the Critical Path and its Partial Schedule on Heterogeneous Systems
Authors:
Aravind Vasudevan,
David Gregg
Abstract:
The critical path of a group of tasks is an important measure that is commonly used to guide task allocation and scheduling on parallel computers. The critical path is the longest chain of dependencies in an acyclic task dependence graph. A problem arises on heterogeneous parallel machines where computation and communication costs can vary between different types of processor. Existing solutions for heterogeneous machines attempt to estimate the critical path using average values of computation and communication costs. However, this ignores opportunities to match specific tasks to specific classes of processor and communication links, and can result in quite misleading paths being identified as critical. We argue that an accurate critical path must consider the mapping of tasks to classes of processor and communication links. We formulate a polynomial time algorithm to find such a critical path. Our Critical Earliest Finish Time (CEFT) algorithm finds both the length of the critical path and an allocation of tasks to processors on that path. We compared CEFT experimentally to existing approaches such as averaging execution times across processors. The latter approach fails to accurately model the execution cost of tasks, and as a result fails to identify a correct critical path in 83.99% of cases in our experiments. We also adapted a critical path-oriented scheduling algorithm (CPOP) to use our critical path algorithm and found that the resulting schedules are faster.
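The classical definition above (the longest chain of dependencies in an acyclic task graph) can be computed with one pass in topological order. This is a minimal sketch that assumes a single cost per task and per edge. On a heterogeneous machine those would be the averaged costs that the abstract argues can mislead. It does not reproduce CEFT's class-aware mapping, and the function name and data layout are illustrative.

```python
from collections import defaultdict, deque


def critical_path(costs, edges):
    """Longest path through a task DAG.

    costs: {task: computation cost}; edges: {(u, v): communication cost}.
    Returns (length, path).
    """
    succ = defaultdict(list)
    indeg = {t: 0 for t in costs}
    for (u, v), comm in edges.items():
        succ[u].append((v, comm))
        indeg[v] += 1
    # finish[t]: length of the longest chain ending with task t (inclusive).
    finish = dict(costs)
    parent = {t: None for t in costs}
    queue = deque(t for t in costs if indeg[t] == 0)  # Kahn's topological order
    while queue:
        u = queue.popleft()  # finish[u] is final: all predecessors are processed
        for v, comm in succ[u]:
            candidate = finish[u] + comm + costs[v]
            if candidate > finish[v]:
                finish[v], parent[v] = candidate, u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    last = max(finish, key=finish.get)
    path, node = [], last
    while node is not None:  # walk parents back from the latest-finishing task
        path.append(node)
        node = parent[node]
    return finish[last], path[::-1]
```

For tasks a(2), b(3), c(1), d(4) with edges a→b (1), a→c (5), b→d (1), c→d (1), the critical path is a, c, d with length 13. The cheap task c is critical only because of its communication cost, which shows why the choice of cost model decides which path is flagged.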
Submitted 30 January, 2017;
originally announced January 2017.
-
A Novel Multipath Approach to Security in Mobile Ad Hoc Networks (MANETs)
Authors:
Rangarajan A. Vasudevan,
Sugata Sanyal
Abstract:
In this paper, we present a novel encryption-less algorithm to enhance security in transmission of data packets across mobile ad hoc networks. The paper hinges on the paradigm of multipath routing and exploits the properties of polynomials. The first step in the algorithm is to transform the data such that it is impossible to obtain any information without possessing the entire transformed data. The algorithm then uses an intuitively simple idea of a jigsaw puzzle to break the transformed data into multiple packets where these packets form the pieces of the puzzle. Then these packets are sent along disjoint paths to reach the receiver. A secure and efficient mechanism is provided to convey the information that is necessary for obtaining the original data at the receiver-end from its fragments in the packets, that is, for solving the jigsaw puzzle. The algorithm is designed to be secure so that no intermediate or unintended node can obtain the entire data. An authentication code is also used to ensure authenticity of every packet.
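The property that no information can be obtained without the entire transformed data can be illustrated with a swapped-in technique: n-of-n XOR splitting, with one share sent along each disjoint path. This is not the paper's polynomial-based transform. It is a minimal sketch under that stated substitution, and the function names are hypothetical. Any n-1 shares are uniformly random bytes, so an intermediate node that sees fewer than all paths learns nothing about the data.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_for_paths(data, n_paths):
    """Split data into n_paths shares; all shares are needed to recover it."""
    if n_paths < 2:
        raise ValueError("need at least two disjoint paths")
    shares = [secrets.token_bytes(len(data)) for _ in range(n_paths - 1)]
    last = reduce(xor_bytes, shares, data)  # data ^ s1 ^ ... ^ s_{n-1}
    return shares + [last]


def combine(shares):
    """XOR all shares together to recover the original data."""
    return reduce(xor_bytes, shares)
```

The data itself is never encrypted under a long-term key. Security rests on the attacker being unable to observe every disjoint path.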
Submitted 23 November, 2011;
originally announced December 2011.
-
Grid Security and Integration with Minimal Performance Degradation
Authors:
Sugata Sanyal,
Rangarajan A. Vasudevan,
Ajith Abraham,
Marcin Paprzycki
Abstract:
Computational grids are believed to be the ultimate framework to meet the growing computational needs of the scientific community. Here, the processing power of geographically distributed resources working under different ownerships, each having its own access policy, cost structure and the like, is logically coupled to make them perform as a unified resource. The continuous increase in the availability of high-bandwidth communication, as well as of powerful computers built from low-cost components, further enhances the chances of computational grids becoming a reality. However, grid security remains one of the important open research issues. Here, we present some novel ideas about how to implement grid security without appreciable performance degradation. A suitable alternative to computationally expensive encryption is suggested, which uses a key for message authentication. Methods of secure transfer and exchange of the required key(s) are also discussed.
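Keyed message authentication, as opposed to encryption, can be illustrated with HMAC-SHA256 from Python's standard library. HMAC is a stand-in chosen for illustration, not necessarily the scheme the paper proposes. It is a minimal sketch assuming that a shared key has already been exchanged securely. Note that the message still travels in the clear: the tag proves origin and integrity but does not hide content.

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the unencrypted message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Constant-time comparison, so timing does not reveal matching prefix length."""
    return hmac.compare_digest(tag(key, message), received_tag)


# Assumed to have been exchanged securely beforehand (key exchange not shown).
shared_key = secrets.token_bytes(32)
```

Computing one hash-based tag per message is typically far cheaper than encrypting the whole payload, which matches the performance motivation in the abstract.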
Submitted 19 November, 2011;
originally announced November 2011.
-
A Novel Scheme for Secured Data Transfer Over Computer Networks
Authors:
Rangarajan Athi Vasudevan,
Ajith Abraham,
Sugata Sanyal
Abstract:
This paper presents a novel encryption-less algorithm to enhance security in transmission of data in networks. The algorithm uses an intuitively simple idea of a "jigsaw puzzle" to break the transformed data into multiple parts where these parts form the pieces of the puzzle. Then these parts are packaged into packets and sent to the receiver. A secure and efficient mechanism is provided to convey the information that is necessary for obtaining the original data at the receiver-end from its parts in the packets, that is, for solving the "jigsaw puzzle". The algorithm is designed to provide information-theoretic (that is, unconditional) security by the use of a one-time pad like scheme so that no intermediate or unintended node can obtain the entire data. A parallelizable design has been adopted for the implementation. An authentication code is also used to ensure authenticity of every packet.
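The "jigsaw puzzle" idea combined with a one-time-pad-like mask can be sketched as follows. The data is XORed with a random pad as long as the data itself, cut into pieces, and sent in a secret shuffled order. The pad and the order form the puzzle "solution". This is a minimal sketch under several assumptions: the solution reaches the receiver over some separate secure mechanism (the paper's mechanism is not reproduced), packets are handed to `solve_puzzle` in transmission order, and all names are hypothetical.

```python
import random
import secrets


def make_puzzle(data, piece_size):
    """Mask data with a one-time pad, cut it into pieces, and shuffle the pieces."""
    pad = secrets.token_bytes(len(data))  # must be as long as data and never reused
    masked = bytes(d ^ p for d, p in zip(data, pad))
    pieces = [masked[i:i + piece_size] for i in range(0, len(masked), piece_size)]
    order = list(range(len(pieces)))
    random.SystemRandom().shuffle(order)  # secret transmission order
    packets = [pieces[i] for i in order]  # what actually travels on the network
    solution = (pad, order)  # assumed to be conveyed separately and securely
    return packets, solution


def solve_puzzle(packets, solution):
    """Reorder the pieces using the secret order, then remove the pad."""
    pad, order = solution
    pieces = [b""] * len(packets)
    for pos, original_index in enumerate(order):
        pieces[original_index] = packets[pos]
    masked = b"".join(pieces)
    return bytes(m ^ p for m, p in zip(masked, pad))
```

Without the pad, the packets alone are uniformly random bytes whatever their order. This is the source of the unconditional (information-theoretic) guarantee that one-time-pad schemes provide.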
Submitted 24 February, 2010;
originally announced February 2010.