-
Efficient data-driven encoding of scene motion using Eccentricity
Authors:
Bruno Costa,
Enrique Corona,
Mostafa Parchami,
Gint Puskorius,
Dimitar Filev
Abstract:
This paper presents a novel approach to representing dynamic visual scenes with static maps generated from video/image streams. Such a representation allows easy visual assessment of motion in dynamic environments. These maps are 2D matrices calculated recursively, in a pixel-wise manner, based on the recently introduced concept of Eccentricity data analysis. Eccentricity is a metric of the discrepancy between a particular pixel of an image and its normality model, calculated in terms of the mean and variance of past readings of the same spatial region of the image. While Eccentricity maps carry temporal information about the scene, actual images need not be stored or processed in batches. Rather, all the calculations are done recursively, based on a small amount of statistical information stored in memory, resulting in a very computationally efficient (processor- and memory-wise) method. Potential applications include video-based activity recognition, intent recognition, object tracking, and video description.
Submitted 3 March, 2021;
originally announced March 2021.
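The recursive, pixel-wise idea in the abstract above can be illustrated with a small sketch. It is not the paper's Eccentricity formula, which the abstract does not state; it assumes a standard running mean and variance (Welford's update) and uses a squared deviation normalized by the variance as a stand-in discrepancy score. The class name `RunningPixelStats` and the `eps` parameter are hypothetical.

```python
class RunningPixelStats:
    """Per-pixel running mean and variance, updated one frame at a time.

    Only the frame count, the mean map, and the accumulated squared
    deviations are kept in memory; past frames are never stored.
    """

    def __init__(self, height, width):
        self.count = 0
        self.mean = [[0.0] * width for _ in range(height)]
        self.m2 = [[0.0] * width for _ in range(height)]  # sum of squared deviations

    def update(self, frame, eps=1e-9):
        """Fold one frame into the statistics and return a discrepancy map.

        The score is an assumed stand-in for Eccentricity:
        (x - mean)^2 / variance, computed after the update.
        """
        self.count += 1
        k = self.count
        score_map = []
        for r, row in enumerate(frame):
            score_row = []
            for c, x in enumerate(row):
                # Welford's recursive update of mean and squared deviations.
                delta = x - self.mean[r][c]
                self.mean[r][c] += delta / k
                self.m2[r][c] += delta * (x - self.mean[r][c])
                variance = self.m2[r][c] / k
                deviation = x - self.mean[r][c]
                # The first frame has no history, so it scores zero.
                score = deviation * deviation / (variance + eps) if k > 1 else 0.0
                score_row.append(score)
            score_map.append(score_row)
        return score_map
```

A pixel that keeps the same value scores zero, while a pixel that departs from its history scores high, so the returned map is a static image that highlights where motion occurred.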
-
Hierarchical Sequence to Sequence Voice Conversion with Limited Data
Authors:
Praveen Narayanan,
Punarjay Chakravarty,
Francois Charette,
Gint Puskorius
Abstract:
We present a voice conversion solution using recurrent sequence-to-sequence modeling for DNNs. Our solution takes advantage of recent advances in attention-based modeling in the fields of Neural Machine Translation (NMT), Text-to-Speech (TTS) and Automatic Speech Recognition (ASR). The problem consists of converting between voices in a parallel setting when <source, target> audio pairs are available. Our seq2seq architecture makes use of a hierarchical encoder to summarize input audio frames. On the decoder side, we use an attention-based architecture used in recent TTS works. Since there is a dearth of the large multispeaker voice conversion databases needed for training DNNs, we resort to training the network as an autoencoder on a large single-speaker dataset. The network is then adapted to the smaller multispeaker datasets available for voice conversion. In contrast with other voice conversion works that use $F_0$, duration and linguistic features, our system uses mel spectrograms as the audio representation. Output mel frames are converted back to audio using a WaveNet vocoder.
Submitted 15 July, 2019;
originally announced July 2019.
-
Motion Guided LIDAR-camera Self-calibration and Accelerated Depth Upsampling for Autonomous Vehicles
Authors:
Juan Castorena,
Gint Puskorius,
Gaurav Pandey
Abstract:
This work proposes a novel motion-guided method for target-less self-calibration of a LiDAR and camera, and uses the re-projection of LiDAR points onto the image reference frame for real-time depth upsampling. The calibration parameters are estimated by optimizing an objective function that penalizes distances between 2D and re-projected 3D motion vectors obtained from time-synchronized image and point cloud sequences. For upsampling, a simple yet effective and time-efficient formulation is proposed that minimizes depth gradients subject to an equality constraint involving the LiDAR measurements. Validation on real data recorded in urban environments demonstrates that both methods are effective and suitable for mobile robotics and autonomous vehicle applications with real-time requirements.
Submitted 2 July, 2020; v1 submitted 28 March, 2018;
originally announced March 2018.
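The upsampling idea in the abstract above, minimizing depth gradients while holding the LiDAR measurements fixed, can be illustrated in one dimension. This is a toy sketch and not the paper's formulation: it assumes a squared-difference penalty between neighboring samples and solves it with Gauss-Seidel sweeps. The function name `upsample_depth_1d` and the use of `None` for unmeasured positions are hypothetical.

```python
def upsample_depth_1d(sparse, iterations=500):
    """Fill missing depths along a scanline.

    sparse: list holding a float where a depth was measured, None elsewhere.
    Minimizes the sum of squared differences between neighbors (an assumed
    penalty) subject to measured entries keeping their values exactly.
    """
    known = [v is not None for v in sparse]
    if not any(known):
        raise ValueError("need at least one measured depth")

    # Start unknown entries at the average of the measured depths.
    initial = sum(v for v in sparse if v is not None) / sum(known)
    depth = [v if v is not None else initial for v in sparse]

    n = len(depth)
    for _ in range(iterations):
        for i in range(n):
            if known[i]:
                continue  # equality constraint: measurements stay fixed
            neighbors = []
            if i > 0:
                neighbors.append(depth[i - 1])
            if i < n - 1:
                neighbors.append(depth[i + 1])
            if neighbors:
                # The quadratic penalty is minimized at the neighbor average.
                depth[i] = sum(neighbors) / len(neighbors)
    return depth
```

With this quadratic penalty the result between two measurements is a straight-line interpolation, and entries outside the measured range take the value of the nearest measurement. A two-dimensional image version would average over the four pixel neighbors instead.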