Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning

Authors: Gabriele Castellano, Juan-José Nieto, Jordi Luque, Ferrán Diego, Carlos Segura, Diego Perino, Flavio Esposito, Fulvio Risso, Aravindh Raman

Abstract: Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure to deploy such applications, as moving computation close to the data sources enables us to meet stringent latency and throughput requirements. However, the constrained nature of edge networks poses several additional challenges to the management of inference workloads: edge clusters cannot provide unlimited processing power to DNN models, and often a trade-off between network and processing time should be considered when it comes to end-to-end delay requirements. In this paper, we focus on the problem of scheduling inference queries on DNN models in edge networks at short timescales (i.e., a few milliseconds). By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP, highlighting the need for a dynamic scheduling policy that can adapt to network conditions and workloads. We therefore design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions. Our results show that ASET effectively provides the best performance compared to static policies when scheduling over a distributed pool of edge resources.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2104.08086 [pdf, other]

Efficient Keyword Spotting by capturing long-range interactions with Temporal Lambda Networks

Authors: Biel Tura, Santiago Escuder, Ferran Diego, Carlos Segura, Jordi Luque

Abstract: Models based on attention mechanisms have shown unprecedented speech recognition performance. However, they are computationally expensive and unnecessarily complex for keyword spotting, a task targeted to small-footprint devices. This work explores the application of Lambda networks, an alternative framework for capturing long-range interactions without attention, for the keyword spotting task. We propose a novel ResNet-based model by swapping the residual blocks for temporal Lambda layers. Furthermore, the proposed architecture is built upon uni-dimensional temporal convolutions that further reduce its complexity. The presented model not only reaches state-of-the-art accuracies on the Google Speech Commands dataset, but is also 85% and 65% lighter than its Transformer-based (KWT) and convolutional (Res15) counterparts, respectively, while being up to 100 times faster. To the best of our knowledge, this is the first attempt to explore the Lambda framework within the speech domain and therefore, we unravel further research of new interfaces based on this architecture.

Submitted 1 July, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: speech recognition, keyword spotting, lambda networks

arXiv:2006.00785 [pdf, ps, other]

Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos

Authors: Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giro-i-Nieto

Abstract: In this work, we propose an effective approach for training unique embedding representations by combining three simultaneous modalities: image, spoken narratives and textual narratives. The proposed methodology departs from a baseline system that spawns an embedding space trained with only spoken narratives and image cues. Our experiments on the EPIC-Kitchen and Places Audio Caption datasets show that introducing the human-generated textual transcriptions of the spoken narratives helps the training procedure, yielding better embedding representations. The triad of speech, image and words allows for a better estimate of the point embedding and improves performance on tasks like image and speech retrieval, even when the third modality, text, is not present in the task.

Submitted 1 June, 2020; originally announced June 2020.

Comments: Accepted for presentation at EPIC@CVPR2020 workshop

arXiv:1911.07808 [pdf, other]

Unsupervised Representation Learning by Discovering Reliable Image Relations

Authors: Timo Milbich, Omair Ghori, Ferran Diego, Björn Ommer

Abstract: Learning robust representations that allow relations between images to be established reliably is of paramount importance for virtually all of computer vision. Annotating the quadratic number of pairwise relations between training images is simply not feasible, while unsupervised inference is prone to noise, thus leaving the vast majority of these relations unreliable. To nevertheless find those relations which can be reliably utilized for learning, we follow a divide-and-conquer strategy: we find reliable similarities by extracting compact groups of images and reliable dissimilarities by partitioning these groups into subsets, converting the complicated overall problem into a few reliable local subproblems. For each of the subsets we obtain a representation by learning a mapping to a target feature space so that their reliable relations are kept. Transitivity relations between the subsets are then exploited to consolidate the local solutions into a concerted global representation. While iterating between grouping, partitioning, and learning, we can successively use more and more reliable relations which, in turn, improves our image representation. In experiments, our approach shows state-of-the-art performance on unsupervised classification on ImageNet with 46.0% and competes favorably on different transfer learning tasks on PASCAL VOC.

Submitted 18 November, 2019; originally announced November 2019.

Comments: Accepted for Publication in 'Pattern Recognition Journal'

arXiv:1810.09726 [pdf, other]

CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation

Authors: Radek Mackowiak, Philip Lenz, Omair Ghori, Ferran Diego, Oliver Lange, Carsten Rother

Abstract: State-of-the-art methods for semantic image segmentation are trained in a supervised fashion using a large corpus of fully labeled training images. However, gathering such a corpus is expensive, due to human annotation effort, in contrast to gathering unlabeled data. We propose an active learning-based strategy, called CEREALS, in which a human only has to hand-label a few automatically selected regions within an unlabeled image corpus. This minimizes human annotation effort while maximizing the performance of a semantic image segmentation method. The automatic selection procedure is achieved by: a) using a suitable information measure combined with an estimate of human annotation effort, which is inferred from a learned cost model, and b) exploiting the spatial coherency of an image. The performance of CEREALS is demonstrated on Cityscapes, where we are able to reduce the annotation effort to 17%, while keeping 95% of the mean Intersection over Union (mIoU) of a model that was trained with the fully annotated training set of Cityscapes.

Submitted 23 October, 2018; originally announced October 2018.

Comments: Published at British Machine Vision Conference 2018 (BMVC)

arXiv:1606.07029 [pdf, other]

Sparse convolutional coding for neuronal ensemble identification

Authors: Sven Peter, Daniel Durstewitz, Ferran Diego, Fred A. Hamprecht

Abstract: Cell ensembles, originally proposed by Donald Hebb in 1949, are subsets of synchronously firing neurons and were proposed to explain basic firing behavior in the brain. Despite having been studied for many years, no conclusive evidence has been presented yet for their existence and involvement in information processing, such that their identification is still a topic of modern research, especially since simultaneous recordings of large neuronal populations have become possible in the past three decades. These large recordings pose a challenge for methods that aim to identify individual neurons forming cell ensembles and their time course of activity inside the vast amounts of spikes recorded. Related work so far focused on the identification of purely simultaneously firing neurons using techniques such as Principal Component Analysis. In this paper we propose a new algorithm based on sparse convolutional coding which is also able to find ensembles with temporal structure. Application of our algorithm to synthetically generated datasets shows that it outperforms previous work and is able to accurately identify temporal cell ensembles even when those contain overlapping neurons or when strong background noise is present.

Submitted 22 June, 2016; originally announced June 2016.

Comments: 12 pages, 6 figures

arXiv:1412.3159 [pdf, ps, other]

Road Detection via On-line Label Transfer

Authors: José M. Álvarez, Ferran Diego, Joan Serrat, Antonio M. López

Abstract: Vision-based road detection is an essential functionality for supporting advanced driver assistance systems (ADAS) such as road following and vehicle and pedestrian detection. The major challenges of road detection are dealing with shadows and lighting variations and the presence of other objects in the scene. Current road detection algorithms characterize road areas at pixel level and group pixels accordingly. However, these algorithms fail in the presence of strong shadows and lighting variations. Therefore, we propose a road detection algorithm based on video alignment. The key idea of the algorithm is to exploit the similarities that occur when a vehicle follows the same trajectory more than once. In this way, road areas are learned in a first ride and then this road knowledge is used to infer areas depicting drivable road surfaces in subsequent rides. Two different experiments are conducted to validate the proposal on different video sequences taken in different scenarios and at different times of day. The former aims to perform on-line road detection. The latter aims to perform off-line road detection and is applied to automatically generate the ground-truth necessary to validate road detection algorithms. Qualitative and quantitative evaluations prove that the proposed algorithm is a valid road detection approach.

Submitted 9 December, 2014; originally announced December 2014.

arXiv:1312.4166 [pdf, ps, other]

doi 10.1088/0004-637X/782/1/40

The hard X-ray shortages prompted by the clock bursts in GS 1826-238

Authors: Ji Long, Zhang Shu, Chen YuPeng, Zhang Shuang-Nan, Torres F. Diego, Kretschmar Peter, Li Jian

Abstract: We report on a study of GS 1826-238 using all available RXTE observations, concentrating on the behavior of the hard X-rays during type-I bursts. We find a hard X-ray shortage at 30–50 keV promoted by the shower of soft X-rays coming from type-I bursts. This shortage happens with a time delay after the peak of the soft flux of 3.6 ± 1.2 seconds. The behavior of hard X-rays during bursts indicates cooling and reheating of the corona, during which a large amount of energy is required. We speculate that this energy originates from the feedback of the type-I bursts to the accretion process, resulting in a rapid temporary increase of the accretion rate.

Submitted 15 December, 2013; originally announced December 2013.

Comments: 11 pages, 4 figures, Accepted to the ApJ

Showing 1–8 of 8 results for author: Diego, F