-
Joint Communication and Sensing for 6G -- A Cross-Layer Perspective
Authors:
Henk Wymeersch,
Sharief Saleh,
Ahmad Nimr,
Rreze Halili,
Rafael Berkvens,
Mohammad H. Moghaddam,
José Miguel Mateos-Ramos,
Athanasios Stavridis,
Stefan Wänstedt,
Sokratis Barmpounakis,
Basuki Priyanto,
Martin Beale,
Jaap van de Beek,
Zi Ye,
Marvin Manalastas,
Apostolos Kousaridas,
Gerhard P. Fettweis
Abstract:
As 6G emerges, cellular systems are envisioned to integrate sensing with communication capabilities, leading to joint communication and sensing (JCAS). This paper presents a comprehensive cross-layer overview of the Hexa-X-II project's endeavors in JCAS, aligning 6G use cases with service requirements and pinpointing distinct scenarios that bridge communication and sensing. This work relates to these scenarios through the lens of the cross-layer physical and networking domains, covering models, deployments, resource allocation, storage challenges, computational constraints, interfaces, and innovative functions.
Submitted 14 February, 2024;
originally announced February 2024.
-
On the Ground and in the Sky: A Tutorial on Radio Localization in Ground-Air-Space Networks
Authors:
Hazem Sallouha,
Sharief Saleh,
Sibren De Bast,
Zhuangzhuang Cui,
Sofie Pollin,
Henk Wymeersch
Abstract:
The inherent limitations in scaling up ground infrastructure for future wireless networks, combined with decreasing operational costs of aerial and space networks, are driving considerable research interest in multisegment ground-air-space (GAS) networks. In GAS networks, where ground and aerial users share network resources, ubiquitous and accurate user localization becomes indispensable, not only as an end-user service but also as an enabler for location-aware communications. This breaks the convention of having localization as a byproduct in networks primarily designed for communications. To address these imperative localization needs, the design and utilization of ground, aerial, and space anchors require thorough investigation. In this tutorial, we provide an in-depth systemic analysis of the radio localization problem in GAS networks, considering ground and aerial users as targets to be localized. Starting from a survey of the most relevant works, we then define the key characteristics of anchors and targets in GAS networks. Subsequently, we detail localization fundamentals in GAS networks, considering 3D positions, orientations, and velocities. Afterward, we thoroughly analyze radio localization systems in GAS networks, detailing the system model, design aspects, and considerations for each of the three GAS anchors. Preliminary results are presented to provide a quantifiable perspective on key design aspects in GAS-based localization scenarios. We then identify the vital roles 6G enablers are expected to play in radio localization in GAS networks.
Submitted 17 June, 2024; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Enhancing Supply Chain Resilience: A Machine Learning Approach for Predicting Product Availability Dates Under Disruption
Authors:
Mustafa Can Camur,
Sandipp Krishnan Ravi,
Shadi Saleh
Abstract:
The COVID-19 pandemic and ongoing political and regional conflicts have a highly detrimental impact on the global supply chain, causing significant delays in logistics operations and international shipments. One of the most pressing concerns is the uncertainty surrounding the availability dates of products, which is critical information for companies to generate effective logistics and shipment plans. Therefore, accurately predicting availability dates plays a pivotal role in executing successful logistics operations, ultimately minimizing total transportation and inventory costs. We investigate the prediction of product availability dates for General Electric (GE) Gas Power's inbound shipments for gas and steam turbine service and manufacturing operations, utilizing both numerical and categorical features. We evaluate several regression models, including Simple Regression, Lasso Regression, Ridge Regression, Elastic Net, Random Forest (RF), Gradient Boosting Machine (GBM), and Neural Network models. Based on real-world data, our experiments demonstrate that the tree-based algorithms (i.e., RF and GBM) provide the best generalization error and outperform all other regression models tested. We anticipate that our prediction models will assist companies in managing supply chain disruptions and reducing supply chain risks on a broader scale.
Submitted 28 April, 2023;
originally announced April 2023.
-
The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose
Authors:
Yizhak Ben-Shabat,
Xin Yu,
Fatemeh Sadat Saleh,
Dylan Campbell,
Cristian Rodriguez-Opazo,
Hongdong Li,
Stephen Gould
Abstract:
The availability of a large labeled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks. In the context of understanding human activities, existing public datasets, while large in size, are often limited to a single RGB camera and provide only per-frame or per-clip action annotations. To enable richer analysis and understanding of human activities, we introduce IKEA ASM -- a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose. Additionally, we benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset. The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
Submitted 17 May, 2023; v1 submitted 1 July, 2020;
originally announced July 2020.
-
On the Optimal Interaction Range for Multi-Agent Systems Under Adversarial Attack
Authors:
Saad J Saleh
Abstract:
Consider a consensus-driven multi-agent dynamic system. The interaction range, which defines the set of neighbors for each agent, plays a key role in influencing connectivity of the underlying network. In this paper, we assume the system is under attack by a predator and explore the question of finding the optimal interaction range that facilitates the most-efficient escape trajectories for the group of agents. We find that for many cases of interest the optimal interaction range is one that forces the network to break up into a handful of disconnected graphs, each containing a subset of agents, thus outperforming the two extreme cases corresponding to fully-connected and fully-disconnected networks. In other words, the results indicate that some connectivity among the agents is helpful because information is effectively transmitted from the agents closest to the predator to others slightly farther away, but also that too much connectivity can be detrimental to the agility of the group, thus hampering efficient and rapid escape.
Submitted 25 April, 2020; v1 submitted 14 April, 2020;
originally announced April 2020.
-
UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders
Authors:
Jing Zhang,
Deng-Ping Fan,
Yuchao Dai,
Saeed Anwar,
Fatemeh Sadat Saleh,
Tong Zhang,
Nick Barnes
Abstract:
In this paper, we propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection methods treat the saliency detection task as a point estimation problem, and produce a single saliency map following a deterministic learning pipeline. Inspired by the saliency data labeling process, we propose a probabilistic RGB-D saliency detection network via conditional variational autoencoders to model human annotation uncertainty and generate multiple saliency maps for each input image by sampling in the latent space. With the proposed saliency consensus process, we are able to generate an accurate saliency map based on these multiple predictions. Quantitative and qualitative evaluations on six challenging benchmark datasets against 18 competing algorithms demonstrate the effectiveness of our approach in learning the distribution of saliency maps, leading to a new state-of-the-art in RGB-D saliency detection.
Submitted 13 April, 2020;
originally announced April 2020.
-
Contextually Plausible and Diverse 3D Human Motion Prediction
Authors:
Sadegh Aliakbarian,
Fatemeh Sadat Saleh,
Lars Petersson,
Stephen Gould,
Mathieu Salzmann
Abstract:
We tackle the task of diverse 3D human motion prediction, that is, forecasting multiple plausible future 3D poses given a sequence of observed 3D poses. In this context, a popular approach consists of using a Conditional Variational Autoencoder (CVAE). However, existing approaches that do so either fail to capture the diversity in human motion, or generate diverse but semantically implausible continuations of the observed motion. In this paper, we address both of these problems by developing a new variational framework that accounts for both diversity and context of the generated future motion. To this end, and in contrast to existing approaches, we condition the sampling of the latent variable that acts as source of diversity on the representation of the past observation, thus encouraging it to carry relevant information. Our experiments demonstrate that our approach yields motions not only of higher quality while retaining diversity, but also that preserve the contextual information contained in the observed 3D pose sequence.
Submitted 5 December, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
-
A Novel Method to Generate Key-Dependent S-Boxes with Identical Algebraic Properties
Authors:
Ahmad Y. Al-Dweik,
Iqtadar Hussain,
Moutaz S. Saleh,
M. T. Mustafa
Abstract:
The s-box plays the vital role of creating confusion between the ciphertext and secret key in any cryptosystem, and is the only nonlinear component in many block ciphers. Dynamic s-boxes, as compared to static ones, improve the entropy of the system, hence leading to better resistance against linear and differential attacks. It was shown in [2] that while incorporating dynamic s-boxes in cryptosystems is sufficiently secure, they do not keep nonlinearity invariant. This work provides an algorithmic scheme to generate key-dependent dynamic $n\times n$ clone s-boxes having the same algebraic properties, namely bijection, nonlinearity, the strict avalanche criterion (SAC), and the output bits independence criterion (BIC), as the initial seed s-box. The method is based on group action of symmetric group $S_n$ and a subgroup $S_{2^n}$ respectively on columns and rows of Boolean functions ($GF(2^n)\to GF(2)$) of s-box. Invariance of the bijection, nonlinearity, SAC, and BIC for the generated clone copies is proved. As an illustration, examples are provided for $n=8$ and $n=4$ along with a comparison of the algebraic properties of the clone and initial seed s-box. The proposed method is an extension of [3,4,5,6], which involved group action of $S_8$ only on columns of Boolean functions ($GF(2^8)\to GF(2)$) of s-box. For $n=4$, we have used an initial $4\times 4$ s-box constructed by Carlisle Adams and Stafford Tavares [7] to generate $(4!)^2$ clone copies. For $n=8$, it can be seen [3,4,5,6] that the number of clone copies that can be constructed by permuting the columns is $8!$. For each column permutation, the proposed method enables the generation of $8!$ clone copies by permuting the rows.
Submitted 3 May, 2021; v1 submitted 24 August, 2019;
originally announced August 2019.
-
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention
Authors:
Cristian Rodriguez-Opazo,
Edison Marrese-Taylor,
Fatemeh Sadat Saleh,
Hongdong Li,
Stephen Gould
Abstract:
This paper studies the problem of temporal moment localization in a long untrimmed video using natural language as the query. Given an untrimmed video and a sentence as the query, the goal is to determine the start and the end of the relevant visual moment in the video that corresponds to the query sentence. While previous works have tackled this task by a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, and proposal-free approach that relies on three key components: a dynamic filter to transfer language information to the visual domain, a new loss function to guide our model to attend the most relevant parts of the video, and soft labels to model annotation uncertainty. We evaluate our method on two benchmark datasets, Charades-STA and ActivityNet-Captions. Experimental results show that our approach outperforms state-of-the-art methods on both datasets.
Submitted 12 March, 2020; v1 submitted 20 August, 2019;
originally announced August 2019.
-
Learning Variations in Human Motion via Mix-and-Match Perturbation
Authors:
Mohammad Sadegh Aliakbarian,
Fatemeh Sadat Saleh,
Mathieu Salzmann,
Lars Petersson,
Stephen Gould,
Amirhossein Habibian
Abstract:
Human motion prediction is a stochastic process: Given an observed sequence of poses, multiple future motions are plausible. Existing approaches to modeling this stochasticity typically combine a random noise vector with information about the previous poses. This combination, however, is done in a deterministic manner, which gives the network the flexibility to learn to ignore the random noise. In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account. We exploit this idea for motion prediction by incorporating it into a recurrent encoder-decoder network with a conditional variational autoencoder block that learns to exploit the perturbations. Our experiments demonstrate that our model yields high-quality pose sequences that are much more diverse than those from state-of-the-art stochastic motion prediction techniques.
Submitted 24 February, 2020; v1 submitted 2 August, 2019;
originally announced August 2019.
-
VIENA2: A Driving Anticipation Dataset
Authors:
Mohammad Sadegh Aliakbarian,
Fatemeh Sadat Saleh,
Mathieu Salzmann,
Basura Fernando,
Lars Petersson,
Lars Andersson
Abstract:
Action anticipation is critical in scenarios where one needs to react before the action is finalized. This is, for instance, the case in automated driving, where a car needs to, e.g., avoid hitting pedestrians and respect traffic lights. While solutions have been proposed to tackle subsets of the driving anticipation tasks, by making use of diverse, task-specific sensors, there is no single dataset or framework that addresses them all in a consistent manner. In this paper, we therefore introduce a new, large-scale dataset, called VIENA2, covering 5 generic driving scenarios, with a total of 25 distinct action classes. It contains more than 15K full HD, 5s long videos acquired in various driving conditions, weathers, daytimes and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class. We discuss our data acquisition strategy and the statistics of our dataset, and benchmark state-of-the-art action anticipation techniques, including a new multi-modal LSTM architecture with an effective loss function for action anticipation in driving scenarios.
Submitted 29 October, 2018; v1 submitted 21 October, 2018;
originally announced October 2018.
-
Effective Use of Synthetic Data for Urban Scene Semantic Segmentation
Authors:
Fatemeh Sadat Saleh,
Mohammad Sadegh Aliakbarian,
Mathieu Salzmann,
Lars Petersson,
Jose M. Alvarez
Abstract:
Training a deep network to perform semantic segmentation requires large amounts of labeled data. To alleviate the manual effort of annotating real images, researchers have investigated the use of synthetic data, which can be labeled automatically. Unfortunately, a network trained on synthetic data performs relatively poorly on real images. While this can be addressed by domain adaptation, existing methods all require having access to real images during training. In this paper, we introduce a drastically different way to handle synthetic images that does not require seeing any real images at training time. Our approach builds on the observation that foreground and background classes are not affected in the same manner by the domain shift, and thus should be treated differently. In particular, the former should be handled in a detection-based manner to better account for the fact that, while their texture in synthetic images is not photo-realistic, their shape looks natural. Our experiments evidence the effectiveness of our approach on Cityscapes and CamVid with models trained on synthetic data only.
Submitted 16 July, 2018;
originally announced July 2018.
-
Shedding Light on the Dark Corners of the Internet: A Survey of Tor Research
Authors:
Saad Saleh,
Junaid Qadir,
Muhammad U. Ilyas
Abstract:
Anonymity services have seen high growth rates with increased usage in the past few years. Among various services, Tor is one of the most popular peer-to-peer anonymizing services. In this survey paper, we summarize, analyze, classify and quantify 26 years of research on the Tor network. Our research shows that `security' and `anonymity' are the most frequent keywords associated with Tor research studies. Quantitative analysis shows that the majority of research studies on Tor focus on `deanonymization', the design of a breaching strategy. The second most frequent topic is analysis of path selection algorithms to select more resilient paths. Analysis shows that the majority of experimental studies derived their results by deploying private testbeds while others performed simulations by developing custom simulators. No consistent parameters have been used for Tor performance analysis. The majority of authors performed throughput and latency analysis.
Submitted 7 March, 2018;
originally announced March 2018.
-
Bringing Background into the Foreground: Making All Classes Equal in Weakly-supervised Video Semantic Segmentation
Authors:
Fatemeh Sadat Saleh,
Mohammad Sadegh Aliakbarian,
Mathieu Salzmann,
Lars Petersson,
Jose M. Alvarez
Abstract:
Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recent years have seen great progress in weakly-supervised semantic segmentation, whether from a single image or from videos. However, most existing methods are designed to handle a single background class. In practical applications, such as autonomous navigation, it is often crucial to reason about multiple background classes. In this paper, we introduce an approach to doing so by making use of classifier heatmaps. We then develop a two-stream deep architecture that jointly leverages appearance and motion, and design a loss based on our heatmaps to train it. Our experiments demonstrate the benefits of our classifier heatmaps and of our two-stream architecture on challenging urban scene datasets and on the YouTube-Objects benchmark, where we obtain state-of-the-art results.
Submitted 15 August, 2017;
originally announced August 2017.
-
Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation
Authors:
Fatemeh Sadat Saleh,
Mohammad Sadegh Aliakbarian,
Mathieu Salzmann,
Lars Petersson,
Jose M. Alvarez,
Stephen Gould
Abstract:
Pixel-level annotations are expensive and time consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately these priors either require pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract accurate masks from networks pre-trained for the task of object recognition, thus forgoing external objectness modules. We first show how foreground/background masks can be obtained from the activations of higher-level convolutional layers of a network. We then show how to obtain multi-class masks by the fusion of foreground/background ones with information extracted from a weakly-supervised localization network. Our experiments evidence that exploiting these masks in conjunction with a weakly-supervised training loss yields state-of-the-art tag-based weakly-supervised semantic segmentation results.
Submitted 5 June, 2017;
originally announced June 2017.
-
Encouraging LSTMs to Anticipate Actions Very Early
Authors:
Mohammad Sadegh Aliakbarian,
Fatemeh Sadat Saleh,
Mathieu Salzmann,
Basura Fernando,
Lars Petersson,
Lars Andersson
Abstract:
In contrast to the widely studied problem of recognizing an action given a complete sequence, action anticipation aims to identify the action from only partially available videos. As such, it is key to the success of computer vision applications that must react as early as possible, such as autonomous navigation. In this paper, we propose a new action anticipation method that achieves high prediction accuracy even in the presence of a very small percentage of a video sequence. To this end, we develop a multi-stage LSTM architecture that leverages context-aware and action-aware features, and introduce a novel loss function that encourages the model to predict the correct class as early as possible. Our experiments on standard benchmark datasets evidence the benefits of our approach; we outperform the state-of-the-art action anticipation methods for early prediction by a relative increase in accuracy of 22.0% on JHMDB-21, 14.0% on UT-Interaction and 49.9% on UCF-101.
Submitted 13 August, 2017; v1 submitted 20 March, 2017;
originally announced March 2017.