
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs

Authors: Navid Mohammadi Foumani, Geoffrey Mackellar, Soheila Ghane, Saad Irtza, Nam Nguyen, Mahsa Salehi

Abstract: Self-supervised approaches for electroencephalography (EEG) representation learning face three specific challenges inherent to EEG data: (1) the low signal-to-noise ratio, which challenges the quality of the learned representation; (2) the wide range of amplitudes, from very small to relatively large, due to factors such as inter-subject variability, which risks the models being dominated by higher-amplitude ranges; and (3) the absence of explicit segmentation in the continuous-valued sequences, which can result in less informative representations. To address these challenges, we introduce EEG2Rep, a self-prediction approach for self-supervised representation learning from EEG. The two core novel components of EEG2Rep are as follows: (1) instead of learning to predict the masked input from raw EEG, EEG2Rep learns to predict the masked input in the latent representation space, and (2) instead of conventional masking methods, EEG2Rep uses a new semantic subsequence preserving (SSP) method, which provides informative masked inputs to guide EEG2Rep to generate rich semantic representations. In experiments on 6 diverse EEG tasks with subject variability, EEG2Rep significantly outperforms state-of-the-art methods. We show that our semantic subsequence preserving improves the existing masking methods in the self-prediction literature and find that preserving 50% of EEG recordings yields the most accurate results on all 6 tasks on average. Finally, we show that EEG2Rep is robust to noise, addressing a significant challenge that exists in EEG data.
Models and code are available at: https://github.com/Navidfoumani/EEG2Rep
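To make the idea of "preserving" a subsequence concrete, here is a minimal sketch of a subsequence-preserving mask over a sequence of EEG patch embeddings. It assumes a single contiguous block covering `keep_ratio` of the sequence is kept visible and everything else is masked; the helper name `ssp_mask` and the single-block sampling rule are hypothetical simplifications, not the authors' implementation (their repository is linked above).

```python
import random
from typing import List, Optional


def ssp_mask(seq_len: int, keep_ratio: float = 0.5,
             rng: Optional[random.Random] = None) -> List[bool]:
    """Hypothetical helper: True marks a preserved (visible) position.

    Assumption: one contiguous subsequence of about keep_ratio * seq_len
    positions is preserved; every other position is masked and would become
    a prediction target in latent space.
    """
    if seq_len < 1 or not 0.0 < keep_ratio <= 1.0:
        raise ValueError("need seq_len >= 1 and keep_ratio in (0, 1]")
    rng = rng or random.Random()
    keep_len = max(1, round(keep_ratio * seq_len))
    start = rng.randint(0, seq_len - keep_len)  # randint is inclusive on both ends
    return [start <= i < start + keep_len for i in range(seq_len)]


# Usage: 10 patch embeddings, preserving 50% (the ratio reported as best on average).
mask = ssp_mask(10, keep_ratio=0.5, rng=random.Random(0))
visible = [i for i, keep in enumerate(mask) if keep]
targets = [i for i, keep in enumerate(mask) if not keep]
print("context encoder sees:", visible)
print("predict latents for:", targets)
```

Under this reading, the visible positions would feed a context encoder, and the latent representations of the masked positions (rather than raw EEG values) would serve as regression targets.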

Submitted 18 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.14982

Human Brain Exhibits Distinct Patterns When Listening to Fake Versus Real Audio: Preliminary Evidence

Authors: Mahsa Salehi, Kalin Stefanov, Ehsan Shareghi

Abstract: In this paper, we study the variations in human brain activity when listening to real and fake audio. Our preliminary results suggest that the representations learned by a state-of-the-art deepfake audio detection algorithm do not exhibit clearly distinct patterns between real and fake audio. In contrast, human brain activity, as measured by EEG, displays distinct patterns when individuals are exposed to fake versus real audio. This preliminary evidence enables future research directions in areas such as deepfake audio detection.

Submitted 14 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 9 pages, 4 figures, 3 tables

arXiv:2402.02725

Cybersickness Detection through Head Movement Patterns: A Promising Approach

Authors: Masoud Salehi, Nikoo Javadpour, Brietta Beisner, Mohammadamin Sanaei, Stephen B. Gilbert

Abstract: Despite the widespread adoption of Virtual Reality (VR) technology, cybersickness remains a barrier for some users. This research investigates head movement patterns as a novel physiological marker for cybersickness detection. Unlike traditional markers, head movements provide a continuous, non-invasive measure that can be easily captured through the sensors embedded in all commercial VR headsets. We used a publicly available dataset from a VR experiment involving 75 participants and analyzed head movements across six axes. An extensive feature extraction process was then performed on the head movement dataset and its derivatives, including velocity, acceleration, and jerk. Three categories of features were extracted: statistical, temporal, and spectral. Subsequently, we employed the Recursive Feature Elimination method to select the most important and effective features. In a series of experiments, we trained a variety of machine learning algorithms. The results demonstrate 76% accuracy and 83% precision in predicting cybersickness in the subjects based on their head movements. This study's contribution to the cybersickness literature lies in offering a preliminary analysis of a new source of data and providing insight into the relationship between head movements and cybersickness.
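The derivative chain described above (position, velocity, acceleration, jerk) can be sketched with forward finite differences. This is a minimal illustration for one axis under assumed conditions: uniform sampling at a hypothetical 4 Hz, made-up yaw angles, and only four simple statistical features; the study's actual sampling rate, temporal features, and spectral features are not reproduced here.

```python
import statistics
from typing import Dict, List, Sequence


def finite_diff(x: Sequence[float], dt: float) -> List[float]:
    """Forward difference (x[i+1] - x[i]) / dt; output is one sample shorter."""
    return [(b - a) / dt for a, b in zip(x, x[1:])]


def derivative_chain(position: Sequence[float], dt: float) -> Dict[str, List[float]]:
    """Position -> velocity -> acceleration -> jerk for a single head-movement axis."""
    velocity = finite_diff(position, dt)
    acceleration = finite_diff(velocity, dt)
    jerk = finite_diff(acceleration, dt)
    return {"position": list(position), "velocity": velocity,
            "acceleration": acceleration, "jerk": jerk}


def statistical_features(signal: Sequence[float]) -> Dict[str, float]:
    """A few simple statistical features (temporal and spectral ones omitted)."""
    return {"mean": statistics.fmean(signal), "std": statistics.pstdev(signal),
            "min": min(signal), "max": max(signal)}


# Hypothetical yaw-angle samples (degrees) at 4 Hz, i.e. dt = 0.25 s.
yaw = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
chains = derivative_chain(yaw, dt=0.25)
features = {f"{name}_{stat}": value
            for name, sig in chains.items()
            for stat, value in statistical_features(sig).items()}
```

In a full pipeline, this would be repeated for all six axes, and the resulting feature table passed to a recursive feature elimination step (scikit-learn's `RFE` is one available implementation) before training classifiers.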

Submitted 26 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: 18 pages, 3 Figures, 3 Tables

arXiv:2312.08646

Guarding the Grid: Enhancing Resilience in Automated Residential Demand Response Against False Data Injection Attacks

Authors: Thusitha Dayaratne, Carsten Rudolph, Ariel Liebman, Mahsa Salehi

Abstract: Utility companies are increasingly leveraging residential demand flexibility and the proliferation of smart/IoT devices to enhance the effectiveness of residential demand response (DR) programs through automated device scheduling. However, the adoption of distributed architectures in these systems exposes them to the risk of false data injection attacks (FDIAs), where adversaries can manipulate decision-making processes by injecting false data. Given the limited control utility companies have over these distributed systems and data, the need for reliable implementations to enhance the resilience of residential DR schemes against FDIAs is paramount. In this work, we present a comprehensive framework that combines DR optimisation, anomaly detection, and strategies for mitigating the impacts of attacks to create a resilient and automated device scheduling system. To validate the robustness of our framework against FDIAs, we performed an evaluation using real-world data sets, highlighting its effectiveness in securing residential DR systems.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.02839

Low-complexity Linear Multicast Beamforming for Cache-aided MIMO Communications

Authors: Mohammad NaseriTehrani, MohammadJavad Salehi, Antti Tölli

Abstract: A practical and scalable multicast beamformer design for multi-input multi-output (MIMO) coded caching (CC) systems is introduced in this paper. The proposed approach allows multicast transmission to multiple groups with partially overlapping user sets, using receiver dimensions to distinguish between different group-specific streams. Additionally, it provides flexibility in accommodating various parameter configurations of the MIMO-CC setup and overcomes practical limitations, such as the requirement to use successive interference cancellation (SIC) at the receiver, while achieving the same degrees of freedom (DoF). To evaluate the proposed scheme, we define the symmetric rate as the sum rate of the partially overlapping streams received per user, comprising a linear multistream multicast transmission vector and the linear minimum mean square error (LMMSE) receiver. The resulting non-convex symmetric rate maximization problem is solved using alternating optimization and successive convex approximation (SCA). Moreover, a fast iterative Lagrangian-based algorithm is developed, significantly reducing the computational overhead compared to previous designs. The effectiveness of the proposed method is demonstrated by extensive simulations.
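As a rough illustration of the rate metric described above, the sketch below evaluates, for fixed transmit vectors, the per-stream rates a user obtains with per-stream LMMSE receive filters and no SIC, then takes the minimum over users of each user's summed rates. Treating the symmetric rate as a max-min over users, and the helper names `lmmse_stream_rates` and `symmetric_rate`, are assumptions for illustration; the beamformer optimization itself (alternating optimization, SCA, the Lagrangian algorithm) is not shown.

```python
import numpy as np
from typing import List, Sequence


def lmmse_stream_rates(H: np.ndarray, V: np.ndarray, intended: Sequence[int],
                       noise_var: float) -> List[float]:
    """Rates of the streams one user decodes with per-stream LMMSE filters (no SIC).

    H: (Nr, Nt) channel of this user; V: (Nt, S) transmit vectors, one column per
    stream. Every stream other than the one being detected is treated as interference.
    """
    G = H @ V                                            # effective channel per stream
    R = G @ G.conj().T + noise_var * np.eye(H.shape[0])  # received covariance
    rates = []
    for k in intended:
        g = G[:, [k]]
        R_int = R - g @ g.conj().T                       # interference-plus-noise covariance
        # The LMMSE filter w = R^{-1} g attains SINR = g^H R_int^{-1} g.
        sinr = float(np.real(g.conj().T @ np.linalg.solve(R_int, g)).item())
        rates.append(float(np.log2(1.0 + sinr)))
    return rates


def symmetric_rate(channels, V, intended_sets, noise_var: float) -> float:
    """Assumed objective: minimum over users of each user's summed stream rates."""
    return min(sum(lmmse_stream_rates(H, V, S, noise_var))
               for H, S in zip(channels, intended_sets))


# Toy check: 2x2 identity channel, one stream per transmit antenna, unit noise.
H = np.eye(2, dtype=complex)
V = np.eye(2, dtype=complex)
print(lmmse_stream_rates(H, V, [0, 1], noise_var=1.0))
```

In this toy case each stream sees SINR = 1, so each stream's rate is 1 bit/s/Hz; an optimizer would adjust the columns of `V` to raise the minimum across users.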

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2309.07131

Wideband High Gain Metasurface-Based 4T4R MIMO antenna with Highly Isolated Ports for Sub-6 GHz 5G Applications

Authors: Mahdi Salehi, Homayoon Oraizi

Abstract: This study presents the design of four 178 × 178 mm² wideband, high-gain, highly efficient metasurface-based 4T4R MIMO (multiple-input multiple-output) antennas with highly isolated ports, covering the middle and a portion of the upper bands of the sub-6 GHz 5G frequency spectrum for 5G-based systems, such as IoT (Internet of Things) applications, vehicular communications (e.g., rooftop antennas of cars or trains), and smart industries (e.g., farms and factories). The radiating elements of these antennas use the aperture-coupled feeding technique with a dumbbell-shaped slot, a truncated square patch with two U-shaped slots, and a metasurface layer. The proposed MIMO structures arrange four identical radiating elements as a 2 × 2 matrix with 90° successive rotations to produce orthogonal electromagnetic waves, improving the isolation between ports. Six-millimeter spaces are added between these elements, and two vertical and horizontal strip slots are carved in the ground plane as the decoupling structure to decrease the mutual coupling. Simulation results show that Antenna 1, Antenna 2, and Antenna 3 achieve gain values of 6.2 to 9.4 dBi, 8.2 to 11.6 dBi, and 6.2 to 9.5 dBi; isolation below -35, -25, and -33 dB; and almost 10 dB diversity gain over 2.8 to 4.7 GHz, 2.8 to 4.5 GHz, and 2.7 to 4.9 GHz, respectively. As a prototype, Antenna 4 is manufactured and measured. It achieves gain values of 6.28 to 10.45 dBi, isolation below -23 dB, and an envelope correlation coefficient of 0.001 over 2.7 to 4.3 GHz.
The results confirm that the proposed MIMO antennas are compatible with essential 5G requirements.
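The envelope correlation coefficient (ECC) and diversity gain quoted above are commonly related as follows. This sketch uses the widely used two-port S-parameter formula for ECC, which assumes lossless, near-100%-efficient antennas, and one common diversity-gain convention, DG = 10·sqrt(1 − ECC²). The port values below are hypothetical (−15 dB matching, −23 dB isolation, zero phase), not the paper's measured data, and the paper may have computed ECC from radiation patterns instead.

```python
import math


def db_to_mag(db: float) -> float:
    """Convert an S-parameter magnitude in dB to linear scale."""
    return 10 ** (db / 20)


def ecc_from_s_params(s11: complex, s12: complex, s21: complex, s22: complex) -> float:
    """Two-port ECC from complex S-parameters (assumes lossless, ~100%-efficient antennas)."""
    num = abs(s11.conjugate() * s12 + s21.conjugate() * s22) ** 2
    den = (1 - abs(s11) ** 2 - abs(s21) ** 2) * (1 - abs(s22) ** 2 - abs(s12) ** 2)
    return num / den


def diversity_gain_db(ecc: float) -> float:
    """One common convention: DG = 10 * sqrt(1 - ECC^2)."""
    return 10 * math.sqrt(1 - ecc ** 2)


# Hypothetical port values: -15 dB matching and -23 dB isolation, zero phase.
s_match = complex(db_to_mag(-15))
s_iso = complex(db_to_mag(-23))
ecc = ecc_from_s_params(s_match, s_iso, s_iso, s_match)
print(f"ECC = {ecc:.5f}, DG = {diversity_gain_db(ecc):.4f} dB")
```

With isolation in the −23 dB range, the ECC comes out well below 0.01, which is why a diversity gain of almost 10 dB is expected.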

Submitted 30 May, 2024; v1 submitted 26 August, 2023; originally announced September 2023.

Comments: 20 pages, 15 figures, and 3 Tables

arXiv:2305.06858

Low-Complexity Multi-Antenna Coded Caching Using Location-Aware Placement Delivery Arrays

Authors: Hamidreza Bakhshzad Mahmoodi, MohammadJavad Salehi, Antti Tolli

Abstract: A location-aware multi-antenna coded caching scheme is proposed for applications with location-dependent data requests, such as wireless immersive experiences, where users are immersed in a three-dimensional virtual world. The wireless connectivity conditions vary as the users move within the application area, motivating the use of a non-uniform cache memory allocation process to avoid excessive delivery time for users located in wireless bottleneck areas. To this end, a location-aware placement and delivery array (LAPDA) is designed for cache-aided multi-antenna data delivery with a fast-converging, iterative linear beamforming process. The underlying weighted max-min transmit precoder design enables the proposed scheme to serve users in poor-connectivity areas with smaller amounts of data while simultaneously delivering larger amounts to other users. Unlike existing schemes, our new scheme is suitable for large networks due to its linear transceiver structure, and it is not constrained by the number of users, the cache size, or the number of antennas at the transmitter. Despite non-uniform cache placement, the proposed scheme still achieves a significant degree of coded caching gain that is additive to the multiplexing gain and greatly outperforms conventional symmetric CC schemes in terms of both average and 95th-percentile delivery time.

Submitted 9 July, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: 13 pages and 8 figures

arXiv:2302.09244

Dual-Domain Self-Supervised Learning for Accelerated Non-Cartesian MRI Reconstruction

Authors: Bo Zhou, Jo Schlemper, Neel Dey, Seyed Sadegh Mohseni Salehi, Kevin Sheth, Chi Liu, James S. Duncan, Michal Sofka

Abstract: While enabling accelerated acquisition and improved reconstruction accuracy, current deep MRI reconstruction networks are typically supervised, require fully sampled data, and are limited to Cartesian sampling patterns. These factors limit their practical adoption, as fully sampled MRI is prohibitively time-consuming to acquire clinically. Further, non-Cartesian sampling patterns are particularly desirable, as they are more amenable to acceleration and show improved motion robustness. To this end, we present a fully self-supervised approach for accelerated non-Cartesian MRI reconstruction, dual-domain self-supervised learning (DDSS), which leverages self-supervision in both the k-space and image domains. In training, the undersampled data are split into disjoint k-space domain partitions. For the k-space self-supervision, we train a network to reconstruct the input undersampled data from both the disjoint partitions and from itself. For the image-level self-supervision, we enforce appearance consistency between the reconstructions obtained from the original undersampled data and from the two partitions. Experimental results on our simulated multi-coil non-Cartesian MRI dataset demonstrate that DDSS can generate high-quality reconstructions that approach the accuracy of fully supervised reconstruction, outperforming previous baseline methods.
Finally, DDSS is shown to scale to highly challenging real-world clinical MRI reconstruction of data acquired on a portable low-field (0.064 T) MRI scanner, for which no data are available for supervised training, while demonstrating improved image quality compared to traditional reconstruction, as determined by a radiologist study.
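The partitioning step above can be sketched as a random split of the acquired k-space sample indices into two disjoint sets, plus a loss evaluated only at acquired locations. The uniform 50/50 split and the helper names `split_kspace` and `masked_l1` are assumptions for illustration; the paper's actual partitioning scheme, loss, and non-Cartesian gridding are not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def split_kspace(sampled_idx: Sequence[int], frac: float = 0.5,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split acquired k-space sample indices into two disjoint sets.

    Assumption: a uniform random split; the paper's partitioning scheme may differ.
    """
    idx = list(sampled_idx)
    random.Random(seed).shuffle(idx)
    cut = round(frac * len(idx))
    return sorted(idx[:cut]), sorted(idx[cut:])


def masked_l1(pred: Sequence[complex], target: Sequence[complex],
              idx: Sequence[int]) -> float:
    """Mean absolute error evaluated only at acquired k-space locations."""
    return sum(abs(pred[i] - target[i]) for i in idx) / len(idx)


# Toy 1-D "k-space" of 16 locations with 8 acquired samples.
acquired = [0, 2, 3, 5, 8, 11, 13, 15]
part_a, part_b = split_kspace(acquired)
```

One plausible use, in the spirit of the k-space self-supervision described above: feed the network only `part_a`, then score its output against the full undersampled input with `masked_l1` over `acquired`, and repeat symmetrically for `part_b`.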

Submitted 18 February, 2023; originally announced February 2023.

Comments: 14 pages, 10 figures, published at Medical Image Analysis (MedIA)

arXiv:2301.10520

Ultra-NeRF: Neural Radiance Fields for Ultrasound Imaging

Authors: Magdalena Wysocki, Mohammad Farid Azampour, Christine Eilers, Benjamin Busam, Mehrdad Salehi, Nassir Navab

Abstract: We present a physics-enhanced implicit neural representation (INR) for ultrasound (US) imaging that learns tissue properties from overlapping US sweeps. Our proposed method leverages ray-tracing-based neural rendering for novel-view US synthesis. Recent publications demonstrated that INR models can encode a representation of a three-dimensional scene from a set of two-dimensional US frames. However, these models fail to consider the view-dependent changes in appearance and geometry intrinsic to US imaging. In our work, we discuss direction-dependent changes in the scene and show that a physics-inspired rendering improves the fidelity of US image synthesis. In particular, we demonstrate experimentally that our proposed method generates geometrically accurate B-mode images for regions with ambiguous representation owing to view-dependent differences in the US images. We conduct our experiments using simulated B-mode US sweeps of the liver and acquired US sweeps of a spine phantom tracked with a robotic arm. The experiments corroborate that our method generates US frames that enable consistent volume compounding from previously unseen views. To the best of our knowledge, the presented work is the first to address view-dependent US image synthesis using INR.

Submitted 11 April, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: accepted for oral presentation at MIDL 2023 (https://openreview.net/forum?id=x4McMBwVyi)

arXiv:2301.00484

Federated Fog Computing for Remote Industry 4.0 Applications

Authors: Razin Farhan Hussain, Mohsen Amini Salehi

Abstract: Industry 4.0 operates based on IoT devices, sensors, and actuators, transforming the use of computing resources and software solutions in diverse sectors. Various latency-sensitive Industry 4.0 applications use machine learning to process sensor data for automation and other industrial activities. Sending sensor data to cloud systems is time-consuming and detrimental to the latency constraints of these applications; thus, fog computing is often deployed. Executing these applications across heterogeneous fog systems exhibits stochastic execution-time behavior that affects task completion time. We investigate, model, and analyze the stochastic execution of various ML-based Industry 4.0 applications. Industries like oil and gas are prone to disasters requiring the coordination of various latency-sensitive activities. Hence, fog computing resources can become oversubscribed due to the surge in computing demands during a disaster. We propose federating nearby fog computing systems into a fog federation to make remote Industry 4.0 sites resilient against surges in computing demand. We propose a statistical resource allocation method across the fog federation for latency-sensitive tasks. Many modern Industry 4.0 applications operate based on a workflow of micro-services that are used alone within an industrial site. As such, Industry 4.0 solutions need to be aware of applications' architecture, particularly monolithic vs. micro-service.
Therefore, we propose a probability-based resource allocation method that can partition micro-service workflows across the fog federation to meet their latency constraints. Another concern in Industry 4.0 is the data privacy of the federated fog. As such, we propose a solution based on federated learning to train industrial ML applications across federated fog systems without compromising data confidentiality.
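A probability-based allocation of the kind described above can be sketched as "send the task to the fog system most likely to finish before its deadline." This minimal sketch assumes each system's execution time is normally distributed (a modeling assumption not stated in the abstract) plus a fixed network latency; the `FogSystem` fields and all numbers are hypothetical, and workflow partitioning and federated learning are not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FogSystem:
    name: str
    mean_exec: float    # mean execution time in seconds (hypothetical)
    std_exec: float     # standard deviation of execution time in seconds
    net_latency: float  # one-way transfer latency in seconds


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_on_time(fog: FogSystem, deadline: float) -> float:
    """P(latency + execution time <= deadline) under a normal execution-time model."""
    slack = deadline - fog.net_latency - fog.mean_exec
    if fog.std_exec == 0:
        return 1.0 if slack >= 0 else 0.0
    return normal_cdf(slack / fog.std_exec)


def pick_fog(federation: List[FogSystem], deadline: float) -> FogSystem:
    """Assign the task to the federation member with the highest on-time probability."""
    return max(federation, key=lambda f: p_on_time(f, deadline))


# An oversubscribed local fog versus a less loaded neighbor that is farther away.
federation = [FogSystem("local", 2.0, 1.0, 0.0), FogSystem("neighbor", 1.0, 0.2, 0.5)]
chosen = pick_fog(federation, deadline=2.0)
```

Here the local system has only a 50% chance of meeting the 2-second deadline, while the neighbor, despite its extra latency, meets it with probability above 99%, which illustrates why federating nearby fog systems can help during demand surges.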

Submitted 1 January, 2023; originally announced January 2023.

Comments: PhD Dissertation

arXiv:2207.08619

CACTUSS: Common Anatomical CT-US Space for US examinations

Authors: Yordanka Velikova, Walter Simson, Mehrdad Salehi, Mohammad Farid Azampour, Philipp Paprottka, Nassir Navab

Abstract: Abdominal aortic aneurysm (AAA) is a vascular disease in which a section of the aorta enlarges, weakening its walls and potentially rupturing the vessel. Abdominal ultrasound has been utilized for diagnostics, but due to its limited image quality and operator dependency, CT scans are usually required for monitoring and treatment planning. Recently, abdominal CT datasets have been successfully utilized to train deep neural networks for automatic aorta segmentation. Knowledge gathered from this solved task could therefore be leveraged to improve US segmentation for AAA diagnosis and monitoring. To this end, we propose CACTUSS: a common anatomical CT-US space, which acts as a virtual bridge between CT and US modalities to enable automatic AAA screening sonography. CACTUSS makes use of publicly available labelled data to learn to segment based on an intermediary representation that inherits properties from both US and CT. We train a segmentation network in this new representation and employ an additional image-to-image translation network which enables our model to perform on real B-mode images. Quantitative comparisons against fully supervised methods demonstrate the capabilities of CACTUSS in terms of Dice Score and diagnostic metrics, showing that our method also meets the clinical requirements for AAA scanning and diagnosis.
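The Dice Score used for the comparison above is the standard overlap measure 2|P ∩ T| / (|P| + |T|) between a predicted mask P and a reference mask T. A minimal sketch on flattened binary masks follows; the small `eps` term, which makes two empty masks score 1.0, is a common convention rather than something the abstract specifies.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks (eps avoids 0/0)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return (2 * inter + eps) / (total + eps)


# Usage: one overlapping pixel out of two predicted and two reference pixels.
score = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

For 2-D aorta segmentations, the masks would simply be flattened row by row before calling `dice_score`.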

Submitted 11 August, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.15291

Sonification as a Reliable Alternative to Conventional Visual Surgical Navigation

Authors: Sasan Matinfar, Mehrdad Salehi, Daniel Suter, Matthias Seibold, Navid Navab, Shervin Dehghani, Florian Wanivenhaus, Philipp Fürnstahl, Mazda Farshad, Nassir Navab

Abstract: Despite the undeniable advantages of image-guided surgical assistance systems in terms of accuracy, such systems have not yet fully met surgeons' needs or expectations regarding usability, time efficiency, and their integration into the surgical workflow. On the other hand, perceptual studies have shown that presenting independent but causally correlated information via multimodal feedback involving different sensory modalities can improve task performance. This article investigates an alternative method for computer-assisted surgical navigation, introduces a novel sonification methodology for navigated pedicle screw placement, and discusses advanced solutions based on multisensory feedback. The proposed method comprises a novel sonification solution for alignment tasks in four degrees of freedom based on frequency modulation (FM) synthesis. We compared the resulting accuracy and execution time of the proposed sonification method with visual navigation, which is currently considered the state of the art. We conducted a phantom study in which 17 surgeons executed the pedicle screw placement task in the lumbar spine, guided by either the proposed sonification-based or the traditional visual navigation method. The results demonstrated that the proposed method is as accurate as the state of the art while decreasing the surgeon's need to focus on visual navigation displays instead of the natural focus on surgical tools and targeted anatomy during task execution.
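FM synthesis, on which the sonification above is based, generates y(t) = sin(2π·f_c·t + I·sin(2π·f_m·t)), where f_c is the carrier frequency, f_m the modulator frequency, and I the modulation index that controls how rich the timbre sounds. The sketch below maps an alignment error for one degree of freedom onto I, so that perfect alignment yields a pure tone. The linear mapping, the 10-degree saturation point, and all frequencies are hypothetical choices for illustration, not the authors' sound design.

```python
import math
from typing import List


def error_to_index(error_deg: float, max_error_deg: float = 10.0,
                   max_index: float = 5.0) -> float:
    """Hypothetical mapping: larger misalignment -> larger modulation index (saturates)."""
    return max_index * min(abs(error_deg) / max_error_deg, 1.0)


def fm_tone(fc: float, fm: float, index: float, duration: float,
            sr: int = 8000) -> List[float]:
    """Samples of sin(2*pi*fc*t + index * sin(2*pi*fm*t)) at sample rate sr."""
    n = int(duration * sr)
    return [math.sin(2 * math.pi * fc * (i / sr)
                     + index * math.sin(2 * math.pi * fm * (i / sr)))
            for i in range(n)]


# A 3-degree misalignment produces a moderately modulated 440 Hz tone.
tone = fm_tone(440.0, 110.0, error_to_index(3.0), duration=0.5)
```

A four-degree-of-freedom display would need one such mapping per degree of freedom (for example, different parameters or separate sound streams); how the paper combines them is not described in the abstract.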

Submitted 30 June, 2022; originally announced June 2022.

Comments: 19 pages, 7 figures

arXiv:2201.11611

Asymmetric Coded Caching for Multi-Antenna Location-Dependent Content Delivery

Authors: Hamidreza Bakhshzad Mahmoodi, MohammadJavad Salehi, Antti Tölli

Abstract: Efficient usage of in-device storage and computation capabilities is a key solution to support data-intensive applications such as immersive digital experiences. This paper proposes a location-dependent, multi-antenna, coded-caching-based content delivery scheme tailored specifically for wireless immersive viewing applications. First, a novel memory allocation process incentivizes the content relevant to the identified wireless bottleneck areas. This enables a trade-off between local and global caching gains and results in unequal fractions of location-dependent multimedia content cached by each user. Then, a novel packet generation process is carried out during the subsequent delivery phase, given the asymmetric cache placement. During this phase, the number of packets transmitted to each user is the same, while the sizes of the packets are proportional to the corresponding location-dependent cache ratios. In this regard, each user is served with location-specific content using joint multicast beamforming and a multi-rate modulation scheme that simultaneously benefits from global caching and spatial multiplexing gains. Numerical experiments and mathematical analysis demonstrate significant performance gains compared to the state of the art.

Submitted 16 February, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 32 pages, 15 figures, journal paper. arXiv admin note: text overlap with arXiv:2102.02518

arXiv:2201.10776

DSFormer: A Dual-domain Self-supervised Transformer for Accelerated Multi-contrast MRI Reconstruction

Authors: Bo Zhou, Neel Dey, Jo Schlemper, Seyed Sadegh Mohseni Salehi, Chi Liu, James S. Duncan, Michal Sofka

Abstract: Multi-contrast MRI (MC-MRI) captures multiple complementary imaging modalities to aid in radiological decision-making. Given the need for lowering the time cost of multiple acquisitions, current deep accelerated MRI reconstruction networks focus on exploiting the redundancy between multiple contrasts. However, existing works are largely supervised with paired data and/or prohibitively expensive fully-sampled MRI sequences. Further, reconstruction networks typically rely on convolutional architectures which are limited in their capacity to model long-range interactions and may lead to suboptimal recovery of fine anatomical detail. To these ends, we present a dual-domain self-supervised transformer (DSFormer) for accelerated MC-MRI reconstruction. DSFormer develops a deep conditional cascade transformer (DCCT) consisting of several cascaded Swin transformer reconstruction networks (SwinRN) trained under two deep conditioning strategies to enable MC-MRI information sharing. We further present a dual-domain (image and k-space) self-supervised learning strategy for DCCT to alleviate the costs of acquiring fully sampled training data. DSFormer generates high-fidelity reconstructions which experimentally outperform current fully-supervised baselines. Moreover, we find that DSFormer achieves nearly the same performance when trained either with full supervision or with our proposed dual-domain self-supervision.

Submitted 16 August, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Accepted at WACV 2023

arXiv:2201.02639

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

Authors: Rowan Zellers, Jiasen Lu, Ximing Lu, Youngjae Yu, Yanpeng Zhao, Mohammadreza Salehi, Aditya Kusupati, Jack Hessel, Ali Farhadi, Yejin Choi

Abstract: As humans, we navigate a multimodal world, building a holistic understanding from all our senses. We introduce MERLOT Reserve, a model that represents videos jointly over time through a new training objective that learns from audio, subtitles, and video frames. Given a video, we replace snippets of text and audio with a MASK token; the model learns by choosing the correct masked-out snippet. Our objective learns faster than alternatives and performs well at scale: we pretrain on 20 million YouTube videos. Empirical results show that MERLOT Reserve learns strong multimodal representations. When finetuned, it sets the state of the art on Visual Commonsense Reasoning (VCR), TVQA, and Kinetics-600, outperforming prior work by 5%, 7%, and 1.5%, respectively. Ablations show that these tasks benefit from audio pretraining, even VCR, a QA task centered around images (without sound). Moreover, our objective enables out-of-the-box prediction, revealing strong multimodal commonsense understanding. In a fully zero-shot setting, our model obtains competitive results on four video tasks, even outperforming supervised approaches on the recently proposed Situated Reasoning (STAR) benchmark. We analyze why audio enables better vision-language representations, suggesting significant opportunities for future research. We conclude by discussing the ethical and societal implications of multimodal pretraining.
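"Choosing the correct masked-out snippet" can be framed as classification over candidate snippet embeddings. The sketch below scores candidates by cosine similarity with the model's prediction at the MASK position and applies a temperature-scaled softmax cross-entropy. The vector sizes, the temperature of 0.05, and the helper names are hypothetical; this illustrates the general contrastive technique, not MERLOT Reserve's exact loss or candidate set.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_scores(pred: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[float]:
    p = l2_normalize(pred)
    return [sum(a * b for a, b in zip(p, l2_normalize(c))) for c in candidates]


def snippet_choice_loss(pred: Sequence[float], candidates: Sequence[Sequence[float]],
                        target: int, temperature: float = 0.05) -> float:
    """Softmax cross-entropy over candidate snippets scored by cosine similarity."""
    logits = [s / temperature for s in cosine_scores(pred, candidates)]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def choose_snippet(pred: Sequence[float], candidates: Sequence[Sequence[float]]) -> int:
    """Index of the highest-scoring candidate snippet."""
    scores = cosine_scores(pred, candidates)
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy 2-D embeddings: the prediction is closest to candidate 0.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
pred = [0.9, 0.1]
```

Training would minimize `snippet_choice_loss` with `target` set to the true snippet; at inference, `choose_snippet` returns the model's pick, which is the kind of out-of-the-box prediction the abstract mentions.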

Submitted 13 May, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: CVPR 2022. Project page at https://rowanzellers.com/merlotreserve

arXiv:2106.00667

DOI: 10.1145/3479722.3480994

SoK: Oracles from the Ground Truth to Market Manipulation

Authors: Shayan Eskandari, Mehdi Salehi, Wanyun Catherine Gu, Jeremy Clark

Abstract: One fundamental limitation of blockchain-based smart contracts is that they execute in a closed environment. Thus, they only have access to data and functionality that is already on the blockchain, or is fed into the blockchain. Any interactions with the real world need to be mediated by a bridge service, which is called an oracle. As decentralized applications mature, oracles are playing an increasingly prominent role. With their evolution comes more attacks, necessitating greater attention to their trust model. In this systematization of knowledge paper (SoK), we dissect the design alternatives for oracles, showcase attacks, and discuss attack mitigation strategies.

Submitted 2 September, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Journal ref: 3rd ACM Conference on Advances in Financial Technologies (AFT '21), September 26--28, 2021, Arlington, VA, USA

arXiv:2105.02817 [pdf, other]

Holographic Transmitarray Antenna with linear Polarization in X band

Authors: Mahdi Salehi, Homayoon Oraizi

Abstract: In this paper, we present the design and demonstration of transmitarray antennas (TAs) based on the holographic technique for the first time. According to holographic theory, the amplitudes and phases of electromagnetic waves can be recorded on a surface and then reconstructed independently. This concept is used to design single-beam and multi-beam linearly polarized holographic TAs without any iterative optimization algorithm. First, a transmission impedance surface is analyzed and compared with its reflection counterpart. Then, interferograms associated with the scalar admittance distribution are defined according to the number and direction of the radiation beams. After that, a transmission metasurface of dimensions equal to 0.26λ₀ is used to design holographic TAs at 12 GHz. Several examples are provided to support the method. Finally, a linearly polarized, circular-aperture, wideband holographic transmitarray antenna with a radius of 13.3 cm was manufactured and tested. The antenna achieves a 12.5% (11.4-12.9 GHz) 1-dB gain bandwidth and a maximum gain of 23.8 dB, corresponding to an aperture efficiency of 21.46%.
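The headline figures can be cross-checked with the usual definition of aperture efficiency: realized gain divided by the maximum directivity 4πA/λ² of the physical aperture. The sketch below assumes that definition, a circular aperture of radius 13.3 cm, and 12 GHz as the reference frequency for both the efficiency and the fractional bandwidth. The paper may define these quantities differently, and the helper names `aperture_efficiency` and `fractional_bandwidth` are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)


def aperture_efficiency(gain_db: float, radius_m: float, freq_hz: float) -> float:
    """Realized gain divided by the maximum directivity 4*pi*A/lambda**2
    of a circular aperture of the given radius (assumed definition)."""
    wavelength = C / freq_hz
    area = math.pi * radius_m ** 2
    max_directivity = 4 * math.pi * area / wavelength ** 2
    gain_linear = 10 ** (gain_db / 10)  # dB -> linear power ratio
    return gain_linear / max_directivity


def fractional_bandwidth(f_low_hz: float, f_high_hz: float, f_ref_hz: float) -> float:
    """Band width expressed as a fraction of an assumed reference frequency."""
    return (f_high_hz - f_low_hz) / f_ref_hz


eff = aperture_efficiency(gain_db=23.8, radius_m=0.133, freq_hz=12e9)
bw = fractional_bandwidth(11.4e9, 12.9e9, 12e9)
print(f"aperture efficiency ~ {eff:.1%}, 1-dB gain bandwidth ~ {bw:.1%}")
```

Under these assumptions, the script reports about 21.4% and 12.5%, in line with the abstract's 21.46% and 12.5%. The 12.5% bandwidth comes out exactly only when the 1.5 GHz band is referenced to 12 GHz rather than to its 12.15 GHz midpoint, which suggests that 12 GHz is the intended reference.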

Submitted 6 October, 2023; v1 submitted 19 April, 2021; originally announced May 2021.

Comments: 11 Pages and 12 subfigures (31 figures) 1 Table

arXiv:2008.07220 [pdf, other]

Scoring the Terabit/s Goal: Broadband Connectivity in 6G

Authors: Nandana Rajatheva, Italo Atzeni, Simon Bicais, Emil Bjornson, Andre Bourdoux, Stefano Buzzi, Carmen D'Andrea, Jean-Baptiste Dore, Serhat Erkucuk, Manuel Fuentes, Ke Guan, Yuzhou Hu, Xiao**g Huang, Jari Hulkkonen, Josep Miquel Jornet, Marcos Katz, Behrooz Makki, Rickard Nilsson, Erdal Panayirci, Khaled Rabie, Nuwanthika Rajapaksha, MohammadJavad Salehi, Hadi Sarieddeen, Shahriar Shahabuddin, Tommy Svensson, et al. (4 additional authors not shown)

Abstract: This paper explores the road to vastly improving the broadband connectivity in future 6G wireless systems. Different categories of use cases are considered, with peak data rates up to 1 Tbps. Several categories of enablers at the infrastructure, spectrum, and protocol/algorithmic levels are required to realize the intended broadband connectivity goals in 6G. At the infrastructure level, we consider ultra-massive MIMO technology (possibly implemented using holographic radio), intelligent reflecting surfaces, user-centric cell-free networking, integrated access and backhaul, and integrated space and terrestrial networks. At the spectrum level, the network must seamlessly utilize sub-6 GHz bands for coverage and spatial multiplexing of many devices, while higher bands will be mainly used for pushing the peak rates of point-to-point links. Finally, at the protocol/algorithmic level, the enablers include improved coding, modulation, and waveforms to achieve lower latency, higher reliability, and reduced complexity.

Submitted 21 February, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: Submitted to IEEE Access. 51 pages, 31 figures. arXiv admin note: text overlap with arXiv:2004.14247

arXiv:2004.14247 [pdf, other]

White Paper on Broadband Connectivity in 6G

Authors: Nandana Rajatheva, Italo Atzeni, Emil Bjornson, Andre Bourdoux, Stefano Buzzi, Jean-Baptiste Dore, Serhat Erkucuk, Manuel Fuentes, Ke Guan, Yuzhou Hu, Xiao**g Huang, Jari Hulkkonen, Josep Miquel Jornet, Marcos Katz, Rickard Nilsson, Erdal Panayirci, Khaled Rabie, Nuwanthika Rajapaksha, MohammadJavad Salehi, Hadi Sarieddeen, Tommy Svensson, Oskari Tervo, Antti Tolli, Qingqing Wu, Wen Xu

Abstract: This white paper explores the road to implementing broadband connectivity in future 6G wireless systems. Different categories of use cases are considered, from extreme capacity with peak data rates up to 1 Tbps, to raising typical data rates by orders of magnitude, to supporting broadband connectivity at railway speeds up to 1000 km/h. To achieve these goals, not only will the terrestrial networks evolve, but they will also be integrated with satellite networks, all facilitating autonomous systems and various interconnected structures. We believe that several categories of enablers at the infrastructure, spectrum, and protocol/algorithmic levels are required to realize the intended broadband connectivity goals in 6G. At the infrastructure level, we consider ultra-massive MIMO technology (possibly implemented using holographic radio), intelligent reflecting surfaces, user-centric and scalable cell-free networking, integrated access and backhaul, and integrated space and terrestrial networks. At the spectrum level, the network must seamlessly utilize sub-6 GHz bands for coverage and spatial multiplexing of many devices, while higher bands will be used for pushing the peak rates of point-to-point links. The latter path will lead to THz communications complemented by visible light communications in specific scenarios. At the protocol/algorithmic level, the enablers include improved coding, modulation, and waveforms to achieve lower latencies, higher reliability, and reduced complexity.
Different options will be needed to optimally support different use cases. The resource efficiency can be further improved by using various combinations of full-duplex radios, interference management based on rate-splitting, machine-learning-based optimization, coded caching, and broadcasting.

Submitted 29 April, 2020; originally announced April 2020.

Comments: 46 pages, 13 figures

arXiv:2004.11839 [pdf, other]

Detecting Driver's Distraction using Long-term Recurrent Convolutional Network

Authors: Chang Wei Tan, Mahsa Salehi, Geoffrey Mackellar

Abstract: In this study, we demonstrate a novel brain-computer interface (BCI) approach to detecting driver distraction events to improve road safety. We use a commercial wireless headset that records EEG signals from the brain. We collected real EEG signals from participants who undertook a 40-minute driving simulation and were required to perform different tasks while driving. These signals are segmented into short windows and labelled using a time series classification (TSC) model. We studied different TSC approaches and designed a Long-term Recurrent Convolutional Network (LRCN) model for this task. Our results showed that our LRCN model performs better than state-of-the-art TSC models at detecting driver distraction events.

Submitted 13 April, 2020; originally announced April 2020.

Comments: 3 pages, 2 figures

arXiv:2004.11206 [pdf]

Multi-level Binarized LSTM in EEG Classification for Wearable Devices

Authors: Najmeh Nazari, Seyed Ahmad Mirsalari, Sima Sinaei, Mostafa E. Salehi, Masoud Daneshtalab

Abstract: Long Short-Term Memory (LSTM) is widely used in various sequential applications. Complex LSTMs can hardly be deployed on wearable, resource-limited devices because of their large computation and memory requirements. Binary LSTMs have been introduced to cope with this problem; however, they cause significant accuracy loss in some applications, such as EEG classification, that are essential to deploy on wearable devices. In this paper, we propose an efficient multi-level binarized LSTM that significantly reduces computation while keeping accuracy very close to that of a full-precision LSTM. By deploying 5-level binarized weights and inputs, our method reduces the area and delay of the MAC operation by about 31× and 27×, respectively, in 65 nm technology, with less than 0.01% accuracy loss. In contrast to many compute-intensive deep-learning approaches, the proposed algorithm is lightweight and therefore brings efficient, accurate LSTM-based EEG classification to real-time wearable devices.
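The abstract does not state how the five binarization levels are formed. One widely used scheme, shown here purely as an assumed illustration and not necessarily the authors' construction, is residual binarization. Each level approximates whatever error is left over from the previous levels with a single scaled sign vector, so a k-level weight is a sum of k terms α·{-1, +1}. The names `residual_binarize` and `sq_error` and the sample weights are hypothetical.

```python
def sign(x: float) -> float:
    # Map zero to +1 so every binarized entry is exactly +1 or -1.
    return 1.0 if x >= 0 else -1.0


def residual_binarize(weights: list[float], levels: int) -> list[float]:
    """Approximate `weights` as a sum of `levels` scaled {-1, +1} vectors."""
    residual = list(weights)
    approx = [0.0] * len(weights)
    for _ in range(levels):
        # For a fixed sign vector, the mean absolute residual is the
        # scale that minimizes the squared approximation error.
        alpha = sum(abs(r) for r in residual) / len(residual)
        bits = [sign(r) for r in residual]
        approx = [a + alpha * b for a, b in zip(approx, bits)]
        residual = [r - alpha * b for r, b in zip(residual, bits)]
    return approx


def sq_error(w: list[float], w_hat: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(w, w_hat))


w = [0.9, -0.3, 0.05, -1.2, 0.4]
for k in range(1, 6):
    print(k, round(sq_error(w, residual_binarize(w, k)), 4))
```

With this choice of scale, each additional level lowers the squared error by n·α², where n is the number of weights. Adding levels therefore trades extra binary operations for accuracy, which is the same trade-off the abstract describes between 5-level binarization and full precision.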

Submitted 19 April, 2020; originally announced April 2020.

Comments: To appear in IEEE International Conference on Parallel, Distributed and Network-based Processing in 2020. arXiv admin note: text overlap with arXiv:1812.04818 by other authors

MSC Class: 03B05 ACM Class: I.2.6; B.8.2

arXiv:2004.08914 [pdf]

MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG signal Classification

Authors: Seyed Ahmad Mirsalari, Sima Sinaei, Mostafa E. Salehi, Masoud Daneshtalab

Abstract: Recurrent Neural Networks (RNNs) are widely used for learning sequences in applications such as EEG classification. Complex RNNs can hardly be deployed on wearable devices because of their computation- and memory-intensive processing patterns. In general, reducing precision yields much greater efficiency, and binarized RNNs have been introduced as energy-efficient solutions. However, naive binarization methods lead to significant accuracy loss in EEG classification. In this paper, we propose a multi-level binarized LSTM that significantly reduces computation while keeping accuracy very close to that of the full-precision LSTM. Our method reduces the delay of the 3-bit LSTM cell operation by 47× with less than 0.01% accuracy loss.

Submitted 19 April, 2020; originally announced April 2020.

Comments: To appear in IEEE International Symposium on Circuits & Systems in 2020. arXiv admin note: text overlap with arXiv:1807.04093 by other authors

MSC Class: 03B05 ACM Class: I.2.6; B.8.2

arXiv:1909.11625 [pdf, other]

doi 10.1109/TMI.2020.2998600

Deep Predictive Motion Tracking in Magnetic Resonance Imaging: Application to Fetal Imaging

Authors: Ayush Singh, Seyed Sadegh Mohseni Salehi, Ali Gholipour

Abstract: Fetal magnetic resonance imaging (MRI) is challenged by uncontrollable, large, and irregular fetal movements. It is, therefore, performed through visual monitoring of fetal motion and repeated acquisitions to ensure diagnostic-quality images are acquired. Nevertheless, visual monitoring of fetal motion based on displayed slices, and navigation at the level of stacks of slices, are inefficient. The current process is highly operator-dependent, increases scanner usage and cost, and significantly increases the length of fetal MRI scans, which makes them hard for pregnant women to tolerate. To help build automatic MRI motion tracking and navigation systems that overcome the limitations of the current process and improve fetal imaging, we have developed a new real-time, image-based motion tracking method based on deep learning that learns to predict fetal motion directly from acquired images. Our method is based on a recurrent neural network, composed of spatial and temporal encoder-decoders, that infers motion parameters from anatomical features extracted from sequences of acquired slices. We compared our trained network on held-out test sets (including data with different characteristics, e.g. different fetuses scanned at different ages, and motion trajectories recorded from volunteer subjects) with networks designed for estimation as well as methods adopted to make predictions.
The results show that our method outperformed alternative techniques and achieved real-time performance, with average errors of 3.5 and 8 degrees for the estimation and prediction tasks, respectively. Our real-time deep predictive motion tracking technique can be used to assess fetal movements, guide slice acquisitions, and build navigation systems for fetal MRI.

Submitted 6 June, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

Comments: The article has been published in IEEE TMI: 14 pages, 11 figures, 2 tables and 1 supplementary https://github.com/bchimagine/DeepPredictiveMotionTracking

ACM Class: I.4.5

Showing 1–23 of 23 results for author: Salehi, M