-
Open-Source Conversational AI with SpeechBrain 1.0
Authors:
Mirco Ravanelli,
Titouan Parcollet,
Adel Moumen,
Sylvain de Langen,
Cem Subakan,
Peter Plantinga,
Yingzhi Wang,
Pooneh Mousavi,
Luca Della Libera,
Artem Ploujnikov,
Francesco Paissan,
Davide Borra,
Salah Zaiem,
Zeyu Zhao,
Shucong Zhang,
Georgios Karakasidis,
Sung-Lin Yeh,
Aku Rouhe,
Rudolf Braun,
Florian Mai,
Juan Zuluaga-Gomez,
Seyed Mahed Mousavi,
Andreas Nautsch,
Xuechen Liu,
Sangeet Sagar,
et al. (5 additional authors not shown)
Abstract:
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.
Submitted 2 July, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
Authors:
Simone Alghisi,
Massimo Rizzoli,
Gabriel Roccabruna,
Seyed Mahed Mousavi,
Giuseppe Riccardi
Abstract:
We study the limitations of Large Language Models (LLMs) for the task of response generation in human-machine dialogue. Several techniques have been proposed in the literature for different dialogue types (e.g., Open-Domain). However, the evaluations of these techniques have been limited in terms of base LLMs, dialogue types and evaluation metrics. In this work, we extensively analyze different LLM adaptation techniques when applied to different dialogue types. We have selected two base LLMs, Llama-2 and Mistral, and four dialogue types: Open-Domain, Knowledge-Grounded, Task-Oriented, and Question Answering. We evaluate the performance of in-context learning and fine-tuning techniques across datasets selected for each dialogue type. We assess the impact of incorporating external knowledge to ground the generation in both scenarios of Retrieval-Augmented Generation (RAG) and gold knowledge. We adopt consistent evaluation and explainability criteria for automatic metrics and human evaluation protocols. Our analysis shows that there is no universally best technique for adapting large language models, as the efficacy of each technique depends on both the base LLM and the specific type of dialogue. Last but not least, the assessment of the best adaptation technique should include human evaluation to avoid false expectations and outcomes derived from automatic metrics.
Submitted 10 June, 2024;
originally announced June 2024.
-
Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
Authors:
S. Mostafa Mousavi,
Marc Stogaitis,
Tajinder Gadh,
Richard M Allen,
Alexei Barski,
Robert Bosch,
Patrick Robertson,
Nivetha Thiruverahan,
Youngmin Cho,
Aman Raj
Abstract:
This paper presents a novel approach to extract scientifically valuable information about Earth's physical phenomena from unconventional sources, such as multi-modal social media posts. Employing a state-of-the-art large language model (LLM), Gemini 1.5 Pro (Reid et al. 2024), we estimate earthquake ground shaking intensity from these unstructured posts. The model's output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that LLMs, trained on vast internet data, may have developed a unique understanding of physical phenomena. Specifically, Google's Gemini models demonstrate a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, accurately describing observational data even though it's not identical to established models. These findings raise intriguing questions about the extent to which Gemini's training has led to a broader understanding of the physical world and its phenomena. The ability of Generative AI models like Gemini to generate results consistent with established scientific knowledge highlights their potential to augment our understanding of complex physical phenomena like earthquakes. The flexible and effective approach proposed in this study holds immense potential for enriching our understanding of the impact of physical phenomena and improving resilience during natural disasters. This research is a significant step toward harnessing the power of social media and AI for natural disaster mitigation, opening new avenues for understanding the emerging capabilities of Generative AI and LLMs for scientific applications.
Submitted 14 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs
Authors:
Seyed Mahed Mousavi,
Simone Alghisi,
Giuseppe Riccardi
Abstract:
LLMs acquire knowledge from massive data snapshots collected at different timestamps. Their knowledge is then commonly evaluated using static benchmarks. However, factual knowledge is generally subject to time-sensitive changes, and static benchmarks cannot address those cases. We present an approach to dynamically evaluate the knowledge in LLMs and their time-sensitiveness against Wikidata, a publicly available up-to-date knowledge graph. We evaluate the time-sensitive knowledge in twenty-four private and open-source LLMs, as well as the effectiveness of four editing methods in updating the outdated facts. Our results show that 1) outdatedness is a critical problem across state-of-the-art LLMs; 2) LLMs output inconsistent answers when prompted with slight variations of the question prompt; and 3) the performance of the state-of-the-art knowledge editing algorithms is very limited, as they cannot reduce the cases of outdatedness and output inconsistency.
Submitted 12 June, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Are LLMs Robust for Spoken Dialogues?
Authors:
Seyed Mahed Mousavi,
Gabriel Roccabruna,
Simone Alghisi,
Massimo Rizzoli,
Mirco Ravanelli,
Giuseppe Riccardi
Abstract:
Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks, including dialogue state tracking and end-to-end response generation. Nevertheless, most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations. Consequently, the robustness of the developed models to spoken interactions is unknown. In this work, we have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets. Due to the lack of proper spoken dialogue datasets, we have automatically transcribed a development set of spoken dialogues with a state-of-the-art ASR engine. We have characterized the ASR-error types and their distributions and simulated these errors in a large dataset of dialogues. We report the intrinsic (perplexity) and extrinsic (human evaluation) performance of fine-tuned GPT-2 and T5 models in two subtasks of response generation and dialogue state tracking, respectively. The results show that LLMs are not robust to spoken noise by default; however, fine-tuning/training such models on a proper dataset of spoken TODs can result in a more robust performance.
Submitted 4 January, 2024;
originally announced January 2024.
-
MRI brain tumor segmentation using informative feature vectors and kernel dictionary learning
Authors:
Seyedeh Mahya Mousavi,
Mohammad Mostafavi
Abstract:
This paper presents a method based on a kernel dictionary learning algorithm for segmenting brain tumor regions in magnetic resonance images (MRI). A set of first-order and second-order statistical feature vectors is extracted from patches of size 3 × 3 around pixels in the brain MRI scans. These feature vectors are utilized to train two kernel dictionaries separately for healthy and tumorous tissues. To enhance the efficiency of the dictionaries and reduce training time, a correlation-based sample selection technique is developed to identify the most informative and discriminative subset of feature vectors. This technique aims to improve the performance of the dictionaries by selecting a subset of feature vectors that provide valuable information for the segmentation task. Subsequently, a linear classifier is utilized to distinguish between healthy and unhealthy pixels based on the learned dictionaries. The results demonstrate that the proposed method outperforms other existing methods in terms of segmentation accuracy and significantly reduces both the time and memory required, resulting in a remarkably fast training process.
Submitted 16 October, 2023;
originally announced October 2023.
-
The Way We Were: Structural Operational Semantics Research in Perspective
Authors:
Luca Aceto,
Pierluigi Crescenzi,
Anna Ingólfsdóttir,
Mohammad Reza Mousavi
Abstract:
This position paper on the (meta-)theory of Structural Operational Semantics (SOS) is motivated by the following two questions: (1) Is the (meta-)theory of SOS dying out as a research field? (2) If so, is it possible to rejuvenate this field with a redefined purpose?
In this article, we will consider possible answers to those questions by first analysing the history of the EXPRESS/SOS workshops and the data concerning the authors and the presentations featured in the editions of those workshops as well as their subject matters.
The results of our quantitative and qualitative analyses all indicate a diminishing interest in the theory of SOS as a field of research. Even though 'all good things must come to an end', we strive to finish this position paper on an upbeat note by addressing our second motivating question with some optimism. To this end, we use our personal reflections and an analysis of recent trends in two of the flagship conferences in the field of Programming Languages (namely POPL and PLDI) to draw some conclusions on possible future directions that may rejuvenate research on the (meta-)theory of SOS. We hope that our musings will entice members of the research community to breathe new life into a field of research that has been kind to three of the authors of this article.
Submitted 13 September, 2023;
originally announced September 2023.
-
Bees Local Phase Quantization Feature Selection for RGB-D Facial Expressions Recognition
Authors:
Seyed Muhammad Hossein Mousavi,
Atiye Ilanloo
Abstract:
Feature selection could be defined as an optimization problem and solved by bio-inspired algorithms. Bees Algorithm (BA) shows decent performance in feature selection optimization tasks. On the other hand, Local Phase Quantization (LPQ) is a frequency domain feature which has excellent performance on Depth images. Here, after extracting LPQ features out of RGB (colour) and Depth images from the Iranian Kinect Face Database (IKFDB), the Bees feature selection algorithm is applied to select the desired number of features for final classification tasks. IKFDB is recorded with Kinect sensor V.2 and contains colour and depth images for facial and facial micro-expressions recognition purposes. Here five facial expressions of Anger, Joy, Surprise, Disgust and Fear are used for final validation. The proposed Bees LPQ method is compared with Particle Swarm Optimization (PSO) LPQ, PCA LPQ, Lasso LPQ, and just LPQ features for classification tasks with Support Vector Machines (SVM), K-Nearest Neighbourhood (KNN), Shallow Neural Network and Ensemble Subspace KNN. The results show a decent performance of the proposed algorithm (99% accuracy) in comparison with others.
Submitted 3 August, 2023;
originally announced August 2023.
-
Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
Authors:
Seyed Muhammad Hossein Mousavi
Abstract:
This book offers a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using Color and Depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic to analyze, so it requires familiarity with other topics of Artificial Intelligence (AI) such as machine learning, digital image processing, psychology and more. It is therefore a great opportunity to write a book that covers all of these topics for beginner to professional readers in the field of AI, and even for readers without a background in AI. Our goal is to provide a standalone introduction to FMER analysis in the form of theoretical descriptions for readers with no background in image processing, with reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB libraries used in the text, which helps the reader apply the experiments in real-world applications. We believe that this book is suitable for students, researchers, and professionals alike, who need to develop practical skills, along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with different key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expressions recognition, feature extraction and dimensionality reduction.
Submitted 19 June, 2023;
originally announced July 2023.
-
Data Coverage for Detecting Representation Bias in Image Datasets: A Crowdsourcing Approach
Authors:
Melika Mousavi,
Nima Shahbazi,
Abolfazl Asudeh
Abstract:
Existing machine learning models have proven to fail when it comes to their performance for minority groups, mainly due to biases in data. In particular, datasets, especially social data, are often not representative of minorities. In this paper, we consider the problem of representation bias identification on image datasets without explicit attribute values. Using the notion of data coverage for detecting a lack of representation, we develop multiple crowdsourcing approaches. Our core approach, at a high level, is a divide and conquer algorithm that applies a search space pruning strategy to efficiently identify if a dataset misses proper coverage for a given group. We provide a different theoretical analysis of our algorithm, including a tight upper bound on its performance which guarantees its near-optimality. Using this algorithm as the core, we propose multiple heuristics to reduce the coverage detection cost across different cases with multiple intersectional/non-intersectional groups. We demonstrate how the pre-trained predictors are not reliable and hence not sufficient for detecting representation bias in the data. Finally, we adjust our core algorithm to utilize existing models for predicting image group(s) to minimize the coverage identification cost. We conduct extensive experiments, including live experiments on Amazon Mechanical Turk to validate our problem and evaluate our algorithms' performance.
Submitted 24 June, 2023;
originally announced June 2023.
-
Understanding Emotion Valence is a Joint Deep Learning Task
Authors:
Gabriel Roccabruna,
Seyed Mahed Mousavi,
Giuseppe Riccardi
Abstract:
The valence analysis of speakers' utterances or written posts helps to understand the activation and variations of the emotional state throughout the conversation. More recently, the concept of Emotion Carriers (EC) has been introduced to explain the emotion felt by the speaker and its manifestations. In this work, we investigate the natural inter-dependency of valence and ECs via a multi-task learning approach. We experiment with Pre-trained Language Models (PLM) for single-task, two-step, and joint settings for the valence and EC prediction tasks. We compare and evaluate the performance of generative (GPT-2) and discriminative (BERT) architectures in each setting. We observed that providing the ground truth label of one task improves the prediction performance of the models in the other task. We further observed that the discriminative model achieves the best trade-off of valence and EC prediction tasks in the joint prediction setting. As a result, we attain a single model that performs both tasks, thus saving computational resources at training and inference times.
Submitted 31 October, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?
Authors:
Seyed Mahed Mousavi,
Simone Caldarella,
Giuseppe Riccardi
Abstract:
Longitudinal Dialogues (LD) are the most challenging type of conversation for human-machine dialogue systems. LDs include the recollections of events, personal thoughts, and emotions specific to each individual in a sparse sequence of dialogue sessions. Dialogue systems designed for LDs should uniquely interact with the users over multiple sessions and long periods of time (e.g. weeks), and engage them in personal dialogues to elaborate on their feelings, thoughts, and real-life events. In this paper, we study the task of response generation in LDs. We evaluate whether general-purpose Pre-trained Language Models (PLM) are appropriate for this purpose. We fine-tune two PLMs, GePpeTto (GPT-2) and iT5, using a dataset of LDs. We experiment with different representations of the personal knowledge extracted from LDs for grounded response generation, including the graph representation of the mentioned events and participants. We evaluate the performance of the models via automatic metrics and the contribution of the knowledge via the Integrated Gradients technique. We categorize the natural language generation errors via human evaluations of contextualization, appropriateness and engagement of the user.
Submitted 25 May, 2023;
originally announced May 2023.
-
Victoria Amazonica Optimization (VAO): An Algorithm Inspired by the Giant Water Lily Plant
Authors:
Seyed Muhammad Hossein Mousavi
Abstract:
The Victoria Amazonica plant, often known as the Giant Water Lily, has the largest floating spherical leaf in the world, with a maximum leaf diameter of 3 meters. It spreads its leaves by the force of its spines and creates a large shadow underneath, killing any plants that require sunlight. These water tyrants use their formidable spines to compel each other to the surface and increase their strength to grab more space from the surface. As they spread throughout the pond or basin, with the earliest-growing leaves having more room to grow, each leaf gains a unique size. Its flowers are transsexual and when they bloom, Cyclocephala beetles are responsible for the pollination process, being attracted to the scent of the female flower. After entering the flower, the beetle becomes covered with pollen and transfers it to another flower for fertilization. After the beetle leaves, the flower turns into a male and changes color from white to pink. The male flower dies and sinks into the water, releasing its seed to help create a new generation. In this paper, the mathematical life cycle of this magnificent plant is introduced, and each leaf and blossom are treated as a single entity. The proposed bio-inspired algorithm is tested with 24 benchmark optimization test functions, such as Ackley, and compared to ten other famous algorithms, including the Genetic Algorithm. The proposed algorithm is tested on 10 optimization problems: Minimum Spanning Tree, Hub Location Allocation, Quadratic Assignment, Clustering, Feature Selection, Regression, Economic Dispatching, Parallel Machine Scheduling, Color Quantization, and Image Segmentation and compared to traditional and bio-inspired algorithms. Overall, the performance of the algorithm in all tasks is satisfactory.
Submitted 22 January, 2023;
originally announced March 2023.
-
What's New? Identifying the Unfolding of New Events in Narratives
Authors:
Seyed Mahed Mousavi,
Shohei Tanaka,
Gabriel Roccabruna,
Koichiro Yoshino,
Satoshi Nakamura,
Giuseppe Riccardi
Abstract:
Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. The event is categorized as new with respect to the discourse context and whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.
Submitted 8 August, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
Neural Gas Network Image Features and Segmentation for Brain Tumor Detection Using Magnetic Resonance Imaging Data
Authors:
S. Muhammad Hossein Mousavi
Abstract:
Accurate detection of brain tumors could save many lives, and increasing the accuracy of this binary classification by even a few percent is highly important. Neural Gas Networks (NGN) is a fast, unsupervised algorithm that could be used in data clustering, image pattern recognition, and image segmentation. In this research, we used the metaheuristic Firefly Algorithm (FA) for image contrast enhancement as pre-processing and NGN weights for feature extraction and segmentation of Magnetic Resonance Imaging (MRI) data on two brain tumor datasets from the Kaggle platform. Also, tumor classification is conducted by Support Vector Machine (SVM) classification algorithms and compared with a deep learning technique plus other features in train and test phases. Additionally, NGN tumor segmentation is evaluated by well-known performance metrics such as Accuracy, F-measure, Jaccard, and more versus ground truth data and compared with traditional segmentation techniques. The proposed method is fast and precise in both tasks of tumor classification and segmentation compared with other methods. A classification accuracy of 95.14% and a segmentation accuracy of 0.977 are achieved by the proposed method.
Submitted 28 January, 2023;
originally announced January 2023.
-
Enabling Heterogeneous Domain Adaptation in Multi-inhabitants Smart Home Activity Learning
Authors:
Md Mahmudur Rahman,
Mahta Mousavi,
Peri Tarr,
Mohammad Arif Ul Alam
Abstract:
Domain adaptation for sensor-based activity learning is of utmost importance in remote health monitoring research. However, many domain adaptation algorithms fail to adapt in the presence of target domain heterogeneity (which is always present in reality), and the presence of multiple inhabitants dramatically hinders their generalizability, producing unsatisfactory results for semi-supervised and unseen activity learning tasks. We propose AEDA, a novel deep auto-encoder-based model that enables semi-supervised domain adaptation in the presence of target domain heterogeneity, and show how to incorporate it to add heterogeneity support to any homogeneous deep domain adaptation architecture for cross-domain activity learning. Experimental evaluation on 18 different heterogeneous and multi-inhabitant use cases of 8 different domains created from 2 publicly available human activity datasets (wearable and ambient smart homes) shows that AEDA outperforms existing domain adaptation techniques (max. 12.8% and 8.9% improvements for ambient smart home and wearables, respectively) for both seen and unseen activity learning in a heterogeneous setting.
Submitted 17 October, 2022;
originally announced October 2022.
-
Adaptive Behavioral Model Learning for Software Product Lines
Authors:
Shaghayegh Tavassoli,
Carlos Diego Nascimento Damasceno,
Ramtin Khosravi,
Mohammad Reza Mousavi
Abstract:
Behavioral models enable the analysis of the functionality of software product lines (SPL), e.g., model checking and model-based testing. Model learning aims at constructing behavioral models for software systems in some form of a finite state machine. Due to the commonalities among the products of an SPL, it is possible to reuse the previously learned models during the model learning process. In this paper, an adaptive approach (the $\text{PL}^*$ method) for learning the product models of an SPL is presented based on the well-known $L^*$ algorithm. In this method, after model learning of each product, the sequences in the final observation table are stored in a repository which will be used to initialize the observation table of the remaining products to be learned. The proposed algorithm is evaluated on two open-source SPLs and the total learning cost is measured in terms of the number of rounds, the total number of resets and input symbols. The results show that for complex SPLs, the total learning cost for the $\text{PL}^*$ method is significantly lower than that of the non-adaptive learning method in terms of all three metrics. Furthermore, it is observed that the order in which the products are learned affects the efficiency of the $\text{PL}^*$ method. Based on this observation, we introduced a heuristic to determine an ordering which reduces the total cost of adaptive learning in both case studies.
Submitted 1 August, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
On Specifying for Trustworthiness
Authors:
Dhaminda B. Abeywickrama,
Amel Bennaceur,
Greg Chance,
Yiannis Demiris,
Anastasia Kordoni,
Mark Levine,
Luke Moffat,
Luc Moreau,
Mohammad Reza Mousavi,
Bashar Nuseibeh,
Subramanian Ramamoorthy,
Jan Oliver Ringert,
James Wilson,
Shane Windsor,
Kerstin Eder
Abstract:
As autonomous systems (AS) increasingly become part of our daily lives, ensuring their trustworthiness is crucial. In order to demonstrate the trustworthiness of an AS, we first need to specify what is required for an AS to be considered trustworthy. This roadmap paper identifies key challenges for specifying for trustworthiness in AS, as identified during the "Specifying for Trustworthiness" workshop held as part of the UK Research and Innovation (UKRI) Trustworthy Autonomous Systems (TAS) programme. We look across a range of AS domains with consideration of the resilience, trust, functionality, verifiability, security, and governance and regulation of AS and identify some of the key specification challenges in these domains. We then highlight the intellectual challenges that are involved with specifying for trustworthiness in AS that cut across domains and are exacerbated by the inherent uncertainty involved with the environments in which AS need to operate.
Submitted 20 August, 2023; v1 submitted 22 June, 2022;
originally announced June 2022.
-
Comparative analysis of machine learning and numerical modeling for combined heat transfer in Polymethylmethacrylate
Authors:
Mahsa Dehghan Manshadi,
Nima Alafchi,
Alireza Taat,
Milad Mousavi,
Amir Mosavi
Abstract:
This study compares different methods to predict the simultaneous effects of conductive and radiative heat transfer in a Polymethylmethacrylate (PMMA) sample. PMMA is a kind of polymer utilized in various sensors and actuator devices. One-dimensional combined heat transfer is considered in numerical analysis. Computer implementation was obtained for the numerical solution of governing equation with the implicit finite difference method in the case of discretization. Kirchhoff transformation was used to get data from a non-linear equation of conductive heat transfer by considering monochromatic radiation intensity and temperature conditions applied to the PMMA sample boundaries. For Deep Neural Network (DNN) method, the novel Long Short Term Memory (LSTM) method was introduced to find accurate results in the least processing time than the numerical method. A recent study derived the combined heat transfers and their temperature profiles for the PMMA sample. Furthermore, the transient temperature profile is validated by another study. A comparison proves a perfect agreement. It shows the temperature gradient in the primary positions that makes a spectral amount of conductive heat transfer from a PMMA sample. It is more straightforward when they are compared with the novel DNN method. Results demonstrate that this artificial intelligence method is accurate and fast in predicting problems. By analyzing the results from the numerical solution it can be understood that the conductive and radiative heat flux is similar in the case of gradient behavior, but it is also twice in its amount approximately. Hence, total heat flux has a constant value in an approximated steady state condition. In addition to analyzing their composition, ROC curve and confusion matrix were implemented to evaluate the algorithm performance.
Submitted 12 April, 2022;
originally announced April 2022.
-
A Benchmark for Active Learning of Variability-Intensive Systems
Authors:
Shaghayegh Tavassoli,
Carlos Diego Nascimento Damasceno,
Mohammad Reza Mousavi,
Ramtin Khosravi
Abstract:
Behavioral models are the key enablers for behavioral analysis of Software Product Lines (SPL), including testing and model checking. Active model learning comes to the rescue when family behavioral models are non-existent or outdated. A key challenge on active model learning is to detect commonalities and variability efficiently and combine them into concise family models. Benchmarks and their associated metrics will play a key role in shaping the research agenda in this promising field and provide an effective means for comparing and identifying relative strengths and weaknesses in the forthcoming techniques. In this challenge, we seek benchmarks to evaluate the efficiency (e.g., learning time and memory footprint) and effectiveness (e.g., conciseness and accuracy of family models) of active model learning methods in the software product line context. These benchmark sets must contain the structural and behavioral variability models of at least one SPL. Each SPL in a benchmark must contain products that require more than one round of model learning with respect to the basic active learning $L^{*}$ algorithm. Alternatively, tools supporting the synthesis of artificial benchmark models are also welcome.
Submitted 10 March, 2022;
originally announced March 2022.
-
Spectrally Adaptive Common Spatial Patterns
Authors:
Mahta Mousavi,
Eric Lybrand,
Shuangquan Feng,
Shuai Tang,
Rayan Saab,
Virginia de Sa
Abstract:
The method of Common Spatial Patterns (CSP) is widely used for feature extraction of electroencephalography (EEG) data, such as in motor imagery brain-computer interface (BCI) systems. It is a data-driven method estimating a set of spatial filters so that the power of the filtered EEG signal is maximized for one motor imagery class and minimized for the other. This method, however, is prone to overfitting and is known to suffer from poor generalization especially with limited calibration data. Additionally, due to the high heterogeneity in brain data and the non-stationarity of brain activity, CSP is usually trained for each user separately resulting in long calibration sessions or frequent re-calibrations that are tiring for the user. In this work, we propose a novel algorithm called Spectrally Adaptive Common Spatial Patterns (SACSP) that improves CSP by learning a temporal/spectral filter for each spatial filter so that the spatial filters are concentrated on the most relevant temporal frequencies for each user. We show the efficacy of SACSP in providing better generalizability and higher classification accuracy from calibration to online control compared to existing methods. Furthermore, we show that SACSP provides neurophysiologically relevant information about the temporal frequencies of the filtered signals. Our results highlight the differences in the motor imagery signal among BCI users as well as spectral differences in the signals generated for each class, and show the importance of learning robust user-specific features in a data-driven manner.
Submitted 9 February, 2022;
originally announced February 2022.
-
SURENA IV: Towards A Cost-effective Full-size Humanoid Robot for Real-world Scenarios
Authors:
Aghil Yousefi-Koma,
Behnam Maleki,
Hessam Maleki,
Amin Amani,
Mohammad Ali Bazrafshani,
Hossein Keshavarz,
Ala Iranmanesh,
Alireza Yazdanpanah,
Hamidreza Alai,
Sahel Salehi,
Mahyar Ashkvari,
Milad Mousavi,
Milad Shafiee Ashtiani
Abstract:
This paper describes the hardware, software framework, and experimental testing of the SURENA IV humanoid robotics platform. SURENA IV has 43 degrees of freedom (DoFs), including seven DoFs for each arm, six DoFs for each hand, and six DoFs for each leg, with a height of 170 cm and a mass of 68 kg and morphological and mass properties similar to an average adult human. SURENA IV aims to realize a cost-effective and anthropomorphic humanoid robot for real-world scenarios. In this way, we demonstrate a locomotion framework based on a novel and inexpensive predictive foot sensor that enables walking with 7 cm foot position error because of accumulative error of links and connections' deflection (that has been manufactured by the tools which are available in the universities). Thanks to this sensor, the robot can walk on unknown obstacles without any force feedback, by online adaptation of foot height and orientation. Moreover, the arm and hand of the robot have been designed to grasp objects with different stiffness and geometries, which enables the robot to do drilling, visual servoing of a moving object, and writing its name on a whiteboard.
Submitted 30 August, 2021;
originally announced August 2021.
-
SuperCaustics: Real-time, open-source simulation of transparent objects for deep learning applications
Authors:
Mehdi Mousavi,
Rolando Estrada
Abstract:
Transparent objects are a very challenging problem in computer vision. They are hard to segment or classify due to their lack of precise boundaries, and there is limited data available for training deep neural networks. As such, current solutions for this problem employ rigid synthetic datasets, which lack flexibility and lead to severe performance degradation when deployed on real-world scenarios. In particular, these synthetic datasets omit features such as refraction, dispersion and caustics due to limitations in the rendering pipeline. To address this issue, we present SuperCaustics, a real-time, open-source simulation of transparent objects designed for deep learning applications. SuperCaustics features extensive modules for stochastic environment creation; uses hardware ray-tracing to support caustics, dispersion, and refraction; and enables generating massive datasets with multi-modal, pixel-perfect ground truth annotations. To validate our proposed system, we trained a deep neural network from scratch to segment transparent objects in difficult lighting scenarios. Our neural network achieved performance comparable to the state-of-the-art on a real-world dataset using only 10% of the training data and in a fraction of the training time. Further experiments show that a model trained with SuperCaustics can segment different types of caustics, even in images with multiple overlapping transparent objects. To the best of our knowledge, this is the first such result for a model trained on synthetic data. Both our open-source code and experimental data are freely available online.
Submitted 11 October, 2021; v1 submitted 22 July, 2021;
originally announced July 2021.
-
DyNetKAT: An Algebra of Dynamic Networks
Authors:
Georgiana Caltais,
Hossein Hojjat,
Mohammad Mousavi,
Hunkar Can Tunc
Abstract:
We introduce a formal language for specifying dynamic updates for Software Defined Networks. Our language builds upon Network Kleene Algebra with Tests (NetKAT) and adds constructs for synchronisations and multi-packet behaviour to capture the interaction between the control- and data-plane in dynamic updates. We provide a sound and ground-complete axiomatisation of our language. We exploit the equational theory to provide an efficient reasoning method about safety properties for dynamic networks. We implement our equational theory in DyNetiKAT -- a tool prototype, based on the Maude Rewriting Logic and the NetKAT tool, and apply it to a case study. We show that we can analyse the case study for networks with hundreds of switches using our initial tool prototype.
Submitted 22 May, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Conformance Relations and Hyperproperties for Doping Detection in Time and Space
Authors:
Sebastian Biewer,
Rayna Dimitrova,
Michael Fries,
Maciej Gazda,
Thomas Heinze,
Holger Hermanns,
Mohammad Reza Mousavi
Abstract:
We present a novel and generalised notion of doping cleanness for cyber-physical systems that allows for perturbing the inputs and observing the perturbed outputs both in the time- and value-domains. We instantiate our definition using existing notions of conformance for cyber-physical systems. As a formal basis for monitoring conformance-based cleanness, we develop the temporal logic HyperSTL*, an extension of Signal Temporal Logics with trace quantifiers and a freeze operator. We show that our generalised definitions are essential in a data-driven method for doping detection and apply our definitions to a case study concerning diesel emission tests.
Submitted 17 January, 2022; v1 submitted 7 December, 2020;
originally announced December 2020.
-
AI Playground: Unreal Engine-based Data Ablation Tool for Deep Learning
Authors:
Mehdi Mousavi,
Aashis Khanal,
Rolando Estrada
Abstract:
Machine learning requires data, but acquiring and labeling real-world data is challenging, expensive, and time-consuming. More importantly, it is nearly impossible to alter real data post-acquisition (e.g., change the illumination of a room), making it very difficult to measure how specific properties of the data affect performance. In this paper, we present AI Playground (AIP), an open-source, Unreal Engine-based tool for generating and labeling virtual image data. With AIP, it is trivial to capture the same image under different conditions (e.g., fidelity, lighting, etc.) and with different ground truths (e.g., depth or surface normal values). AIP is easily extendable and can be used with or without code. To validate our proposed tool, we generated eight datasets of otherwise identical but varying lighting and fidelity conditions. We then trained deep neural networks to predict (1) depth values, (2) surface normals, or (3) object labels and assessed each network's intra- and cross-dataset performance. Among other insights, we verified that sensitivity to different settings is problem-dependent. We confirmed the findings of other studies that segmentation models are very sensitive to fidelity, but we also found that they are just as sensitive to lighting. In contrast, depth and normal estimation models seem to be less sensitive to fidelity or lighting and more sensitive to the structure of the image. Finally, we tested our trained depth-estimation networks on two real-world datasets and obtained results comparable to training on real data alone, confirming that our virtual environments are realistic enough for real-world tasks.
Submitted 12 July, 2020;
originally announced July 2020.
-
Bayesian-Deep-Learning Estimation of Earthquake Location from Single-Station Observations
Authors:
S. Mostafa Mousavi,
Gregory C. Beroza
Abstract:
We present a deep learning method for single-station earthquake location, which we approach as a regression problem using two separate Bayesian neural networks. We use a multi-task temporal-convolutional neural network to learn epicentral distance and P travel time from 1-minute seismograms. The network estimates epicentral distance and P travel time with absolute mean errors of 0.23 km and 0.03 s respectively, along with their epistemic and aleatory uncertainties. We design a separate multi-input network using standard convolutional layers to estimate the back-azimuth angle, and its epistemic uncertainty. This network estimates the direction from which seismic waves arrive to the station with a mean error of 1 degree. Using this information, we estimate the epicenter, origin time, and depth along with their confidence intervals. We use a global dataset of earthquake signals recorded within 1 degree (~112 km) from the event to build the model and to demonstrate its performance. Our model can predict epicenter, origin time, and depth with mean errors of 7.3 km, 0.4 second, and 6.7 km respectively, at different locations around the world. Our approach can be used for fast earthquake source characterization with a limited number of observations, and also for estimating location of earthquakes that are sparsely recorded -- either because they are small or because stations are widely separated.
Submitted 2 December, 2019;
originally announced December 2019.
-
A Machine-Learning Approach for Earthquake Magnitude Estimation
Authors:
S. Mostafa Mousavi,
Gregory C. Beroza
Abstract:
In this study we develop a single-station deep-learning approach for fast and reliable estimation of earthquake magnitude directly from raw waveforms. We design a regressor composed of convolutional and recurrent neural networks that is not sensitive to the data normalization, hence waveform amplitude information can be utilized during the training. Our network can predict earthquake magnitudes with an average error close to zero and standard deviation of ~0.2 based on single-station waveforms without instrument response correction. We test the network for both local and duration magnitude scales and show a station-based learning can be an effective approach for improving the performance. The proposed approach has a variety of potential applications from routine earthquake monitoring to early warning systems.
Submitted 14 November, 2019;
originally announced November 2019.
-
Energy and Social Cost Minimization for Data Dissemination in Wireless Networks: Centralized and Decentralized Approaches
Authors:
Mahdi Mousavi,
Anja Klein
Abstract:
We study multi-hop data-dissemination in a wireless network from one source to multiple nodes where some of the nodes of the network act as re-transmitting nodes and help the source in data dissemination. In this network, we study two scenarios: i) the transmitting nodes do not need an incentive for transmission and ii) they do need an incentive and are paid by their corresponding receiving nodes by virtual tokens. We investigate two problems: P1) network power minimization for the first scenario and P2) social cost minimization for the second scenario, defined as the total cost paid by the nodes of the network for receiving data. In this paper, to address P1 and P2, we propose centralized and decentralized approaches that determine which of the nodes of the network should act as transmitting nodes, find their transmit powers and their corresponding receiving nodes. For the sake of energy efficiency, in our model, we employ maximal-ratio combining (MRC) at the receivers so that a receiver can be served by multiple transmitters. The proposed decentralized approach is based on a non-cooperative cost-sharing game (CSG). In our proposed game, every receiving node chooses its respective transmitting nodes and consequently, a cost is assigned to it according to the power imposed on its chosen transmitting nodes. We discuss how the network is formed in a decentralized way, find the action of the nodes in the game and show that, despite being decentralized, the proposed game converges to a stable solution. To find the centralized global optimum, which is a benchmark to our decentralized approach, we use a mixed-integer linear program (MILP). Simulation results show that our proposed decentralized approach outperforms the conventional algorithms in terms of energy efficiency and social cost while it can address the need for an incentive for collaboration.
Submitted 21 March, 2020; v1 submitted 6 November, 2019;
originally announced November 2019.
-
Joint Relaying and Spatial Sharing Multicast Scheduling for mmWave Networks
Authors:
Gek Hong Sim,
Mahdi Mousavi,
Lin Wang,
Anja Klein,
Matthias Hollick
Abstract:
Millimeter-wave (mmWave) communication plays a vital role to efficiently disseminate large volumes of data in beyond-5G networks. Unfortunately, the directionality of mmWave communication significantly complicates efficient data dissemination, particularly in multicasting, which is gaining more and more importance in emerging applications (e.g., V2X, public safety). While multicasting for systems operating at lower frequencies (i.e., sub-6GHz) has been extensively studied, they are sub-optimal for mmWave systems as mmWave has significantly different propagation characteristics, i.e., using the directional transmission to compensate for the high path loss and thus promoting spectrum sharing. In this paper, we propose novel multicast scheduling algorithms by jointly exploiting relaying and spatial sharing gains while aiming to minimize the multicast completion time. We first characterize the min-time mmWave multicasting problem with a comprehensive model and formulate it with an integer linear program (ILP). We further design a practical and scalable distributed algorithm named mmDiMu, based on gradually maximizing the transmission throughput over time. Finally, we carry out validation through extensive simulations in different scales and the results show that mmDiMu significantly outperforms conventional algorithms with around 95% reduction on multicast completion time.
Submitted 30 July, 2019;
originally announced July 2019.
-
An Empirical Study on Post-processing Methods for Word Embeddings
Authors:
Shuai Tang,
Mahta Mousavi,
Virginia R. de Sa
Abstract:
Word embeddings learnt from large corpora have been adopted in various applications in natural language processing and served as the general input representations to learning systems. Recently, a series of post-processing methods have been proposed to boost the performance of word embeddings on similarity comparison and analogy retrieval tasks, and some have been adapted to compose sentence representations. The general hypothesis behind these methods is that by enforcing the embedding space to be more isotropic, the similarity between words can be better expressed. We view these methods as an approach to shrink the covariance/gram matrix, which is estimated by learning word vectors, towards a scaled identity matrix. By optimising an objective in the semi-Riemannian manifold with Centralised Kernel Alignment (CKA), we are able to search for the optimal shrinkage parameter, and provide a post-processing method to smooth the spectrum of learnt word vectors which yields improved performance on downstream tasks.
Submitted 23 October, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Private Inner Product Retrieval for Distributed Machine Learning
Authors:
Mohammad Hossein Mousavi,
Mohammad Ali Maddah-Ali,
Mahtab Mirmohseni
Abstract:
In this paper, we argue that in many basic algorithms for machine learning, including support vector machine (SVM) for classification, principal component analysis (PCA) for dimensionality reduction, and regression for dependency estimation, we need the inner products of the data samples, rather than the data samples themselves.
Motivated by the above observation, we introduce the problem of private inner product retrieval for distributed machine learning, where we have a system including a database of some files, duplicated across some non-colluding servers. A user intends to retrieve a subset of specific size of the inner products of the data files with minimum communication load, without revealing any information about the identity of the requested subset. For achievability, we use the algorithms for multi-message private information retrieval. For converse, we establish that as the length of the files becomes large, the set of all inner products converges to independent random variables with uniform distribution, and derive the rate of convergence. To prove that, we construct special dependencies among sequences of the sets of all inner products with different length, which forms a time-homogeneous irreducible Markov chain, without affecting the marginal distribution. We show that this Markov chain has a uniform distribution as its unique stationary distribution, with rate of convergence dominated by the second largest eigenvalue of the transition probability matrix. This allows us to develop a converse, which converges to a tight bound in some cases, as the size of the files becomes large. While this converse is based on the one in multi-message private information retrieval, due to the nature of retrieving inner products instead of data itself some changes are made to reach the desired result.
Submitted 17 February, 2019;
originally announced February 2019.
-
CRED: A Deep Residual Network of Convolutional and Recurrent Units for Earthquake Signal Detection
Authors:
S. Mostafa Mousavi,
Weiqiang Zhu,
Yixiao Sheng,
Gregory C. Beroza
Abstract:
Earthquake signal detection is at the core of observational seismology. A good detection algorithm should be sensitive to small and weak events with a variety of waveform shapes, robust to background noise and non-earthquake signals, and efficient for processing large data volumes. Here, we introduce the Cnn-Rnn Earthquake Detector (CRED), a detector based on deep neural networks. The network uses a combination of convolutional layers and bi-directional long-short-term memory units in a residual structure. It learns the time-frequency characteristics of the dominant phases in an earthquake signal from three component data recorded on a single station. We train the network using 500,000 seismograms (250k associated with tectonic earthquakes and 250k identified as noise) recorded in Northern California and tested it with an F-score of 99.95. The robustness of the trained model with respect to the noise level and non-earthquake signals is shown by applying it to a set of semi-synthetic signals. The model is applied to one month of continuous data recorded at Central Arkansas to demonstrate its efficiency, generalization, and sensitivity. Our model is able to detect more than 700 microearthquakes as small as -1.3 ML induced during hydraulic fracturing far away than the training region. The performance of the model is compared with STA/LTA, template matching, and FAST algorithms. Our results indicate an efficient and reliable performance of CRED. This framework holds great promise in lowering the detection threshold while minimizing false positive detection rates.
Submitted 3 October, 2018;
originally announced October 2018.
-
Cost Sharing Games for Energy-Efficient Multi-Hop Broadcast in Wireless Networks
Authors:
Mahdi Mousavi,
Hussein Al-Shatri,
Anja Klein
Abstract:
We study multi-hop broadcast in wireless networks with one source node and multiple receiving nodes. The message flow from the source to the receivers can be modeled as a tree-graph, called broadcast-tree. The problem of finding the minimum-power broadcast-tree (MPBT) is NP-complete. Unlike most of the existing centralized approaches, we propose a decentralized algorithm, based on a non-cooperative cost-sharing game. In this game, every receiving node, as a player, chooses another node of the network as its respective transmitting node for receiving the message. Consequently, a cost is assigned to the receiving node based on the power imposed on its chosen transmitting node. In our model, the total required power at a transmitting node consists of (i) the transmit power and (ii) the circuitry power needed for communication hardware modules. We develop our algorithm using the marginal contribution (MC) cost-sharing scheme and show that the optimum broadcast-tree is always a Nash equilibrium (NE) of the game. Simulation results demonstrate that our proposed algorithm outperforms conventional algorithms for the MPBT problem. Besides, we show that the circuitry power, which is usually ignored by existing algorithms, significantly impacts the energy-efficiency of the network.
Submitted 17 January, 2020; v1 submitted 28 May, 2018;
originally announced May 2018.
-
Towards an Approximate Conformance Relation for Hybrid I/O Automata
Authors:
Morteza Mohaqeqi,
Mohammad Reza Mousavi
Abstract:
Several notions of conformance have been proposed for checking the behavior of cyber-physical systems against their hybrid systems models. In this paper, we explore the initial idea of a notion of approximate conformance that allows for comparison of both observable discrete actions and (sampled) continuous trajectories. As such, this notion will consolidate two earlier notions, namely the notion of Hybrid Input-Output Conformance (HIOCO) by M. van Osch and the notion of Hybrid Conformance by H. Abbas and G.E. Fainekos. We prove that our proposed notion of conformance satisfies a semi-transitivity property, which makes it suitable for a step-wise proof of conformance or refinement.
Submitted 15 December, 2016;
originally announced December 2016.
-
(De-)Composing Causality in Labeled Transition Systems
Authors:
Georgiana Caltais,
Stefan Leue,
Mohammad Reza Mousavi
Abstract:
In this paper we introduce a notion of counterfactual causality in the Halpern and Pearl sense that is compositional with respect to the interleaving of transition systems. The formal framework for reasoning on what caused the violation of a safety property is established in the context of labeled transition systems and Hennessy-Milner logic. The compositionality results are devised for non-communicating systems.
Submitted 28 August, 2016;
originally announced August 2016.
-
Geometry of Interest (GOI): Spatio-Temporal Destination Extraction and Partitioning in GPS Trajectory Data
Authors:
Seyed Morteza Mousavi,
Aaron Harwood,
Shanika Karunasekera,
Mojtaba Maghrebi
Abstract:
Nowadays large amounts of GPS trajectory data are being continuously collected by GPS-enabled devices such as vehicle navigation systems and mobile phones. GPS trajectory data is useful for applications such as traffic management, location forecasting, and itinerary planning. Such applications often need to extract the time-stamped Sequence of Visited Locations (SVLs) of the mobile objects. The nearest neighbor query (NNQ) is the most widely applied method for labeling the visited locations with the IDs of the points of interest (POIs) in the process of SVL generation. In some scenarios, NNQ is not accurate enough. To improve the quality of the extracted SVLs, instead of using NNQ, we label the visited locations with the IDs of the POIs which geometrically intersect with the GPS observations. The intersection operator requires the accurate geometry of the points of interest, which we refer to as the Geometries of Interest (GOIs). In some application domains (e.g. movement trajectories of animals), adequate information about the POIs and their GOIs may not be available a priori, or they may not be publicly accessible and, therefore, they need to be derived from GPS trajectory data. In this paper we propose a novel method for estimating the POIs and their GOIs, which consists of three phases: (i) extracting the geometries of the stay regions; (ii) constructing the geometry of destination regions based on the extracted stay regions; and (iii) constructing the GOIs based on the geometries of the destination regions. Using the geometric similarity to known GOIs as the major evaluation criterion, the experiments we performed using long-term GPS trajectory data show that our method outperforms the existing approaches.
Submitted 16 May, 2016; v1 submitted 13 March, 2016;
originally announced March 2016.
-
Spinal Test Suites for Software Product Lines
Authors:
Harsh Beohar,
Mohammad Reza Mousavi
Abstract:
A major challenge in testing software product lines is efficiency. In particular, testing a product line should take less effort than testing each and every product individually. We address this issue in the context of input-output conformance testing, which is a formal theory of model-based testing. We extend the notion of conformance testing on input-output featured transition systems with the novel concept of spinal test suites. We show how this concept dispenses with retesting the common behavior among different, but similar, products of a software product line.
Submitted 27 March, 2014;
originally announced March 2014.
-
Shape-constrained Estimation of Value Functions
Authors:
Mohammad Mousavi,
Peter W. Glynn
Abstract:
We present a fully nonparametric method to estimate the value function, via simulation, in the context of expected infinite-horizon discounted rewards for Markov chains. Estimating such value functions plays an important role in approximate dynamic programming and applied probability in general. We incorporate "soft information" into the estimation algorithm, such as knowledge of convexity, monotonicity, or Lipschitz constants. In the presence of such information, a nonparametric estimator for the value function can be computed that is provably consistent as the simulated time horizon tends to infinity. As an application, we implement our method on price tolling agreement contracts in energy markets.
Submitted 25 December, 2013;
originally announced December 2013.
-
Exact Simulation of Non-stationary Reflected Brownian Motion
Authors:
Mohammad Mousavi,
Peter W. Glynn
Abstract:
This paper develops the first method for the exact simulation of reflected Brownian motion (RBM) with non-stationary drift and infinitesimal variance. The running time of generating exact samples of non-stationary RBM at any time $t$ is uniformly bounded by $\mathcal{O}(1/\bar{\gamma}^2)$ where $\bar{\gamma}$ is the average drift of the process. The method can be used as a guide for planning simulations of complex queueing systems with non-stationary arrival rates and/or service time.
Submitted 22 December, 2013;
originally announced December 2013.
-
Algebraic Meta-Theory of Processes with Data
Authors:
Daniel Gebler,
Eugen-Ioan Goriac,
Mohammad Reza Mousavi
Abstract:
There exists a rich literature of rule formats guaranteeing different algebraic properties for formalisms with a Structural Operational Semantics. Moreover, there exist a few approaches for automatically deriving axiomatizations characterizing strong bisimilarity of processes. To our knowledge, this literature has never been extended to the setting with data (e.g. to model storage and memory). We show how the rule formats for algebraic properties can be exploited in a generic manner in the setting with data. Moreover, we introduce a new approach for deriving sound and ground-complete axiom schemata for a notion of bisimilarity with data, called stateless bisimilarity, based on intuitive auxiliary function symbols for handling the store component. We do restrict, however, the axiomatization to the setting where the store component is only given in terms of constants.
Submitted 28 July, 2013;
originally announced July 2013.
-
Decomposability in Input Output Conformance Testing
Authors:
Neda Noroozi,
Mohammad Reza Mousavi,
Tim A. C. Willemse
Abstract:
We study the problem of deriving a specification for a third-party component, based on the specification of the system and the environment in which the component is supposed to reside. Particularly, we are interested in using component specifications for conformance testing of black-box components, using the theory of input-output conformance (ioco) testing. We propose and prove sufficient criteria for decompositionality, i.e., that components conforming to the derived specification will always compose to produce a correct system with respect to the system specification. We also study the criteria for strong decomposability, by which we can ensure that only those components conforming to the derived specification can lead to a correct system.
Submitted 5 March, 2013;
originally announced March 2013.
-
Proceedings First International Workshop on Process Algebra and Coordination
Authors:
Luca Aceto,
Mohammad Reza Mousavi
Abstract:
Process algebra provides abstract and rigorous means for studying communicating concurrent systems. Coordination languages also provide abstract means for specifying and programming the communication of components. Hence, the two fields seem to have very much in common, and the link between these two research areas has been established formally by means of several translations, mainly from coordination languages to process algebras. There have also been proposals of process algebras whose communication policy is inspired by the one underlying coordination languages.
The aim of this workshop was to push the state of the art in the study of the connections between process algebra and coordination languages by bringing together experts as well as young researchers from the two fields to communicate their ideas and findings. It includes both contributed and invited papers that were presented during the one-day meeting on Process Algebra and Coordination (PACO 2011), which took place on June 9, 2011 in Reykjavik, Iceland.
Submitted 6 August, 2011;
originally announced August 2011.
-
Proceedings 10th International Workshop on the Foundations of Coordination Languages and Software Architectures
Authors:
Mohammad Reza Mousavi,
Antonio Ravara
Abstract:
Computation nowadays is becoming inherently concurrent, either because of characteristics of the hardware (with multicore processors becoming omnipresent) or due to the ubiquitous presence of distributed systems (incarnated in the Internet). Computational systems are therefore typically distributed, concurrent, mobile, and often involve composition of heterogeneous components.
To specify and reason about such systems and go beyond the functional correctness proofs, e.g., by supporting reusability and improving maintainability, approaches such as coordination languages and software architecture are recognised as fundamental.
The goal of this workshop is to bring together researchers and practitioners of the aforementioned fields, to share and identify common problems, and to devise general solutions in the context of coordination languages and software architectures.
Submitted 28 July, 2011;
originally announced July 2011.
-
Robustness of Equations Under Operational Extensions
Authors:
Peter D. Mosses,
MohammadReza Mousavi,
Michel A. Reniers
Abstract:
Sound behavioral equations on open terms may become unsound after conservative extensions of the underlying operational semantics. Providing criteria under which such equations are preserved is extremely useful; in particular, it can avoid the need to repeat proofs when extending the specified language.
This paper investigates preservation of sound equations for several notions of bisimilarity on open terms: closed-instance (ci-)bisimilarity and formal-hypothesis (fh-)bisimilarity, both due to Robert de Simone, and hypothesis-preserving (hp-)bisimilarity, due to Arend Rensink. For both fh-bisimilarity and hp-bisimilarity, we prove that arbitrary sound equations on open terms are preserved by all disjoint extensions which do not add labels. We also define slight variations of fh- and hp-bisimilarity such that all sound equations are preserved by arbitrary disjoint extensions. Finally, we give two sets of syntactic criteria (on equations, resp. operational extensions) and prove each of them to be sufficient for preserving ci-bisimilarity.
Submitted 29 November, 2010;
originally announced November 2010.
-
Proceedings Ninth International Workshop on the Foundations of Coordination Languages and Software Architectures
Authors:
MohammadReza Mousavi,
Gwen Salaün
Abstract:
This volume contains the proceedings of FOCLASA 2010, the 9th International Workshop on the Foundations of Coordination Languages and Software Architectures. FOCLASA 2010 was held in Paris, France on July 30th, 2010 as a satellite event of the 21st International Conference on Concurrency Theory, CONCUR 2010. The papers presented in these proceedings tackle different issues that are currently central to our community, namely software adaptation, sensor networks, distributed control, and non-functional aspects of coordination such as resources, timing and stochastics.
Submitted 28 July, 2010;
originally announced July 2010.
-
Causality in the Semantics of Esterel: Revisited
Authors:
MohammadReza Mousavi
Abstract:
We re-examine the challenges concerning causality in the semantics of Esterel and show that they pertain to the known issues in the semantics of Structured Operational Semantics with negative premises. We show that the solutions offered for the semantics of SOS also provide answers to the semantic challenges of Esterel and that they satisfy the intuitive requirements set by the language designers.
Submitted 15 February, 2010;
originally announced February 2010.