Search | arXiv e-print repository

Optimizing Near Field Computation in the MLFMA Algorithm with Data Redundancy and Performance Modeling on a Single GPU

Authors: Morteza Sadeghi, Abdolreza Torabi

Abstract: The Multilevel Fast Multipole Algorithm (MLFMA) has known applications in scientific modeling in the fields of telecommunications, physics, mechanics, and chemistry. Accelerating the far-field calculation using GPUs and GPU clusters for large-scale problems has been studied for more than a decade. The acceleration of the Near Field Computation (P2P operator), however, has been less of a concern because it does not face the distributed-processing challenges that the far field does. This article proposes a modification of the P2P algorithm and uses performance models to determine its optimality criteria. By modeling the speedup, we found that making threads independent by creating redundancy in the data makes the algorithm nearly 13 times faster than the non-redundant mode for lower-density (higher-frequency) problems.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2302.07059 [pdf]

GeoFault: A well-founded fault ontology for interoperability in geological modeling

Authors: Yuanwei Qu, Michel Perrin, Anita Torabi, Mara Abel, Martin Giese

Abstract: Geological modeling currently uses various computer-based applications. Data harmonization at the semantic level by means of ontologies is essential for making these applications interoperable. Since geo-modeling is currently part of multidisciplinary projects, semantic harmonization is required to model not only geological knowledge but also to integrate other domain knowledge at a general level. For this reason, the domain ontologies used for describing geological knowledge must be based on a sound ontology background to ensure the described geological knowledge is integratable. This paper presents a domain ontology: GeoFault, resting on the Basic Formal Ontology BFO (Arp et al., 2015) and the GeoCore ontology (Garcia et al., 2020). It models the knowledge related to geological faults. Faults are essential to various industries but are complex to model. They can be described as thin deformed rock volumes or as spatial arrangements resulting from the different displacements of geological blocks. At a broader scale, faults are currently described as mere surfaces, which are the components of complex fault arrays. The reference to the BFO and GeoCore package allows assigning these various fault elements to define ontology classes and their logical linkage within a consistent ontology framework. The GeoFault ontology covers the core knowledge of faults 'stricto sensu,' excluding ductile shear deformations. The vocabulary considered is essentially descriptive and related to regional to outcrop scales, excluding microscopic, orogenic, and tectonic plate structures. The ontology is modeled in OWL 2, validated by competency questions with two use cases, and tested using an in-house ontology-driven data entry application. The work of GeoFault provides a solid framework for disambiguating fault knowledge and a foundation of fault data integration for the applications and the users.

Submitted 14 February, 2023; originally announced February 2023.

arXiv:1708.09522 [pdf, other]

Action Classification and Highlighting in Videos

Authors: Atousa Torabi, Leonid Sigal

Abstract: Inspired by recent advances in neural machine translation that jointly align and translate using encoder-decoder networks equipped with attention, we propose an attention-based LSTM model for human activity recognition. Our model jointly learns to classify actions and highlight frames associated with the action, by attending to salient visual information through jointly learned soft-attention networks. We explore attention informed by various forms of visual semantic features, including those encoding actions, objects and scenes. We qualitatively show that soft-attention can learn to effectively attend to important objects and scene information correlated with specific human actions. Further, we show that, quantitatively, our attention-based LSTM outperforms the vanilla LSTM and CNN models used by state-of-the-art methods. On a large-scale YouTube video dataset, ActivityNet, our model outperforms competing methods in action classification.

Submitted 30 August, 2017; originally announced August 2017.

arXiv:1609.08124 [pdf, other]

Learning Language-Visual Embedding for Movie Understanding with Natural-Language

Authors: Atousa Torabi, Niket Tandon, Leonid Sigal

Abstract: Learning a joint language-visual embedding has a number of very appealing properties and can result in a variety of practical applications, including natural language image/video annotation and search. In this work, we study three different joint language-visual neural network model architectures. We evaluate our models on the large-scale LSMDC16 movie dataset for two tasks: 1) standard ranking for video annotation and retrieval, and 2) our proposed movie multiple-choice test. This test facilitates automatic evaluation of visual-language models for natural language video annotation based on human activities. In addition to the original Audio Description (AD) captions, provided as part of LSMDC16, we collected and will make available a) manually generated re-phrasings of those captions obtained using Amazon MTurk and b) automatically generated human activity elements in "Predicate + Object" (PO) phrases based on "Knowlywood", an activity knowledge mining model. Our best model achieves Recall@10 of 19.2% on annotation and 18.9% on video retrieval tasks for a subset of 1000 samples. For the multiple-choice test, our best model achieves an accuracy of 58.11% over the whole LSMDC16 public test set.

Submitted 26 September, 2016; originally announced September 2016.

Comments: 13 pages

arXiv:1605.03705 [pdf, other]

Movie Description

Authors: Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele

Abstract: Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs, which are temporally aligned to full length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)", at ICCV 2015.

Submitted 12 May, 2016; originally announced May 2016.

arXiv:1503.01070 [pdf, other]

Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research

Authors: Atousa Torabi, Christopher Pal, Hugo Larochelle, Aaron Courville

Abstract: In this work, we introduce a dataset of video annotated with high quality natural language phrases describing the visual content in a given segment of time. Our dataset is based on the Descriptive Video Service (DVS) that is now encoded on many digital media products such as DVDs. DVS is an audio narration describing the visual elements and actions in a movie for the visually impaired. It is temporally aligned with the movie and mixed with the original movie soundtrack. We describe an automatic DVS segmentation and alignment method for movies, that enables us to scale up the collection of a DVS-derived dataset with minimal human intervention. Using this method, we have collected the largest DVS-derived dataset for video description of which we are aware. Our dataset currently includes over 84.6 hours of paired video/sentences from 92 DVDs and is growing.

Submitted 3 March, 2015; originally announced March 2015.

Comments: 7 pages

arXiv:1502.08029 [pdf, other]

Describing Videos by Exploiting Temporal Structure

Authors: Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron Courville

Abstract: Recent progress in using recurrent neural networks (RNNs) for image description has motivated the exploration of their application for video description. However, while images are static, working with videos requires modeling their dynamic temporal structure and then properly integrating that information into a natural language description. In this context, we propose an approach that successfully takes into account both the local and global temporal structure of videos to produce descriptions. First, our approach incorporates a spatial temporal 3-D convolutional neural network (3-D CNN) representation of the short temporal dynamics. The 3-D CNN representation is trained on video action recognition tasks, so as to produce a representation that is tuned to human motion and behavior. Second, we propose a temporal attention mechanism that goes beyond local temporal modeling and learns to automatically select the most relevant temporal segments given the text-generating RNN. Our approach exceeds the current state of the art for both BLEU and METEOR metrics on the Youtube2Text dataset. We also present results on a new, larger and more challenging dataset of paired video and natural language descriptions.

Submitted 30 September, 2015; v1 submitted 27 February, 2015; originally announced February 2015.

Comments: Accepted to ICCV15. This version comes with code release and supplementary material

arXiv:1305.3189 [pdf, other]

A Bag of Words Approach for Semantic Segmentation of Monitored Scenes

Authors: Wassim Bouachir, Atousa Torabi, Guillaume-Alexandre Bilodeau, Pascal Blais

Abstract: This paper proposes a semantic segmentation method for outdoor scenes captured by a surveillance camera. Our algorithm classifies each perceptually homogeneous region as one of the predefined classes learned from a collection of manually labelled images. The proposed approach combines two different types of information. First, color segmentation is performed to divide the scene into perceptually similar regions. Then, the second step is based on SIFT keypoints and uses the bag of words representation of the regions for the classification. The prediction is done using a Naïve Bayesian Network as a generative classifier. Compared to existing techniques, our method provides more compact representations of scene contents and the segmentation result is more consistent with human perception due to the combination of the color information with the image keypoints. The experiments conducted on a publicly available data set demonstrate the validity of the proposed method.

Submitted 14 May, 2013; originally announced May 2013.

Comments: École Polytechnique de Montréal, iWatchLife Inc

Showing 1–8 of 8 results for author: Torabi, A