Search | arXiv e-print repository

Design Space Exploration and Optimization for Carbon-Efficient Extended Reality Systems

Authors: Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu

Abstract: As computing hardware becomes more specialized, designing environmentally sustainable computing systems requires accounting for both hardware and software parameters. Our goal is to design low-carbon computing systems while maintaining a competitive level of performance and operational efficiency. Despite previous carbon modeling efforts for computing systems, there is a distinct lack of holistic design strategies that simultaneously optimize for carbon, performance, power and energy. In this work, we take a data-driven approach to characterize the carbon impact (quantified in units of CO2e) of various production-level artificial intelligence (AI) and extended reality (XR) hardware and application use cases. We propose a holistic design exploration framework to optimize and design carbon-efficient computing systems and hardware. Our framework identifies significant opportunities for carbon efficiency improvements in application-specific and general-purpose hardware design and optimization. Using our framework, we demonstrate a 10× carbon efficiency improvement for specialized AI and XR accelerators (quantified by a key metric, tCDP: the product of total CO2e and total application execution time), up to 21% total life cycle carbon savings for existing general-purpose hardware and applications due to hardware over-provisioning, and up to 7.86× carbon efficiency improvement using advanced 3D integration techniques for resource-constrained XR systems.

Submitted 2 May, 2023; originally announced May 2023.
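The tCDP metric named in the abstract above is a product of two totals, so lower is better and a design can improve it by cutting either carbon or runtime. The following is a minimal sketch under stated assumptions, not the authors' framework: it splits total CO2e into an embodied term plus an operational term (energy times grid carbon intensity), a decomposition common in carbon models but not spelled out in the abstract. The `Design` class, its fields, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical description of one hardware/application design point."""
    name: str
    embodied_kgco2e: float  # manufacturing carbon attributed to this workload (assumed input)
    energy_kwh: float       # energy drawn while the application runs
    exec_time_s: float      # total application execution time


def total_co2e(d: Design, grid_kgco2e_per_kwh: float) -> float:
    """Assumed decomposition: embodied carbon plus operational carbon."""
    operational = d.energy_kwh * grid_kgco2e_per_kwh
    return d.embodied_kgco2e + operational


def tcdp(d: Design, grid_kgco2e_per_kwh: float) -> float:
    """tCDP = total CO2e x total execution time (lower is better)."""
    return total_co2e(d, grid_kgco2e_per_kwh) * d.exec_time_s


def tcdp_improvement(baseline: Design, candidate: Design, grid_kgco2e_per_kwh: float) -> float:
    """How many times lower the candidate's tCDP is than the baseline's."""
    return tcdp(baseline, grid_kgco2e_per_kwh) / tcdp(candidate, grid_kgco2e_per_kwh)


if __name__ == "__main__":
    grid = 0.5  # hypothetical grid intensity, kgCO2e per kWh
    cpu = Design("general-purpose", embodied_kgco2e=10.0, energy_kwh=2.0, exec_time_s=100.0)
    acc = Design("accelerator", embodied_kgco2e=5.0, energy_kwh=1.0, exec_time_s=40.0)
    print(tcdp(cpu, grid), tcdp(acc, grid), tcdp_improvement(cpu, acc, grid))
```

With these made-up numbers the accelerator's tCDP is 220 versus 1100 for the general-purpose design, a 5× improvement; because the metric multiplies carbon by time, halving embodied carbon and cutting runtime by 60% compound rather than add.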

arXiv:2003.06499

LSCP: Enhanced Large Scale Colloquial Persian Language Understanding

Authors: Hadi Abdi Khojasteh, Ebrahim Ansari, Mahdi Bohlouli

Abstract: Language recognition has advanced significantly in recent years by means of modern machine learning methods such as deep learning and benchmarks with rich annotations. However, research is still limited for low-resource formal languages. This leaves a significant gap in describing colloquial language, especially for low-resource languages such as Persian. To target this gap for low-resource languages, we propose the "Large Scale Colloquial Persian Dataset" (LSCP). LSCP is hierarchically organized in a semantic taxonomy that focuses on multi-task informal Persian language understanding as a comprehensive problem. This encompasses the recognition of multiple semantic aspects in human-level sentences, which are naturally captured from real-world sentences. We believe that further investigation and processing, as well as the application of novel algorithms and methods, can strengthen computerized understanding and processing of low-resource languages. The proposed corpus consists of 120M sentences derived from 27M tweets, annotated with parse trees, part-of-speech tags, sentiment polarity and translations in five different languages.

Submitted 13 March, 2020; originally announced March 2020.

Comments: 6 pages, 2 figures, 3 tables, Accepted at the 12th International Conference on Language Resources and Evaluation (LREC 2020)

MSC Class: 68T50 (Primary) 68T09; 68T07 (Secondary) ACM Class: A.0; E.0; I.2.0; I.2.6; I.2.7; I.7.1; I.7.2

Journal ref: https://www.aclweb.org/anthology/2020.lrec-1.776/

arXiv:2002.10016

Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval

Authors: Hadi Abdi Khojasteh, Ebrahim Ansari, Parvin Razzaghi, Akbar Karimi

Abstract: This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is challenging because the features and representations of text and images are not directly comparable. In this work, we introduce an end-to-end deep multimodal convolutional-recurrent network that learns vision and language representations simultaneously to infer image-text similarity. The model learns which pairs are a match (positive) and which are a mismatch (negative) using a hinge-based triplet ranking loss. To learn the joint representations, we leverage our newly collected set of tweets from Twitter. The main characteristic of our dataset is that its images and tweets are not as standardized as those in the benchmarks. Furthermore, there can be a higher semantic correlation between the pictures and tweets, in contrast to benchmarks in which the descriptions are well organized. Experimental results on the MS-COCO benchmark dataset show that our model outperforms certain previously presented methods and performs competitively with the state of the art. The code and dataset have been made publicly available.

Submitted 23 February, 2020; originally announced February 2020.

Comments: 6 pages and 2 figures, Learn more about this project at https://iasbs.ac.ir/~ansari/deeptwitter

ACM Class: E.0; H.3.3; I.2.0; I.2.6; I.2.7; I.2.10; I.5.0; I.4.0; I.4.10; I.7.0
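The hinge-based triplet ranking objective mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the widely used bidirectional, sum-over-negatives formulation with cosine similarity and a hypothetical margin of 0.2; the abstract does not give the authors' exact loss, similarity function, or margin, and the embeddings here are toy vectors rather than outputs of their convolutional-recurrent network.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_ranking_loss(images: Sequence[Vector], texts: Sequence[Vector], margin: float = 0.2) -> float:
    """Bidirectional hinge-based triplet ranking loss over a batch (assumed formulation).

    images[k] and texts[k] form a matching (positive) pair; every other
    pairing in the batch is treated as a mismatch (negative).
    """
    if len(images) != len(texts):
        raise ValueError("images and texts must be paired one-to-one")
    loss = 0.0
    for k in range(len(images)):
        positive = cosine(images[k], texts[k])
        for j in range(len(images)):
            if j == k:
                continue
            # image k should score its own text above text j by at least `margin`
            loss += max(0.0, margin - positive + cosine(images[k], texts[j]))
            # text k should score its own image above image j by at least `margin`
            loss += max(0.0, margin - positive + cosine(images[j], texts[k]))
    return loss
```

For two orthogonal, correctly paired embeddings such as `[[1.0, 0.0], [0.0, 1.0]]` used as both images and texts, every positive beats every negative by more than the margin and the loss is 0.0; swapping the two texts makes each negative outrank its positive, and all four hinge terms become active (about 4.8 in total).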

arXiv:1812.03953

doi 10.1007/978-3-030-37309-2_26

An Intelligent Safety System for Human-Centered Semi-Autonomous Vehicles

Authors: Hadi Abdi Khojasteh, Alireza Abbas Alipour, Ebrahim Ansari, Parvin Razzaghi

Abstract: Automobile manufacturers are working to develop ways to make cars fully safe. Monitoring the driver's actions with computer vision techniques to detect driving mistakes in real time, and then planning autonomous driving to avoid vehicle collisions, is one of the most important issues investigated in machine vision and Intelligent Transportation Systems (ITS). The main goal of this study is to prevent accidents caused by fatigue, drowsiness, and driver distraction. To avoid these incidents, this paper proposes an integrated safety system that continuously monitors the driver's attention and the vehicle's surroundings, and finally decides whether the current steering control status is safe. For this purpose, we equipped an ordinary car called FARAZ with a vision system consisting of four mounted cameras, along with a universal car tool for communicating with the surrounding factory-installed sensors and other car systems and for sending commands to actuators. The proposed system leverages a scene understanding pipeline using deep convolutional encoder-decoder networks and a driver state detection pipeline. We have been identifying and assessing domestic capabilities for developing technologies for ordinary vehicles in order to manufacture smart cars, and also for providing an intelligent system that increases safety and assists the driver in various conditions.

Submitted 20 February, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: 15 pages and 5 figures, Submitted to the international conference on Contemporary issues in Data Science (CiDaS 2019), Learn more about this project at https://iasbs.ac.ir/~ansari/faraz

Journal ref: Nature Switzerland AG - Springer LNDECT 45(2020) 322-336

arXiv:1812.03939

JSSignature: Eliminating Third-Party-Hosted JavaScript Infection Threats Using Digital Signatures

Authors: Kousha Nakhaei, Ebrahim Ansari, Fateme Ansari

Abstract: Today, third-party JavaScript resources are an indispensable part of the web platform. More than 88% of the world's top websites include at least one JavaScript resource from a remote host. However, using a third-party JavaScript resource carries a great security risk: if an attacker can infect one of these remote JavaScript resources, all websites that have included the script are at risk. In this paper, we present JSSignature, an entirely client-side, pure JavaScript framework that validates third-party JavaScript resources using digital signatures. All included JavaScript resources are therefore checked against integrity, authentication and non-repudiation risks before execution. In contrast to existing methods, JSSignature protects web pages regardless of the nature of the third-party resource infection, and it does not place any restrictions on trusted JavaScript providers. This approach has an acceptable one-time performance overhead and is an easily deployable add-in. We have validated the proposed solution by applying tests to an implemented version (the source code, resources and a working demo are available at the JSSignature website).

Submitted 8 February, 2019; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: 18 pages, 2 figures, Submitted to CiDaS 2019

arXiv:1711.00681

Extracting an English-Persian Parallel Corpus from Comparable Corpora

Authors: Akbar Karimi, Ebrahim Ansari, Bahram Sadeghi Bigham

Abstract: Parallel data are an important part of a reliable Statistical Machine Translation (SMT) system: the more such data are available, the better the quality of the SMT system. However, for some language pairs such as Persian-English, parallel sources of this kind are scarce. In this paper, a bidirectional method is proposed to extract parallel sentences from document-aligned English and Persian Wikipedia. Two machine translation systems are employed to translate from Persian to English and vice versa, after which an IR system is used to measure the similarity of the translated sentences. Adding the extracted sentences to the training data of existing SMT systems is shown to improve translation quality. Furthermore, the proposed method slightly outperforms the one-directional approach. The extracted corpus consists of about 200,000 sentences, sorted by the degree of similarity calculated by the IR system, and is freely available to the public on the Web.

Submitted 31 March, 2019; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: 6 pages, 3 figures, 3 tables; published and presented at LREC 2018

arXiv:1701.08340

Extracting Bilingual Persian Italian Lexicon from Comparable Corpora Using Different Types of Seed Dictionaries

Authors: Ebrahim Ansari, M. H. Sadreddini, Lucio Grandinetti, Mahsa Radinmehr, Ziba Khosravan, Mehdi Sheikhalishahi

Abstract: Bilingual dictionaries are very important in various fields of natural language processing. In recent years, methods for extracting new bilingual lexicons from non-parallel (comparable) corpora have been proposed. Almost all of them use a small existing dictionary or other resources to make an initial list called the "seed dictionary". In this paper, we discuss the use of different types of dictionaries as the initial starting list for creating a bilingual Persian-Italian lexicon from a comparable corpus. Our experiments apply state-of-the-art techniques on three different seed dictionaries: an existing dictionary, a dictionary created with a pivot-based schema, and a dictionary extracted from a small Persian-Italian parallel text. The interesting challenge of our approach is to find a way to combine different dictionaries in order to produce a better and more accurate lexicon. To combine seed dictionaries, we propose two different combination models and examine their effect on various comparable corpora with differing degrees of comparability. We conclude with a proposal for a new weighting system to improve the extracted lexicon. The experimental results produced by our implementation show the efficiency of our proposed models.

Submitted 20 September, 2019; v1 submitted 28 January, 2017; originally announced January 2017.

Comments: 16 pages, accepted to be published in "Applications of Comparable Corpora", Berlin: Language Science Press

arXiv:1701.08339

Using English as Pivot to Extract Persian-Italian Parallel Sentences from Non-Parallel Corpora

Authors: Ebrahim Ansari, M. H. Sadreddini, Mostafa Sheikhalishahi, Richard Wallace, Fatemeh Alimardani

Abstract: The effectiveness of a statistical machine translation (SMT) system depends heavily on the amount of parallel corpus data used in the training phase. For low-resource language pairs, there are not enough parallel corpora to build an accurate SMT system. In this paper, a novel approach is presented to extract bilingual Persian-Italian parallel sentences from a non-parallel (comparable) corpus. In this study, English is used as the pivot language to compute the matching scores between source and target sentences and in the candidate selection phase. Additionally, a new monolingual sentence similarity metric, Normalized Google Distance (NGD), is proposed to improve the matching process. Moreover, some extensions of the baseline system are applied to improve the quality of the extracted sentences, as measured with BLEU. Experimental results show that the new pivot-based extraction can significantly increase the quality of the bilingual corpus and consequently improves the performance of the Persian-Italian SMT system.

Submitted 28 January, 2017; originally announced January 2017.

Comments: 30 pages, Accepted to be published in "Applications of Comparable Corpora", Berlin: Language Science Press
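Normalized Google Distance, named in the abstract above, predates this paper: it was introduced by Cilibrasi and Vitányi as a term-relatedness measure computed from search-engine hit counts, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The abstract does not say which corpus supplies the counts or how term scores are combined into a sentence-level score, so this is only a minimal sketch of the standard pairwise formula; the function name and the counts in the usage note are hypothetical.

```python
import math


def normalized_google_distance(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """Standard pairwise NGD from hit counts (not a sentence-level score).

    f_x, f_y: number of pages containing each term (assumed >= 1)
    f_xy:     number of pages containing both terms
    n:        total number of indexed pages (normalizing constant, assumed > f_x, f_y)
    Returns 0.0 for terms that always co-occur; larger values mean less related.
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur
    log_x, log_y, log_xy, log_n = (math.log(v) for v in (f_x, f_y, f_xy, n))
    return (max(log_x, log_y) - log_xy) / (log_n - min(log_x, log_y))
```

With hypothetical counts f(x) = 1000, f(y) = 100, f(x, y) = 10 and N = 10^6, the numerator is log 100 and the denominator is log 10^4, giving an NGD of about 0.5; the log base cancels, so natural logarithms give the same result as base 10.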

Showing 1–8 of 8 results for author: Ansari, E