Search | arXiv e-print repository

A Big Data Driven Framework for Duplicate Device Detection from Multi-sourced Mobile Device Location Data

Authors: Aliakbar Kabiri, Aref Darzi, Saeed Saleh Namadi, Yixuan Pan, Guangchen Zhao, Qianqian Sun, Mofeng Yang, Mohammad Ashoori

Abstract: Mobile Device Location Data (MDLD) has been widely used in various fields. Yet its large-scale applications are limited because of either biased or insufficient spatial coverage of the data from individual data vendors. One approach to improve the data coverage is to leverage the data from multiple data vendors and integrate them to build a more representative dataset. For data integration, further treatment of the multi-sourced dataset is required for several reasons. First, the possibility of carrying more than one device could result in duplicated observations from the same data subject. Additionally, when utilizing multiple data sources, the same device might be captured by more than one data provider. Our paper proposes a data integration methodology for multi-sourced data to investigate the feasibility of integrating data from several sources without introducing additional biases to the data. By leveraging the uniqueness of each device's travel pattern, duplicate devices are identified. The proposed methodology is shown to be cost-effective while achieving the desired accuracy level. Our findings suggest that devices sharing the same imputed home location and the top five most-visited locations during a month can represent the same user in the MDLD. It is shown that more than 99.6% of the sample devices having the aforementioned attributes in common are observed at the same location simultaneously. Finally, the proposed algorithm has been successfully applied to the national-level MDLD of 2020 to produce the national passenger origin-destination data for the NextGeneration National Household Travel Survey (NextGen NHTS) program.
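
The matching rule stated in this abstract (same imputed home location plus the same top five most-visited locations in a month) can be illustrated with a minimal sketch. This is not the authors' implementation: the input layout, the function name `find_duplicate_groups`, the treatment of the top five as an unordered set, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_groups(devices, top_n=5):
    """Group device IDs that share a home location and top-N visited set.

    `devices` maps a device ID to a dict with two hypothetical fields:
      "home":   an identifier for the imputed home location
      "visits": a dict mapping location ID -> monthly visit count
    """
    groups = defaultdict(list)
    for device_id, profile in devices.items():
        # Rank locations by visit count (descending); ties are broken by
        # location ID so the result is deterministic (an assumption).
        ranked = sorted(profile["visits"].items(), key=lambda kv: (-kv[1], kv[0]))
        top = frozenset(loc for loc, _ in ranked[:top_n])
        if len(top) < top_n:
            continue  # too few distinct locations to form a signature
        groups[(profile["home"], top)].append(device_id)
    # Only signatures shared by more than one device indicate candidates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical sample data: "a" and "b" share home and top-five locations.
example_devices = {
    "a": {"home": "h1", "visits": {"p1": 9, "p2": 8, "p3": 7, "p4": 6, "p5": 5, "p6": 1}},
    "b": {"home": "h1", "visits": {"p1": 4, "p2": 6, "p3": 3, "p4": 5, "p5": 7}},
    "c": {"home": "h1", "visits": {"p1": 9, "p2": 8, "p3": 7, "p4": 6, "p7": 5}},
}

print(find_duplicate_groups(example_devices))  # [['a', 'b']]
```

Hashing each device to a single signature keeps the pass linear in the number of devices, which is one plausible way to keep such a check inexpensive on large datasets; the paper itself may proceed differently.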

Submitted 28 February, 2023; originally announced February 2023.

arXiv:2301.08660 [pdf]

A Big-Data Driven Framework to Estimating Vehicle Volume based on Mobile Device Location Data

Authors: Mofeng Yang, Weiyu Luo, Mohammad Ashoori, **a Mahmoudi, Chenfeng Xiong, Jiawei Lu, Guangchen Zhao, Saeed Saleh Namadi, Songhua Hu, Aliakbar Kabiri

Abstract: Vehicle volume serves as a critical metric and the fundamental basis for traffic signal control, transportation project prioritization, road maintenance plans and more. Traditional methods of quantifying vehicle volume rely on manual counting, video cameras, and loop detectors at a limited number of locations. These efforts require significant labor and cost for expansions. Researchers and private sector companies have also explored alternative solutions such as probe vehicle data, while still suffering from a low penetration rate. In recent years, along with the technological advancement in mobile sensors and mobile networks, Mobile Device Location Data (MDLD) have been growing dramatically in terms of the spatiotemporal coverage of the population and its mobility. This paper presents a big-data driven framework that can ingest terabytes of MDLD and estimate vehicle volume over a larger geographical area with a larger sample size. The proposed framework first employs a series of cloud-based computational algorithms to extract multimodal trajectories and trip rosters. A scalable map matching and routing algorithm is then applied to snap and route vehicle trajectories to the roadway network. The observed vehicle counts on each roadway segment are weighted and calibrated against ground truth control totals, i.e., Annual Vehicle-Miles of Travel (AVMT), and Annual Average Daily Traffic (AADT). The proposed framework is implemented on the all-street network in the state of Maryland using MDLD for the entire year of 2019. Results indicate that our proposed framework produces reliable vehicle volume estimates and also demonstrate its transferability and generalization ability.

Submitted 24 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

arXiv:2212.08206 [pdf, other]

Meeting Summarization: A Survey of the State of the Art

Authors: Lakshmi Prasanna Kumar, Arman Kabiri

Abstract: Information overload creates a need for summarizers to extract salient information from text. Currently, there is an overload of dialogue data due to the rise of virtual communication platforms. The rise of Covid-19 has led people to rely on online communication platforms like Zoom, Slack, Microsoft Teams, Discord, etc. to conduct their company meetings. Instead of going through the entire meeting transcripts, people can use meeting summarizers to select useful data. Nevertheless, there is a lack of comprehensive surveys in the field of meeting summarizers. In this survey, we aim to cover recent meeting summarization techniques. Our survey offers a general overview of text summarization along with datasets and evaluation metrics for meeting summarization. We also provide the performance of each summarizer on a leaderboard. We conclude our survey with different challenges in this domain and potential research opportunities for future researchers.

Submitted 15 December, 2022; originally announced December 2022.

arXiv:2112.02607 [pdf, other]

Differentiating Approach and Avoidance from Traditional Notions of Sentiment in Economic Contexts

Authors: Jacob Turton, Ali Kabiri, David Tuckett, Robert Elliott Smith, David P. Vinson

Abstract: There is growing interest in the role of sentiment in economic decision-making. However, most research on the subject has focused on positive and negative valence. Conviction Narrative Theory (CNT) places Approach and Avoidance sentiment (that which drives action) at the heart of real-world decision-making, and argues that it better captures emotion in financial markets. This research, bringing together psychology and machine learning, introduces new techniques to differentiate Approach and Avoidance from positive and negative sentiment on a fundamental level of meaning. It does this by comparing word-lists, previously constructed to capture these concepts in text data, across a large range of semantic features. The results demonstrate that Avoidance in particular is well defined as a separate type of emotion, which is evaluative/cognitive and action-orientated in nature. Refining the Avoidance word-list according to these features improves macroeconomic models, suggesting that they capture the essence of Avoidance and that it plays a crucial role in driving real-world economic decision-making.

Submitted 5 December, 2021; originally announced December 2021.

arXiv:2012.06154 [pdf, other]

ParsiNLU: A Suite of Language Understanding Challenges for Persian

Authors: Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, for which few NLU datasets are nevertheless available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of the progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks -- Reading Comprehension, Textual Entailment, etc. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. In addition, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

Submitted 13 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: To appear on Transactions of the Association for Computational Linguistics (TACL), 2021

arXiv:2007.10436 [pdf]

How different age groups responded to the COVID-19 pandemic in terms of mobility behaviors: a case study of the United States

Authors: Aliakbar Kabiri, Aref Darzi, Weiyi Zhou, Qianqian Sun, Lei Zhang

Abstract: The rapid spread of COVID-19 has affected thousands of people from different socio-demographic groups all over the country. A decisive step in preventing or slowing the outbreak is the use of mobility interventions, such as government stay-at-home orders. However, different socio-demographic groups might have different responses to these orders and regulations. In this paper, we attempt to fill the current gap in the literature by examining how different communities with different age groups performed social distancing by following orders such as the national emergency declaration on March 13, as well as how fast they started changing their behavior after the regulations were imposed. For this purpose, we calculated the behavior changes of people in different mobility metrics, such as percentage of people staying home during the study period (March, April, and May 2020), in different age groups in comparison to the days before the pandemic (January and February 2020), by utilizing anonymized and privacy-protected mobile device data. Our study indicates that senior communities outperformed younger communities in terms of their behavior change. Senior communities not only had a faster response to the outbreak in comparison to young communities, but they also had better performance consistency during the pandemic.
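
The comparison described in this abstract, a mobility metric in the study period measured against a pre-pandemic baseline, amounts to a relative change. The sketch below is only an illustration under assumed inputs: the function name `percent_change_from_baseline` and the sample numbers are hypothetical, and the paper may define its behavior-change measure differently.

```python
from statistics import mean


def percent_change_from_baseline(baseline_daily, study_daily):
    """Relative change (in percent) of a metric's mean versus its baseline mean.

    Both arguments are sequences of daily values of one mobility metric,
    for example the percentage of people staying home in one community.
    """
    baseline = mean(baseline_daily)
    if baseline == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return 100.0 * (mean(study_daily) - baseline) / baseline


# Hypothetical daily stay-at-home percentages for one community.
baseline_days = [20.0, 20.0, 20.0]  # stands in for January and February 2020
study_days = [30.0, 35.0, 25.0]     # stands in for March through May 2020

print(percent_change_from_baseline(baseline_days, study_days))  # 50.0
```

Computing this value per age group would allow the groups' responses to be compared on a common scale.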

Submitted 21 July, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: 13 pages, 3 figures, 2 tables

arXiv:2007.02160 [pdf]

doi 10.1111/rsp3.12598

COVID-19 and income profile: How communities in the United States responded to mobility restrictions in the pandemic's early stages

Authors: Qianqian Sun, Weiyi Zhou, Aliakbar Kabiri, Aref Darzi, Songhua Hu, Hannah Younes, Lei Zhang

Abstract: Mobility interventions in communities play a critical role in containing a pandemic at an early stage. The real-world practice of social distancing can enlighten policymakers and help them implement more efficient and effective control measures. A lack of such research using real-world observations motivates this article. We analyzed the social distancing performance of 66,149 census tracts from 3,142 counties in the United States with a specific focus on income profile. Six daily mobility metrics, including a social distancing index, stay-at-home percentage, miles traveled per person, trip rate, work trip rate, and non-work trip rate, were produced for each census tract using the location data from over 100 million anonymous devices on a monthly basis. Each mobility metric was further tabulated by three perspectives of social distancing performance: "best performance", "effort", and "consistency". We found that for all 18 indicators, high-income communities demonstrated better social distancing performance. Such disparities between communities of different income levels are presented in detail in this article. The comparisons across scenarios also raise other concerns for low-income communities, such as employment status, working conditions, and accessibility to basic needs. This article lays out a series of facts extracted from real-world data and offers compelling perspectives for future discussions.

Submitted 14 April, 2024; v1 submitted 4 July, 2020; originally announced July 2020.

Journal ref: Regional Science Policy & Practice. 15(2023)541-559

arXiv:2006.07398 [pdf, ps, other]

Evaluating a Multi-sense Definition Generation Model for Multiple Languages

Authors: Arman Kabiri, Paul Cook

Abstract: Most prior work on definition modeling has not accounted for polysemy, or has done so by considering definition modeling for a target word in a given context. In contrast, in this study, we propose a context-agnostic approach to definition modeling, based on multi-sense word embeddings, that is capable of generating multiple definitions for a target word. Further, in contrast to most prior work, which has primarily focused on English, we evaluate our proposed approach on fifteen different datasets covering nine languages from several language families. To evaluate our approach we consider several variations of BLEU. Our results demonstrate that our proposed multi-sense model outperforms a single-sense model on all fifteen datasets.

Submitted 12 June, 2020; originally announced June 2020.

Comments: To be presented orally in 23rd International Conference on Text, Speech and Dialogue (TSD 2020)

arXiv:2005.01224 [pdf]

Quantifying human mobility behavior changes in response to non-pharmaceutical interventions during the COVID-19 outbreak in the United States

Authors: Yixuan Pan, Aref Darzi, Aliakbar Kabiri, Guangchen Zhao, Weiyu Luo, Chenfeng Xiong, Lei Zhang

Abstract: Ever since the first case of the novel coronavirus disease (COVID-19) was confirmed in Wuhan, China, social distancing has been promoted worldwide, including the United States. It is one of the major community mitigation strategies, also known as non-pharmaceutical interventions. However, our understanding of how people practice social distancing remains limited. In this study, we construct a Social Distancing Index (SDI) to evaluate people's mobility pattern changes along with the spread of COVID-19. We utilize an integrated dataset of mobile device location data for the contiguous United States plus Alaska and Hawaii over a 100-day period from January 1, 2020 to April 9, 2020. The major findings are: 1) the declaration of the national emergency concerning the COVID-19 outbreak greatly encouraged social distancing and the mandatory stay-at-home orders in most states further strengthened the practice; 2) the states with more confirmed cases have taken more active and timely responses in practicing social distancing; 3) people in the states with fewer confirmed cases did not pay much attention to maintaining social distancing and some states, e.g., Wyoming, North Dakota, and Montana, already began to practice less social distancing despite the rapid growth in confirmed cases; 4) some counties with the highest infection rates are not performing much social distancing, e.g., Randolph County and Dougherty County in Georgia, and some counties began to practice less social distancing right after the growth in confirmed cases slowed, e.g., in Blaine County, Idaho, which may be dangerous as well.

Submitted 3 May, 2020; originally announced May 2020.

Showing 1–9 of 9 results for author: Kabiri, A