SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Authors: Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, Muhammad Ravi Shulthan Habibi, Frederikus Hudi, Railey Montalan, Ryan Ignatius, Joanito Agili Lopo, William Nixon, Börje F. Karlsson, James Jaya, Ryandito Diandaru, Yuze Gao, Patrick Amadeus, Bin Wang, Jan Christian Blaise Cruz, Chenxi Whitehouse, et al. (36 additional authors not shown)

Abstract: Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.

Submitted 8 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: https://github.com/SEACrowd

arXiv:2101.10079 [pdf]

Data Resource Profile: Egress Behavior from Select NYC COVID-19 Exposed Health Facilities March-May 2020

Authors: Debra F. Laefer, Thomas Kirchner, Haoran Jiang, Darlene Cheong, Yunqi Jiang, Aseah Khan, Weiyi Qiu, Nikki Tai, Tiffany Truong, Maimunah Virk

Abstract: Vector control strategies are central to the mitigation and containment of COVID-19 and have come in the form of municipal ordinances that restrict the operational status of public and private spaces and associated services. Yet, little is known about specific population responses in terms of risk behaviors. To help understand the impact of those vector control variable strategies, a multi-week, multi-site observational study was undertaken outside of 19 New York City medical facilities during the peak of the city's initial COVID-19 wave (03/22/20-05/19/20). The aim was to capture perishable data of the touch, destination choice, and PPE usage behavior of individuals egressing hospitals and urgent care centers. A major goal was to establish an empirical basis for future research on the way people interact with three-dimensional vector environments. Anonymized data were collected via smartphones. Each data record includes the time, date, and location of an individual leaving a healthcare facility, their routing, and their interactions with the built environment, other individuals, and themselves. Most records also note their PPE usage, destination, intermediary stops, and transportation choices. The records were linked with 61 socio-economic factors by the facility zip code and 7 contemporaneous weather factors and then merged into a unified shapefile in an ArcGIS system.
This paper describes the project team and protocols used to produce over 5,100 publicly accessible observational records and an affiliated codebook that can be used to study linkages between individual behaviors and on-the-ground conditions.

Submitted 18 January, 2021; originally announced January 2021.

Comments: 14 Pages, 4 figures

arXiv:2003.02599 [pdf]

An Incremental Explanation of Inference in Hybrid Bayesian Networks for Increasing Model Trustworthiness and Supporting Clinical Decision Making

Authors: Evangelia Kyrimi, Somayyeh Mossadegh, Nigel Tai, William Marsh

Abstract: Various AI models are increasingly being considered as part of clinical decision-support tools. However, the trustworthiness of such models is rarely considered. Clinicians are more likely to use a model if they can understand and trust its predictions. Key to this is whether its underlying reasoning can be explained. A Bayesian network (BN) model has the advantage that it is not a black box and its reasoning can be explained. In this paper, we propose an incremental explanation of inference that can be applied to hybrid BNs, i.e. those that contain both discrete and continuous nodes. The key questions that we answer are: (1) which important evidence supports or contradicts the prediction, and (2) through which intermediate variables does the information flow. The explanation is illustrated using a real clinical case study. A small evaluation study is also conducted.

Submitted 6 March, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

arXiv:1311.6870 [pdf]

Multi-agent based protection system for distribution system with DG

Authors: ** Shang, Nengling Tai, Qi Liu

Abstract: This paper introduces the basic structure of a multi-agent based protection system for distribution systems with DGs. The entire system consists of intelligent agents and a communication system. The intelligent agents are divided into three layers: the bottom layer, the middle layer, and the upper layer. The design of the agents in each layer is analyzed in detail. The communication system is the bridge of the multi-agent system (MAS). The transmission mode, selective communication, and other principles are discussed to improve transmission efficiency. Finally, some evaluations are proposed, which provide a reference for the design of MAS.

Submitted 27 November, 2013; originally announced November 2013.

Comments: 8 pages, 5 figures, 1 table, 21 conference

MSC Class: 93B99; ACM Class: C.2.1

Showing 1–4 of 4 results for author: Tai, N