-
Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less Reliable
Authors:
Shahan Ali Memon,
Jevin D. West
Abstract:
In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue how all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.
Submitted 18 February, 2024;
originally announced February 2024.
-
Machine Learning for Healthcare-IoT Security: A Review and Risk Mitigation
Authors:
Mirza Akhi Khatun,
Sanober Farheen Memon,
Ciarán Eising,
Lubna Luxmi Dhirani
Abstract:
The Healthcare Internet-of-Things (H-IoT), commonly known as Digital Healthcare, is a data-driven infrastructure that relies heavily on smart sensing devices (e.g., blood pressure monitors, temperature sensors) for faster response time, treatments, and diagnosis. However, with the evolving cyber threat landscape, IoT devices have become more vulnerable to the broader risk surface (e.g., risks associated with generative AI, 5G-IoT, etc.), which, if exploited, may lead to data breaches, unauthorized access, and lack of command and control and potential harm. This paper reviews the fundamentals of healthcare IoT, its privacy, and data security challenges associated with machine learning and H-IoT devices. The paper further emphasizes the importance of monitoring healthcare IoT layers such as perception, network, cloud, and application. Detecting and responding to anomalies involves various cyber-attacks and protocols such as Wi-Fi 6, Narrowband Internet of Things (NB-IoT), Bluetooth, ZigBee, LoRa, and 5G New Radio (5G NR). A robust authentication mechanism based on machine learning and deep learning techniques is required to protect and mitigate H-IoT devices from increasing cybersecurity vulnerabilities. Hence, in this review paper, security and privacy challenges and risk mitigation strategies for building resilience in H-IoT are explored and reported.
Submitted 17 January, 2024;
originally announced January 2024.
-
Characterizing the effect of retractions on scientific careers
Authors:
Shahan Ali Memon,
Kinga Makovi,
Bedoor AlShebli
Abstract:
Retracting academic papers is a fundamental tool of quality control when the validity of papers or the integrity of authors is questioned post-publication. While retractions do not eliminate papers from the record, they have far-reaching consequences for retracted authors and their careers, serving as a visible and permanent signal of potential transgressions. Previous studies have highlighted the adverse effects of retractions on citation counts and coauthors' citations; however, the broader impacts beyond these have not been fully explored. We address this gap leveraging Retraction Watch, the most extensive data set on retractions and link it to Microsoft Academic Graph, a comprehensive data set of scientific publications and their citation networks, and Altmetric that monitors online attention to scientific output. Our investigation focuses on: 1) the likelihood of authors exiting scientific publishing following a retraction, and 2) the evolution of collaboration networks among authors who continue publishing after a retraction. Our empirical analysis reveals that retracted authors, particularly those with less experience, tend to leave scientific publishing in the aftermath of retraction, particularly if their retractions attract widespread attention. We also uncover that retracted authors who remain active in publishing maintain and establish more collaborations compared to their similar non-retracted counterparts. Nevertheless, retracted authors with less than a decade of publishing experience retain less senior, less productive and less impactful coauthors, and gain less senior coauthors post-retraction. Taken together, notwithstanding the indispensable role of retractions in upholding the integrity of the academic community, our findings shed light on the disproportionate impact that retractions impose on early-career authors.
Submitted 18 July, 2023; v1 submitted 11 June, 2023;
originally announced June 2023.
-
China and the U.S. produce more impactful AI research when collaborating together
Authors:
Bedoor AlShebli,
Shahan Ali Memon,
James A. Evans,
Talal Rahwan
Abstract:
Artificial Intelligence (AI) has become a disruptive technology, promising to grant a significant economic and strategic advantage to the nations that harness its power. China, with its recent push towards AI adoption, is challenging the U.S.'s position as the global leader in this field. Given AI's massive potential, as well as the fierce geopolitical tensions between the two nations, a number of policies have been put in place that discourage AI scientists from migrating to, or collaborating with, the other country. However, the extents of such brain drain and cross-border collaboration are not fully understood. Here, we analyze a dataset of over 350,000 AI scientists and 5,000,000 AI papers. We find that, since the year 2000, China and the U.S. have been leading the field in terms of impact, novelty, productivity, and workforce. Most AI scientists who migrate to China come from the U.S., and most who migrate to the U.S. come from China, highlighting a notable brain drain in both directions. Upon migrating from one country to the other, scientists continue to collaborate frequently with the origin country. Although the number of collaborations between the two countries has been increasing since the dawn of the millennium, such collaborations continue to be relatively rare. A matching experiment reveals that the two countries have always been more impactful when collaborating than when each of them works without the other. These findings suggest that instead of suppressing cross-border migration and collaboration between the two nations, the field could benefit from promoting such activities.
Submitted 21 April, 2023;
originally announced April 2023.
-
Acoustic Correlates of the Voice Qualifiers: A Survey
Authors:
Shahan Ali Memon
Abstract:
Our voices are as distinctive as our faces and fingerprints. There is a spectrum of non-disjoint traits that make our voices unique and identifiable, such as the fundamental frequency, the intensity, and most interestingly the quality of the speech. Voice quality refers to the characteristic features of an individual's voice. Previous research has from time-to-time proven the ubiquity of voice quality in making different paralinguistic inferences. These inferences range from identifying personality traits, to health conditions and beyond. In this manuscript, we first map the paralinguistic voice qualifiers to their acoustic correlates in the light of the previous research and literature. We also determine the openSMILE correlates one could possibly use to measure those correlates. In the second part, we give a set of example paralinguistic inferences that can be made using different acoustic and perceptual voice quality features.
Submitted 29 October, 2020;
originally announced October 2020.
-
Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset
Authors:
Shahan Ali Memon,
Kathleen M. Carley
Abstract:
From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hot-bed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. In this paper, we present a methodology and analyses to characterize the two competing COVID-19 misinformation communities online: (i) misinformed users or users who are actively posting misinformation, and (ii) informed users or users who are actively spreading true information, or calling out misinformation. The goals of this study are two-fold: (i) collecting a diverse, annotated COVID-19 Twitter dataset that can be used by the research community to conduct meaningful analysis; and (ii) characterizing the two target communities in terms of their network structure, linguistic patterns, and their membership in other communities. Our analyses show that COVID-19 misinformed communities are denser, and more organized than informed communities, with a possibility of a high volume of the misinformation being part of disinformation campaigns. Our analyses also suggest that a large majority of misinformed users may be anti-vaxxers. Finally, our sociolinguistic analyses suggest that COVID-19 informed users tend to use more narratives than misinformed users.
Submitted 19 September, 2020; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Characterizing Sociolinguistic Variation in the Competing Vaccination Communities
Authors:
Shahan Ali Memon,
Aman Tyagi,
David R. Mortensen,
Kathleen M. Carley
Abstract:
Public health practitioners and policy makers grapple with the challenge of devising effective message-based interventions for debunking public health misinformation in cyber communities. "Framing" and "personalization" of the message is one of the key features for devising a persuasive messaging strategy. For an effective health communication, it is imperative to focus on "preference-based framing" where the preferences of the target sub-community are taken into consideration. To achieve that, it is important to understand and hence characterize the target sub-communities in terms of their social interactions. In the context of health-related misinformation, vaccination remains to be the most prevalent topic of discord. Hence, in this paper, we conduct a sociolinguistic analysis of the two competing vaccination communities on Twitter: "pro-vaxxers" or individuals who believe in the effectiveness of vaccinations, and "anti-vaxxers" or individuals who are opposed to vaccinations. Our data analysis shows significant linguistic variation between the two communities in terms of their usage of linguistic intensifiers, pronouns, and uncertainty words. Our network-level analysis shows significant differences between the two communities in terms of their network density, echo-chamberness, and the EI index. We hypothesize that these sociolinguistic differences can be used as proxies to characterize and understand these communities to devise better message interventions.
Submitted 4 October, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
-
The phonetic bases of vocal expressed emotion: natural versus acted
Authors:
Hira Dhamyal,
Shahan Ali Memon,
Bhiksha Raj,
Rita Singh
Abstract:
Can vocal emotions be emulated? This question has been a recurrent concern of the speech community, and has also been vigorously investigated. It has been fueled further by its link to the issue of validity of acted emotion databases. Much of the speech and vocal emotion research has relied on acted emotion databases as valid proxies for studying natural emotions. To create models that generalize to natural settings, it is crucial to work with valid prototypes -- ones that can be assumed to reliably represent natural emotions. More concretely, it is important to study emulated emotions against natural emotions in terms of their physiological, and psychological concomitants. In this paper, we present an on-scale systematic study of the differences between natural and acted vocal emotions. We use a self-attention based emotion classification model to understand the phonetic bases of emotions by discovering the most 'attended' phonemes for each class of emotions. We then compare these attended-phonemes in their importance and distribution across acted and natural classes. Our tests show significant differences in the manner and choice of phonemes in acted and natural speech, concluding moderate to low validity and value in using acted speech databases for emotion classification tasks.
Submitted 24 July, 2020; v1 submitted 12 November, 2019;
originally announced November 2019.
-
Detecting gender differences in perception of emotion in crowdsourced data
Authors:
Shahan Ali Memon,
Hira Dhamyal,
Oren Wright,
Daniel Justice,
Vijaykumar Palat,
William Boler,
Bhiksha Raj,
Rita Singh
Abstract:
Do men and women perceive emotions differently? Popular convictions place women as more emotionally perceptive than men. Empirical findings, however, remain inconclusive. Most prior studies focus on visual modalities. In addition, almost all of the studies are limited to experiments within controlled environments. Generalizability and scalability of these studies have not been sufficiently established. In this paper, we study the differences in perception of emotion between genders from speech data in the wild, annotated through crowdsourcing. While we limit ourselves to a single modality (i.e. speech), our framework is applicable to studies of emotion perception from all such loosely annotated data in general. Our paper addresses multiple serious challenges related to making statistically viable conclusions from crowdsourced data. Overall, the contributions of this paper are twofold: a reliable novel framework for perceptual studies from crowdsourced data; and the demonstration of statistically significant differences in speech-based emotion perception between genders.
Submitted 4 November, 2019; v1 submitted 24 October, 2019;
originally announced October 2019.
-
A Fog Computing Framework for Autonomous Driving Assist: Architecture, Experiments, and Challenges
Authors:
Muthucumaru Maheswaran,
Tianzi Yang,
Salman Memon
Abstract:
Autonomous driving is expected to provide a range of far-reaching economic, environmental and safety benefits. In this study, we propose a fog computing based framework to assist autonomous driving. Our framework relies on overhead views from cameras and data streams from vehicle sensors to create a network of distributed digital twins, called an edge twin, on fog machines. The edge twin will be continuously updated with the locations of both autonomous and human-piloted vehicles on the road segments. The vehicle locations will be harvested from overhead cameras as well as location feeds from the vehicles themselves. Although the edge twin can make fair road space allocations from a global viewpoint, there is a communication cost (delay) in reaching it from the cameras and vehicular sensors. To address this, we introduce a machine learning forecaster as a part of the edge twin which is responsible for predicting the future location of vehicles. Lastly, we introduce a box algorithm that will use the forecasted values to create a hazard map for the road segment which would be used by the framework to suggest safe manoeuvres for the autonomous vehicles such as lane changes and accelerations. We present the complete fog computing framework for autonomous driving assist and evaluate key portions of the proposed framework using simulations based on a real-world dataset of vehicle position traces on a highway.
Submitted 16 July, 2019;
originally announced July 2019.
-
A Language for Programming Edge Clouds for Next Generation IoT Applications
Authors:
Muthucumaru Maheswaran,
Robert Wenger,
Richard Olaniyan,
Salman Memon,
Olamilekan Fadahunsi,
Richboy Echomgbe
Abstract:
For effective use of edge computing in an IoT application, we need to partition the application into tasks and map them into the cloud, fog (edge server), device levels such that the resources at the different levels are optimally used to meet the overall quality of service requirements. In this paper, we consider four concerns about application-to-fog mapping: task placement at different levels, data filtering to limit network loading, fog fail-over, and data consistency, and reacting to hotspots at the edge. We describe a programming language and middleware we created for edge computing that addresses the above four concerns. The language has a distributed-node programming model that allows programs to be written for a collection of nodes organized into a cloud, fog, device hierarchy. The paper describes the major design elements of the language and explains the prototype implementation. The unique distributed-node programming model embodied in the language enables new edge-oriented programming patterns that are highly suitable for cognitive or data-intensive edge computing workloads. The paper presents results from an initial evaluation of the language prototype and also a distributed shell and a smart parking app that were developed using the programming language.
Submitted 20 June, 2019;
originally announced June 2019.
-
Hierarchical Routing Mixture of Experts
Authors:
Wenbo Zhao,
Yang Gao,
Shahan Ali Memon,
Bhiksha Raj,
Rita Singh
Abstract:
In regression tasks the distribution of the data is often too complex to be fitted by a single model. In contrast, partition-based models are developed where data is divided and fitted by local models. These models partition the input space and do not leverage the input-output dependency of multimodal-distributed data, and strong local models are needed to make good predictions. Addressing these problems, we propose a binary tree-structured hierarchical routing mixture of experts (HRME) model that has classifiers as non-leaf node experts and simple regression models as leaf node experts. The classifier nodes jointly soft-partition the input-output space based on the natural separateness of multimodal data. This enables simple leaf experts to be effective for prediction. Further, we develop a probabilistic framework for the HRME model, and propose a recursive Expectation-Maximization (EM) based algorithm to learn both the tree structure and the expert models. Experiments on a collection of regression tasks validate the effectiveness of our method compared to a variety of other regression models.
Submitted 18 March, 2019;
originally announced March 2019.
-
Using Machine Learning for Handover Optimization in Vehicular Fog Computing
Authors:
Salman Memon,
Muthucumaru Maheswaran
Abstract:
Smart mobility management would be an important prerequisite for future fog computing systems. In this research, we propose a learning-based handover optimization for the Internet of Vehicles that would assist the smooth transition of device connections and offloaded tasks between fog nodes. To accomplish this, we make use of machine learning algorithms to learn from vehicle interactions with fog nodes. Our approach uses a three-layer feed-forward neural network to predict the correct fog node at a given location and time with 99.2% accuracy on a test set. We also implement a dual stacked recurrent neural network (RNN) with long short-term memory (LSTM) cells capable of learning the latency, or cost, associated with these service requests. We create a simulation in JAMScript using a dataset of real-world vehicle movements to create a dataset to train these networks. We further propose the use of this predictive system in a smarter request routing mechanism to minimize the service interruption during handovers between fog nodes and to anticipate areas of low coverage through a series of experiments and test the models' performance on a test set.
Submitted 18 December, 2018;
originally announced December 2018.
-
Neural Regression Trees
Authors:
Shahan Ali Memon,
Wenbo Zhao,
Bhiksha Raj,
Rita Singh
Abstract:
Regression-via-Classification (RvC) is the process of converting a regression problem to a classification one. Current approaches for RvC use ad-hoc discretization strategies and are suboptimal. We propose a neural regression tree model for RvC. In this model, we employ a joint optimization framework where we learn optimal discretization thresholds while simultaneously optimizing the features for each node in the tree. We empirically show the validity of our model by testing it on two challenging regression tasks where we establish the state of the art.
Submitted 3 April, 2019; v1 submitted 1 October, 2018;
originally announced October 2018.
-
AARC: First draft of the Blueprint Architecture for Authentication and Authorisation Infrastructures
Authors:
A. Biancini,
L. Florio,
M. Haase,
M. Hardt,
M. Jankowski,
J. Jensen,
C. Kanellopoulos,
N. Liampotis,
S. Licehammer,
S. Memon,
N. van Dijk,
S. Paetow,
M. Prochazka,
M. Sallé,
P. Solagna,
U. Stevanovic,
D. Vaghetti
Abstract:
AARC (Authentication and Authorisation for Research Communities) is a two-year EC-funded project to develop and pilot an integrated cross-discipline authentication and authorisation framework, building on existing authentication and authorisation infrastructures (AAIs) and production federated infrastructure. AARC also champions federated access and offers tailored training to complement the actions needed to test AARC results and to promote AARC outcomes. This article describes a high-level blueprint architecture for interoperable AAIs.
Submitted 30 November, 2016; v1 submitted 23 November, 2016;
originally announced November 2016.
-
Image Quality Assessment for Performance Evaluation of Focus Measure Operators
Authors:
Farida Memon,
Mukhtiar Ali Unar,
Sheeraz Memon
Abstract:
This paper presents the performance evaluation of eight focus measure operators namely Image CURV (Curvature), GRAE (Gradient Energy), HISE (Histogram Entropy), LAPM (Modified Laplacian), LAPV (Variance of Laplacian), LAPD (Diagonal Laplacian), LAP3 (Laplacian in 3D Window) and WAVS (Sum of Wavelet Coefficients). Statistical metrics such as MSE (Mean Squared Error), PSNR (Peak Signal to Noise Ratio), SC (Structural Content), NCC (Normalized Cross Correlation), MD (Maximum Difference) and NAE (Normalized Absolute Error) are used to evaluate stated focus measures in this research. The FR (Full Reference) method of image quality assessment is utilized in this paper. Results indicate that the LAPD method is comparatively better than the other seven focus operators at typical imaging conditions.
Submitted 2 April, 2016;
originally announced April 2016.