-
Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques
Authors:
Anusuya Baby Hari Krishnan
Abstract:
In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, the writing of deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This research paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. This study delves into the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model and max features are developed to effectively identify deceptive content, particularly focusing on fake reviews. A benchmark study is undertaken to explore the performance of two different feature extraction techniques, which are then coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, showcasing the highest accuracy not only in text classification but also in identifying fake reviews. Moreover, the research delves into data augmentation and implements various deep learning techniques to further enhance the process of detecting deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.
Submitted 24 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
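The abstract above pairs n-gram features with a passive aggressive classifier. A minimal standard-library sketch of that combination follows. It is an illustration under stated assumptions, not the paper's implementation: the update rule is the standard PA-I variant, the function names and parameters are hypothetical, and the toy reviews are invented rather than taken from the Deceptive Opinion Spam Corpus.

```python
from collections import Counter


def ngram_features(text, n_max=2):
    """Count word n-grams of length 1..n_max in a lower-cased text."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature count dict."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())


def train_passive_aggressive(samples, C=1.0, epochs=5):
    """PA-I training: weights change only when the hinge loss is positive."""
    weights = {}
    for _ in range(epochs):
        for text, label in samples:  # label: +1 deceptive, -1 truthful
            feats = ngram_features(text)
            loss = max(0.0, 1.0 - label * dot(weights, feats))
            if loss > 0.0:
                norm_sq = sum(v * v for v in feats.values())
                tau = min(C, loss / norm_sq)  # step size, capped by C
                for k, v in feats.items():
                    weights[k] = weights.get(k, 0.0) + tau * label * v
    return weights


def predict(weights, text):
    """Return +1 (deceptive) or -1 (truthful) for a review text."""
    return 1 if dot(weights, ngram_features(text)) >= 0 else -1


# Hypothetical toy reviews, invented for illustration only.
TOY_REVIEWS = [
    ("best restaurant ever absolutely amazing must visit", 1),
    ("amazing amazing food best ever", 1),
    ("the soup was lukewarm but the service was quick", -1),
    ("we waited ten minutes and the pasta was fine", -1),
]
weights = train_passive_aggressive(TOY_REVIEWS)
```

The "passive" part is the skipped update when a sample is already classified with sufficient margin; the "aggressive" part is the step size `tau`, which is just large enough to fix the current mistake unless `C` caps it.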
-
MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studies
Authors:
Zhuowen Zhao,
Tanny Chavez,
Elizabeth A. Holman,
Guanhua Hao,
Adam Green,
Harinarayan Krishnan,
Dylan McReynolds,
Ronald Pandolfi,
Eric J. Roberts,
Petrus H. Zwart,
Howard Yanxon,
Nicholas Schwarz,
Subramanian Sankaranarayanan,
Sergei V. Kalinin,
Apurva Mehta,
Stuart Campbell,
Alexander Hexemer
Abstract:
Machine learning (ML) algorithms are increasingly helping scientific communities across different disciplines and institutions to address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At a high level, we are targeting a full user experience where managing and exchanging ML algorithms, workflows, and data are readily available through web applications. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed on servers of different scales, ranging from a personal device (laptop, smartphone, etc.) to high performance clusters (HPC) accessed (simultaneously) by many users. Thus, MLExchange supports flexible usage scenarios -- users can either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.
Submitted 26 January, 2023; v1 submitted 20 August, 2022;
originally announced August 2022.
-
Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels
Authors:
Marcus M. Noack,
Harinarayan Krishnan,
Mark D. Risser,
Kristofer G. Reyes
Abstract:
A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. This success is largely attributed to the GP's analytical tractability, robustness, non-parametric structure, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of $O(N^3)$ in computation and $O(N^2)$ in storage. All existing methods addressing this issue utilize some form of approximation -- usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user's flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally-occurring sparsity by allowing the kernel to discover -- instead of induce -- sparse structure. The premise of this paper is that GPs, in their most native form, are often naturally sparse, but commonly-used kernels do not allow us to exploit this sparsity. The core concept of exact, and at the same time sparse GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
Submitted 18 May, 2022;
originally announced May 2022.
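The abstract's key idea is that a kernel with compact support produces exact zeros in the covariance matrix, which sparse linear algebra can then exploit. The sketch below shows that effect with a fixed, stationary Wendland kernel on hypothetical 1-D inputs. This is a swapped-in stand-in chosen for brevity: the paper's kernels are non-stationary and learn where the zeros belong, which this example does not attempt.

```python
def wendland_kernel(x1, x2, radius=1.0):
    """Compactly supported kernel: exactly zero when |x1 - x2| >= radius."""
    r = abs(x1 - x2) / radius
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)


def covariance_matrix(points, radius=1.0):
    """Dense covariance matrix, built densely here only to count its zeros."""
    return [[wendland_kernel(a, b, radius) for b in points] for a in points]


def zero_fraction(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0.0)
    return zeros / total


# Hypothetical inputs: 20 points spaced 0.5 apart on a line.
points = [0.5 * i for i in range(20)]
K = covariance_matrix(points, radius=1.0)
sparsity = zero_fraction(K)  # only neighbours closer than the radius interact
```

With this spacing and radius, each point has a non-zero covariance only with itself and its immediate neighbours, so the matrix is tridiagonal. A sparse solver would store and factor only those entries, avoiding the $O(N^3)$ computation and $O(N^2)$ storage of the dense case.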
-
Simulation Study on Collaborative Content Distribution in Delay Tolerant Vehicular Networks
Authors:
Rusheng Zhang,
Bo Yu,
Hariharan Krishnan
Abstract:
Modern vehicles are equipped with increasingly sophisticated computer modules, which need to periodically download files from the cloud, such as security certificates, digital maps, and system firmware. Collaborative content distribution utilizes V2V communication to distribute large files across vehicular networks. It has the potential to significantly reduce the cost of cellular-based communication such as 4G LTE. In this report, we conduct a simulation study to verify the feasibility of a hybrid cellular and V2V collaborative content distribution network. In our simulation, a small portion of the simulated vehicles download the file directly from the cloud via cellular communication, while other vehicles receive the file via collaborative V2V communications. Our simulation results show that, with only 1% of vehicles enabled with cellular communication, it takes less than 24 hours to distribute a file to 90% of the vehicles in a metropolitan area, and around 48 to 72 hours to distribute to 99%. The results are very promising for many delay-tolerant content distribution applications in vehicular networks.
Submitted 3 July, 2018;
originally announced July 2018.
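The hybrid scheme described above (a few cellular-enabled vehicles seed the file, the rest obtain it through V2V contacts) can be sketched as a toy spreading model. The code assumes a well-mixed population in which every vehicle meets one uniformly random peer per step. That is a strong simplification, not the mobility or radio model of the report, so its step counts should not be compared with the hours quoted in the abstract. All names and parameters are hypothetical.

```python
import random


def simulate_distribution(num_vehicles=1000, cellular_fraction=0.01,
                          contacts_per_step=1, steps=30, seed=0):
    """Return the fraction of vehicles holding the file after each step."""
    rng = random.Random(seed)
    num_seeded = max(1, round(num_vehicles * cellular_fraction))
    has_file = [False] * num_vehicles
    # Cellular-enabled vehicles download the file directly from the cloud.
    for v in rng.sample(range(num_vehicles), num_seeded):
        has_file[v] = True
    coverage = [sum(has_file) / num_vehicles]
    for _ in range(steps):
        snapshot = list(has_file)  # transfers use the state at step start
        for v in range(num_vehicles):
            for _ in range(contacts_per_step):
                peer = rng.randrange(num_vehicles)
                # A V2V contact copies the file if either side already had it.
                if snapshot[v] or snapshot[peer]:
                    has_file[v] = True
                    has_file[peer] = True
        coverage.append(sum(has_file) / num_vehicles)
    return coverage


coverage = simulate_distribution()
```

Even in this crude model the curve has the expected shape: slow initial growth from the 1% of seeded vehicles, rapid spread once many carriers exist, and a long tail while the last few vehicles wait for a contact.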
-
The Eclipse Integrated Computational Environment
Authors:
Jay Jay Billings,
Andrew R. Bennett,
Jordan Deyton,
Kasper Gammeltoft,
Jonah Graham,
Dasha Gorin,
Hari Krishnan,
Menghan Li,
Alexander J. McCaskey,
Taylor Patterson,
Robert Smith,
Gregory R. Watson,
Anna Wojtowicz
Abstract:
Problems in modeling and simulation require significantly different workflow management technologies than standard grid-based workflow management systems. Computational scientists typically interact with simulation software in a feedback-driven way, where solutions and workflows are developed iteratively and simultaneously. This work describes common activities in workflows and how combinations of these activities form unique workflows. It presents the Eclipse Integrated Computational Environment as a workflow management system and development environment for the modeling and simulation community. Examples of the Environment's applicability to problems in energy science, general multiphysics simulations, quantum computing, and other areas are presented, as well as its impact on the community.
Submitted 11 June, 2017; v1 submitted 31 March, 2017;
originally announced April 2017.
-
Nanosurveyor: a framework for real-time data processing
Authors:
Benedikt J. Daurer,
Hari Krishnan,
Talita Perciano,
Filipe R. N. C. Maia,
David A. Shapiro,
James A. Sethian,
Stefano Marchesini
Abstract:
Scientists are drawn to synchrotrons and accelerator-based light sources because of their brightness, coherence, and flux. The rate of improvement in brightness and detector technology has outpaced the Moore's law growth seen for computers, networks, and storage, and is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Here we present an integrated software/algorithmic framework designed to capitalize on high-throughput experiments, and describe the streamlined processing pipeline of ptychography data analysis. The pipeline provides throughput, compression, and resolution as well as rapid feedback to the microscope operators.
Submitted 9 September, 2016;
originally announced September 2016.
-
Fairness and Stability Analysis of Congestion Control Schemes in Vehicular Ad-hoc Networks
Authors:
Neda Nasiriani,
Yaser P. Fallah,
Hariharan Krishnan
Abstract:
Cooperative vehicle safety (CVS) systems operate based on broadcast of vehicle position and safety information to neighboring cars. The communication medium of CVS is a vehicular ad-hoc network. One of the main challenges in large scale deployment of CVS systems is the issue of scalability. To address the scalability problem, several congestion control methods have been proposed and are currently under field study. These algorithms adapt transmission rate and power based on network measures such as channel busy ratio. We examine two such algorithms and study their dynamic behavior in time and space to evaluate stability (in time) and fairness (in space) properties of these algorithms. We present stability conditions and evaluate stability and fairness of the algorithms through simulation experiments. Results show that there is a trade-off between fast convergence, temporal stability and spatial fairness. The proper ranges of parameters for achieving stability are presented for the discussed algorithms. Stability is verified for all typical road density cases. Fairness is shown to be naturally achieved for some algorithms, while under the same conditions other algorithms may suffer from unfairness issues. A method for resolving unfairness is introduced and evaluated through simulations.
Submitted 1 June, 2012;
originally announced June 2012.
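The abstract does not name the two algorithms it studies, so the sketch below uses a generic linear rate controller as a stand-in to show what stability in time and fairness in space mean here. Each vehicle scales its rate by `1 - alpha` and adds `beta` times the gap between a target load and the measured load. The summed rate serves as a crude proxy for channel busy ratio, and all names and parameter values are hypothetical.

```python
def simulate_rate_control(initial_rates, target_load,
                          alpha=0.1, beta=0.02, steps=200):
    """Iterate a linear rate update and return the rates after every step."""
    rates = list(initial_rates)
    history = [list(rates)]
    for _ in range(steps):
        load = sum(rates)  # stand-in for a measured channel busy ratio
        rates = [
            max(0.0, (1.0 - alpha) * r + beta * (target_load - load))
            for r in rates  # a message rate cannot be negative
        ]
        history.append(list(rates))
    return history


def is_stable(num_vehicles, alpha, beta):
    """Stability of the unclamped total load, which evolves as
    load' = (1 - alpha - K * beta) * load + K * beta * target."""
    return abs(1.0 - alpha - num_vehicles * beta) < 1.0


def fixed_point_load(num_vehicles, target_load, alpha, beta):
    """Total load the unclamped update converges to when it is stable."""
    k_beta = num_vehicles * beta
    return k_beta * target_load / (alpha + k_beta)


# Four vehicles with unequal starting rates (hypothetical values).
history = simulate_rate_control([10.0, 2.0, 5.0, 0.5], target_load=20.0)
final_rates = history[-1]
```

In this toy controller the difference between any two vehicles shrinks by a factor of `1 - alpha` per step, so the rates equalize (fairness), while the total load converges only if `abs(1 - alpha - K * beta) < 1`. Larger gains converge faster but fail that condition at high vehicle density, which illustrates the kind of trade-off the abstract reports.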