-
Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques
Authors:
Anusuya Baby Hari Krishnan
Abstract:
In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have incentives to create deceptive positive reviews for their own products and negative reviews for their competitors' offerings. As a result, writing deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants, and reports numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To this end, an n-gram model with a max-features limit is developed to identify deceptive content, particularly fake reviews. A benchmark study compares two feature extraction techniques, each coupled with five machine learning classification algorithms. The experimental results show that the passive aggressive classifier outperforms the other algorithms, achieving the highest accuracy both in text classification and in identifying fake reviews. The research also explores data augmentation and applies various deep learning techniques to further improve the detection of deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer insights into dealing with deceptive reviews in online businesses.
Submitted 24 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
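As an illustration of the pipeline the abstract describes (n-gram features capped at a maximum feature count, fed to a passive aggressive classifier), here is a minimal pure-Python sketch. It assumes word unigrams and bigrams, raw counts instead of TF-IDF, and the standard PA-I update with aggressiveness parameter `C`; the function names and the four toy reviews are hypothetical and are not taken from the paper or from the Deceptive Opinion Spam Corpus.

```python
import re
from collections import Counter

def ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max from lower-cased tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def build_vocab(texts, max_features=1000, n_max=2):
    """Keep only the max_features most frequent n-grams (ties broken alphabetically)."""
    counts = Counter(g for t in texts for g in ngrams(t, n_max))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {g: idx for idx, (g, _) in enumerate(ranked[:max_features])}

def vectorize(text, vocab, n_max=2):
    """Sparse count vector {feature_index: count}; out-of-vocabulary n-grams are dropped."""
    return Counter(vocab[g] for g in ngrams(text, n_max) if g in vocab)

def score(w, x):
    return sum(w.get(i, 0.0) * v for i, v in x.items())

def train_passive_aggressive(X, y, C=1.0, epochs=10):
    """PA-I online updates; labels are +1 (deceptive) or -1 (truthful)."""
    w = {}
    for _ in range(epochs):
        for x, label in zip(X, y):
            loss = max(0.0, 1.0 - label * score(w, x))  # hinge loss
            norm_sq = sum(v * v for v in x.values())
            if loss > 0.0 and norm_sq > 0.0:
                tau = min(C, loss / norm_sq)  # just enough to restore the margin, capped at C
                for i, v in x.items():
                    w[i] = w.get(i, 0.0) + tau * label * v
    return w

def predict(w, x):
    return 1 if score(w, x) >= 0.0 else -1

# Invented toy reviews (not from the Deceptive Opinion Spam Corpus).
texts = [
    "amazing luxurious experience, truly the best ever",
    "my husband loved this amazing place, best ever",
    "waited twenty minutes; soup arrived lukewarm",
    "parking cost twelve dollars, noisy patio near traffic",
]
labels = [1, 1, -1, -1]
vocab = build_vocab(texts, max_features=1000)
w = train_passive_aggressive([vectorize(t, vocab) for t in texts], labels)
```

In practice one would more likely use scikit-learn's `CountVectorizer` or `TfidfVectorizer` (with `ngram_range` and `max_features`) together with `PassiveAggressiveClassifier`; the hand-written version only makes the update rule visible.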
-
Optimization and Commissioning of the EPIC Commensal Radio Transient Imager for the Long Wavelength Array
Authors:
Hariharan Krishnan,
Adam P. Beardsley,
Judd D. Bowman,
Jayce Dowell,
Matthew Kolopanis,
Greg Taylor,
Nithyanandan Thyagarajan
Abstract:
Next-generation aperture arrays are expected to consist of hundreds to thousands of antenna elements with substantial digital signal processing to handle large operating bandwidths of a few tens to hundreds of MHz. Conventionally, FX correlators are used as the primary signal processing unit of the interferometer. These correlators have computational costs that scale as $\mathcal{O}(N^2)$ for large arrays. An alternative imaging approach is implemented in the E-field Parallel Imaging Correlator (EPIC) that was recently deployed on the Long Wavelength Array station at the Sevilleta National Wildlife Refuge (LWA-SV) in New Mexico. EPIC uses a novel architecture that produces electric field or intensity images of the sky at the angular resolution of the array with full or partial polarization and the full spectral resolution of the channelizer. By eliminating the intermediate cross-correlation data products, the computational costs can be significantly lowered in comparison to a conventional FX or XF correlator, from $\mathcal{O}(N^2)$ to $\mathcal{O}(N \log N)$ for dense (but otherwise arbitrary) array layouts. EPIC can also lower the output data rates by directly yielding polarimetric image products for science analysis. We have optimized EPIC and have now commissioned it at LWA-SV as a commensal all-sky imaging back-end that can potentially detect and localize sources of impulsive radio emission on millisecond timescales. In this article, we review the architecture of EPIC, describe code optimizations that improve performance, and present initial validations from commissioning observations. Comparisons between EPIC measurements and simultaneous beam-formed observations of bright sources show spectral-temporal structures in good agreement.
Submitted 23 January, 2023;
originally announced January 2023.
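The cost argument can be illustrated with a toy 1D sketch, assuming a regular line of N antennas at unit spacing, a single polarization and frequency channel, and no gridding kernel (all simplifications; this is not EPIC's code, and the function names are hypothetical). FX-style imaging forms all N² antenna products (visibilities) before transforming; EPIC-style imaging zero-pads the E-fields onto a grid, applies one FFT and squares the magnitude. Both give the same intensity image because |FFT(E)|² is the Fourier transform of the E-field autocorrelation.

```python
import cmath
import random

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def _check_grid(n, m):
    if m < n or m & (m - 1):
        raise ValueError("grid size m must be a power of two >= number of antennas")

def epic_image(efield, m):
    """EPIC-style: place E-fields on an m-point grid, one FFT, then |.|^2 (O(m log m))."""
    _check_grid(len(efield), m)
    padded = list(efield) + [0j] * (m - len(efield))
    return [abs(v) ** 2 for v in fft(padded)]

def fx_image(efield, m):
    """FX-style: accumulate all N^2 pairwise products per baseline, then transform."""
    _check_grid(len(efield), m)
    n = len(efield)
    vis = [0j] * m  # visibility for baseline d = p - q stored at index d mod m
    for p in range(n):
        for q in range(n):
            vis[(p - q) % m] += efield[p] * efield[q].conjugate()
    return [v.real for v in fft(vis)]  # vis is Hermitian, so the transform is real

if __name__ == "__main__":
    rng = random.Random(1)
    e = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
    epic, fx = epic_image(e, 16), fx_image(e, 16)
    # the two images should agree to floating-point round-off
    print(max(abs(a - b) for a, b in zip(epic, fx)))
```

Choosing m >= 2N - 1 keeps each baseline in its own grid cell; the two images match for any valid m because both are evaluated modulo m.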
-
MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studies
Authors:
Zhuowen Zhao,
Tanny Chavez,
Elizabeth A. Holman,
Guanhua Hao,
Adam Green,
Harinarayan Krishnan,
Dylan McReynolds,
Ronald Pandolfi,
Eric J. Roberts,
Petrus H. Zwart,
Howard Yanxon,
Nicholas Schwarz,
Subramanian Sankaranarayanan,
Sergei V. Kalinin,
Apurva Mehta,
Stuart Campbell,
Alexander Hexemer
Abstract:
Machine learning (ML) algorithms are increasingly helping scientific communities across different disciplines and institutions to address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At a high level, we are targeting a full user experience in which managing and exchanging ML algorithms, workflows, and data are readily available through web applications. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed on servers of different scales, ranging from a personal device (laptop, smartphone, etc.) to high-performance computing (HPC) clusters accessed (simultaneously) by many users. Thus, MLExchange supports flexible usage scenarios: users can either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.
Submitted 26 January, 2023; v1 submitted 20 August, 2022;
originally announced August 2022.
-
Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels
Authors:
Marcus M. Noack,
Harinarayan Krishnan,
Mark D. Risser,
Kristofer G. Reyes
Abstract:
A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. Its success is largely attributed to the GP's analytical tractability, robustness, non-parametric structure, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of $O(N^3)$ in computation and $O(N^2)$ in storage. All existing methods addressing this issue utilize some form of approximation -- usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user's flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover -- instead of induce -- sparse structure. The premise of this paper is that GPs, in their most native form, are often naturally sparse, but commonly used kernels do not allow us to exploit this sparsity. The core concept of exact yet sparse GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
Submitted 18 May, 2022;
originally announced May 2022.
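To make the idea of exact zeros in the covariance concrete, the sketch below builds a covariance matrix from a compactly supported kernel, so every pair of points farther apart than the support radius gets an exact zero and is never stored. It assumes 1D inputs and the stationary Wendland φ(3,1) kernel, a standard compactly supported positive-definite kernel in up to three dimensions; the paper's kernels are non-stationary and learn where the zeros fall, which this sketch does not attempt. All names are hypothetical.

```python
import bisect

def wendland(r, support):
    """Wendland phi_(3,1): (1 - r/s)^4 * (4 r/s + 1) for r < s, exactly 0 otherwise."""
    t = r / support
    return (1.0 - t) ** 4 * (4.0 * t + 1.0) if t < 1.0 else 0.0

def sparse_covariance(xs, support, signal_var=1.0):
    """Return {(i, j): k(x_i, x_j)} holding only the non-zero entries (1D inputs).
    Sorting lets each point visit only neighbours closer than `support`,
    instead of evaluating all N^2 pairs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    sorted_x = [xs[i] for i in order]
    cov = {}
    for a, i in enumerate(order):
        # index of the first sorted point at distance >= support to the right of x_i
        hi = bisect.bisect_left(sorted_x, xs[i] + support)
        for b in range(a, hi):
            j = order[b]
            val = signal_var * wendland(abs(xs[i] - xs[j]), support)
            if val != 0.0:
                cov[(i, j)] = cov[(j, i)] = val
    return cov

xs = [0.1 * i for i in range(100)]
cov = sparse_covariance(xs, support=0.35)
density = len(cov) / len(xs) ** 2  # fraction of the dense N x N matrix actually stored
```

With 100 evenly spaced points and a support of 0.35, each point correlates only with itself and its three nearest neighbours on each side, so well under a tenth of the dense matrix is stored; a sparse Cholesky or iterative solver could then work on this structure.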
-
Simulation Study on Collaborative Content Distribution in Delay Tolerant Vehicular Networks
Authors:
Rusheng Zhang,
Bo Yu,
Hariharan Krishnan
Abstract:
Modern vehicles are equipped with increasingly sophisticated computer modules that periodically need to download files from the cloud, such as security certificates, digital maps, and system firmware. Collaborative content distribution uses V2V communication to distribute large files across vehicular networks. It has the potential to significantly reduce the cost of cellular communication such as 4G LTE. In this report, we conduct a simulation study to verify the feasibility of a hybrid cellular and V2V collaborative content distribution network. In our simulation, a small portion of the simulated vehicles download the file directly from the cloud via cellular communication, while the other vehicles receive the file via collaborative V2V communication. Our simulation results show that, with only 1% of vehicles enabled with cellular communication, it takes less than 24 hours to distribute a file to 90% of the vehicles in a metropolitan area, and around 48 to 72 hours to distribute it to 99%. The results are very promising for many delay-tolerant content distribution applications in vehicular networks.
Submitted 3 July, 2018;
originally announced July 2018.
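The seeding-plus-epidemic mechanism can be illustrated with a toy simulation. Everything here is an assumption rather than the report's setup: vehicles mix uniformly at random, each time step pairs vehicles at random and a pair is within V2V range with probability `meet_prob`, and a single contact transfers the whole file. Time steps carry no physical unit, so the sketch says nothing about the 24-hour or 48-to-72-hour figures; it only shows logistic spread from a small cellular seed. The function names are hypothetical.

```python
import random

def simulate(num_vehicles=1000, cellular_fraction=0.01, meet_prob=0.2,
             steps=300, seed=0):
    """Return the fraction of vehicles holding the file after each time step."""
    rng = random.Random(seed)
    has_file = [False] * num_vehicles
    n_seed = max(1, round(num_vehicles * cellular_fraction))
    for v in rng.sample(range(num_vehicles), n_seed):
        has_file[v] = True  # downloaded directly over cellular
    coverage = [sum(has_file) / num_vehicles]
    for _ in range(steps):
        order = list(range(num_vehicles))
        rng.shuffle(order)
        # pair consecutive vehicles; each vehicle is in at most one pair per step
        for a, b in zip(order[0::2], order[1::2]):
            if rng.random() < meet_prob and has_file[a] != has_file[b]:
                has_file[a] = has_file[b] = True  # V2V transfer of the whole file
        coverage.append(sum(has_file) / num_vehicles)
    return coverage

def steps_to_reach(coverage, target):
    """First step index at which coverage >= target, or None if never reached."""
    return next((t for t, c in enumerate(coverage) if c >= target), None)

cov = simulate()
```

Because informed vehicles never lose the file, coverage is non-decreasing; its growth rate is roughly proportional to f(1 - f), which is why the last few percent take disproportionately long in such models.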
-
Three dimensional localization of nanoscale battery reactions using soft X-ray tomography
Authors:
Young-Sang Yu,
Maryam Farmand,
Chunjoong Kim,
Yijin Liu,
Clare P. Grey,
Fiona C. Strobridge,
Tolek Tyliszczak,
Rich Celestre,
Peter Denes,
John Joseph,
Harinarayan Krishnan,
Filipe R. N. C. Maia,
A. L. David Kilcoyne,
Stefano Marchesini,
Talita Perciano Costa Leite,
Tony Warwick,
Howard Padmore,
Jordi Cabana,
David A. Shapiro
Abstract:
Battery function is determined by the efficiency and reversibility of the electrochemical phase transformations at solid electrodes. When studying complex architectures, the microscopic tools available to study the chemical states of matter with the required spatial resolution and chemical specificity are intrinsically limited by their reliance on two-dimensional projections of thick material. Here, we report the development of soft X-ray ptychographic tomography, which resolves chemical states in three dimensions at 11-nm spatial resolution. We study an ensemble of nano-plates of lithium iron phosphate (Li$_x$FePO$_4$) extracted from a battery electrode at 50% state of charge. Using a set of nanoscale tomograms, we quantify the electrochemical state and resolve phase boundaries throughout the volume of individual nano-particles. These observations reveal multiple reaction points and intra-particle heterogeneity that highlight the importance of electrical connectivity, providing novel insight into the design of the next generation of high-performance devices.
Submitted 9 March, 2018; v1 submitted 4 November, 2017;
originally announced November 2017.
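One common way to turn spectro-tomographic data into a local electrochemical state is linear-combination fitting: each voxel's absorption spectrum is modeled as a mix of lithiated (LiFePO4) and delithiated (FePO4) reference spectra, and the fitted weight estimates the local Li fraction. The sketch assumes that two-phase model and uses made-up reference values at four photon energies; it is not necessarily the analysis used in the paper, and all names are hypothetical.

```python
# Hypothetical reference absorption spectra sampled at four photon energies.
REF_LIFEPO4 = [0.2, 1.0, 0.6, 0.3]
REF_FEPO4 = [0.1, 0.4, 1.1, 0.5]

def lithium_fraction(spectrum, ref_lithiated=REF_LIFEPO4, ref_delithiated=REF_FEPO4):
    """Least-squares weight x in spectrum ~ x*ref_lithiated + (1 - x)*ref_delithiated,
    clipped to the physical range [0, 1]."""
    diff = [a - b for a, b in zip(ref_lithiated, ref_delithiated)]
    resid = [s - b for s, b in zip(spectrum, ref_delithiated)]
    # closed-form 1-parameter least squares: x = <resid, diff> / <diff, diff>
    x = sum(r * d for r, d in zip(resid, diff)) / sum(d * d for d in diff)
    return min(1.0, max(0.0, x))

def map_volume(voxel_spectra):
    """Per-voxel Li fraction for a flattened list of voxel spectra."""
    return [lithium_fraction(s) for s in voxel_spectra]

def mixture(x):
    """Synthetic voxel spectrum for a two-phase mixture with Li fraction x."""
    return [x * a + (1 - x) * b for a, b in zip(REF_LIFEPO4, REF_FEPO4)]
```

Thresholding the resulting Li-fraction map (for example at 0.5) is one simple way to outline LiFePO4/FePO4 phase boundaries inside a particle; averaging it over a particle gives its mean Li content.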
-
The Eclipse Integrated Computational Environment
Authors:
Jay Jay Billings,
Andrew R. Bennett,
Jordan Deyton,
Kasper Gammeltoft,
Jonah Graham,
Dasha Gorin,
Hari Krishnan,
Menghan Li,
Alexander J. McCaskey,
Taylor Patterson,
Robert Smith,
Gregory R. Watson,
Anna Wojtowicz
Abstract:
Problems in modeling and simulation require significantly different workflow management technologies than standard grid-based workflow management systems. Computational scientists typically interact with simulation software in a feedback-driven way, where solutions and workflows are developed iteratively and simultaneously. This work describes common activities in workflows and how combinations of these activities form unique workflows. It presents the Eclipse Integrated Computational Environment as a workflow management system and development environment for the modeling and simulation community. Examples of the Environment's applicability to problems in energy science, general multiphysics simulations, quantum computing and other areas are presented, as well as its impact on the community.
Submitted 11 June, 2017; v1 submitted 31 March, 2017;
originally announced April 2017.
-
Nanosurveyor: a framework for real-time data processing
Authors:
Benedikt J. Daurer,
Hari Krishnan,
Talita Perciano,
Filipe R. N. C. Maia,
David A. Shapiro,
James A. Sethian,
Stefano Marchesini
Abstract:
Scientists are drawn to synchrotrons and accelerator-based light sources because of their brightness, coherence and flux. The rate of improvement in brightness and detector technology has outpaced the Moore's-law growth seen in computers, networks, and storage, and is enabling novel observations and discoveries with faster frame rates, larger fields of view, higher resolution, and higher dimensionality. Here we present an integrated software/algorithmic framework designed to capitalize on high-throughput experiments, and describe the streamlined processing pipeline of ptychography data analysis. The pipeline provides throughput, compression, and resolution as well as rapid feedback to the microscope operators.
Submitted 9 September, 2016;
originally announced September 2016.
-
SHARP: a distributed, GPU-based ptychographic solver
Authors:
Stefano Marchesini,
Hari Krishnan,
Benedikt J. Daurer,
David A. Shapiro,
Talita Perciano,
James A. Sethian,
Filipe R. N. C. Maia
Abstract:
Ever brighter light sources, fast parallel detectors, and advances in phase retrieval methods have made ptychography a practical and popular imaging technique. Compared to previous techniques, ptychography provides superior robustness and resolution at the expense of more advanced and time-consuming data analysis. By taking advantage of massively parallel architectures, high-throughput processing can expedite this analysis and provide microscopists with immediate feedback. These advances allow real-time imaging at wavelength-limited resolution, coupled with a large field of view. Here, we introduce a set of algorithmic and computational methodologies used at the Advanced Light Source and other DOE light sources, packaged as a CUDA-based software environment named SHARP (http://camera.lbl.gov/sharp), aimed at providing state-of-the-art high-throughput ptychography reconstructions for the coming era of diffraction-limited light sources.
Submitted 20 June, 2016; v1 submitted 29 January, 2016;
originally announced February 2016.
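Iterative phase-retrieval solvers for ptychography alternate between an overlap constraint across scan positions and a Fourier-magnitude constraint at each position. The sketch below shows only the second step in a 1D toy: form the exit wave as probe times object window, propagate it to the far field with a DFT, replace the amplitudes with the square roots of the measured intensities while keeping the phases, and propagate back. It is an illustration under stated assumptions (1D, naive DFT, hypothetical names), not SHARP's CUDA implementation or its specific algorithm.

```python
import cmath

def dft(a, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse includes the 1/n factor."""
    n = len(a)
    sign = 1.0 if inverse else -1.0
    out = [sum(a[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def exit_wave(obj, probe, shift):
    """psi_j = P_j * O_(shift + j): the probe illuminates one window of the object."""
    return [p * obj[shift + j] for j, p in enumerate(probe)]

def fourier_magnitude_projection(psi, measured_intensity):
    """Keep the far-field phases of psi but impose amplitudes sqrt(measured_intensity)."""
    projected = []
    for f, intensity in zip(dft(psi), measured_intensity):
        target = intensity ** 0.5
        amp = abs(f)
        # where the estimate has zero amplitude the phase is undefined; use phase 0
        projected.append(f * (target / amp) if amp > 0.0 else complex(target, 0.0))
    return dft(projected, inverse=True)

# Toy 1D example: a pure-phase object, a small probe, and one scan position.
obj = [cmath.exp(1j * 0.3 * k) for k in range(12)]
probe = [0.5, 1.0, 1.0, 0.5]
psi_true = exit_wave(obj, probe, shift=3)
measured = [abs(f) ** 2 for f in dft(psi_true)]  # what the detector records
```

The true exit wave is a fixed point of the projection, and any guess is mapped to a wave whose far-field intensity matches the data exactly; the overlap step between neighbouring scan positions is what resolves the remaining phase ambiguity.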
-
Fairness and Stability Analysis of Congestion Control Schemes in Vehicular Ad-hoc Networks
Authors:
Neda Nasiriani,
Yaser P. Fallah,
Hariharan Krishnan
Abstract:
Cooperative vehicle safety (CVS) systems operate based on broadcast of vehicle position and safety information to neighboring cars. The communication medium of CVS is a vehicular ad-hoc network. One of the main challenges in large scale deployment of CVS systems is the issue of scalability. To address the scalability problem, several congestion control methods have been proposed and are currently under field study. These algorithms adapt transmission rate and power based on network measures such as channel busy ratio. We examine two such algorithms and study their dynamic behavior in time and space to evaluate stability (in time) and fairness (in space) properties of these algorithms. We present stability conditions and evaluate stability and fairness of the algorithms through simulation experiments. Results show that there is a trade-off between fast convergence, temporal stability and spatial fairness. The proper ranges of parameters for achieving stability are presented for the discussed algorithms. Stability is verified for all typical road density cases. Fairness is shown to be naturally achieved for some algorithms, while under the same conditions other algorithms may suffer from unfairness issues. A method for resolving unfairness is introduced and evaluated through simulations.
Submitted 1 June, 2012;
originally announced June 2012.
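A minimal sketch of the kind of stability and fairness analysis described, assuming a LIMERIC-style linear rule r_i(t+1) = (1 - alpha) r_i(t) + beta (r_g - sum_j r_j(t)) on a single shared channel where every vehicle sees the same load. This rule is chosen only for illustration: the abstract does not name the two algorithms examined, and real schemes measure channel busy ratio rather than summing rates. The update has two modes: rate differences decay by (1 - alpha), which gives fairness, and the total load evolves with factor (1 - alpha - K beta), which governs stability. Parameter values and names are hypothetical.

```python
def simulate_rates(initial_rates, alpha, beta, target_load, steps):
    """Synchronous linear rate adaptation on one shared channel (assumed model).
    Channel load is modelled as the sum of all vehicles' rates."""
    rates = list(initial_rates)
    for _ in range(steps):
        load = sum(rates)
        rates = [(1.0 - alpha) * r + beta * (target_load - load) for r in rates]
    return rates

def is_stable(alpha, beta, k):
    """Eigenvalues of the update: (1 - alpha) for rate differences (fairness mode)
    and (1 - alpha - k*beta) for the total load; both must lie inside (-1, 1)."""
    return abs(1.0 - alpha) < 1.0 and abs(1.0 - alpha - k * beta) < 1.0

def fixed_point_rate(alpha, beta, target_load, k):
    """Equal per-vehicle rate at equilibrium: r* = beta * target / (alpha + k * beta)."""
    return beta * target_load / (alpha + k * beta)

start = [float(i) for i in range(1, 11)]  # 10 vehicles with unequal initial rates
stable_rates = simulate_rates(start, alpha=0.1, beta=1 / 150, target_load=600.0, steps=500)
```

With 10 vehicles, alpha = 0.1 and beta = 1/150 the load mode has factor about 0.83, so all rates converge to the same value (fair); raising beta to 0.25 pushes that factor to -1.6 and the total load oscillates with growing amplitude, which is the trade-off between fast convergence and stability in miniature.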