Showing 1–45 of 45 results for author: Papalexakis, E E

  1. arXiv:2406.17261  [pdf, other]

    cs.CL

    TRAWL: Tensor Reduced and Approximated Weights for Large Language Models

    Authors: Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis

    Abstract: Large language models (LLMs) have fundamentally transformed artificial intelligence, catalyzing recent advancements while imposing substantial environmental and computational burdens. We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a novel methodology for optimizing LLMs through tensor decomposition. TRAWL leverages diverse strategies to exploit matrices wit…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures. Submitted to EMNLP 2024 and under review

    MSC Class: 68T50 (Primary); 65F55 (Secondary) ACM Class: I.2.7

  2. arXiv:2403.07321  [pdf, other]

    cs.CL

    GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method

    Authors: Zubair Qazi, William Shiao, Evangelos E. Papalexakis

    Abstract: As natural language models like ChatGPT become increasingly prevalent in applications and services, the need for robust and accurate methods to detect their output is of paramount importance. In this paper, we present GPT Reddit Dataset (GRiD), a novel Generative Pretrained Transformer (GPT)-generated text detection dataset designed to assess the performance of detection models in identifying gene…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 4 pages, 2 figures, published in the WWW 2024 Short Papers Track

  3. arXiv:2403.03004  [pdf, other]

    astro-ph.CO gr-qc hep-ph

    Ultralight vector dark matter search using data from the KAGRA O3GK run

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

    Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we prese…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 20 pages, 5 figures

    Report number: LIGO-P2300250

  4. arXiv:2312.00296  [pdf, other]

    cs.LG stat.ML

    Towards Aligned Canonical Correlation Analysis: Preliminary Formulation and Proof-of-Concept Results

    Authors: Biqian Cheng, Evangelos E. Papalexakis, Jia Chen

    Abstract: Canonical Correlation Analysis (CCA) has been widely applied to jointly embed multiple views of data in a maximally correlated latent space. However, the alignment between various data perspectives, which is required by traditional approaches, is unclear in many practical cases. In this work we propose a new framework, Aligned Canonical Correlation Analysis (ACCA), to address this challenge by iter…

    Submitted 7 December, 2023; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 4 pages, 7 figures, KDD SoCal symposium 2023 (extended version)

  5. arXiv:2308.03822  [pdf, other]

    astro-ph.HE

    Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

    Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effect…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 24 pages, 5 figures

    Report number: LIGO-P2300080

  6. CARL-G: Clustering-Accelerated Representation Learning on Graphs

    Authors: William Shiao, Uday Singh Saini, Yozen Liu, Tong Zhao, Neil Shah, Evangelos E. Papalexakis

    Abstract: Self-supervised learning on graphs has made large strides in achieving great performance in various downstream tasks. However, many state-of-the-art methods suffer from a number of impediments, which prevent them from realizing their full potential. For instance, contrastive methods typically require negative sampling, which is often computationally costly. While non-contrastive methods avoid this…

    Submitted 31 July, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: 14 pages. Accepted at KDD 2023

  7. Open data from the third observing run of LIGO, Virgo, KAGRA and GEO

    Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, A. Allocca, et al. (1719 additional authors not shown)

    Abstract: The global network of gravitational-wave observatories now includes five detectors, namely LIGO Hanford, LIGO Livingston, Virgo, KAGRA, and GEO 600. These detectors collected data during their third observing run, O3, composed of three phases: O3a starting in April of 2019 and lasting six months, O3b starting in November of 2019 and lasting five months, and O3GK starting in April of 2020 and lasti…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 27 pages, 3 figures

    Report number: LIGO-P2200316

  8. arXiv:2211.14394  [pdf, other]

    cs.LG cs.SI

    Link Prediction with Non-Contrastive Learning

    Authors: William Shiao, Zhichun Guo, Tong Zhao, Evangelos E. Papalexakis, Yozen Liu, Neil Shah

    Abstract: A recent focal area in the space of graph neural networks (GNNs) is graph self-supervised learning (SSL), which aims to derive useful node representations without labeled data. Notably, many state-of-the-art graph SSL methods are contrastive methods, which use a combination of positive and negative samples to learn node representations. Owing to challenges in negative sampling (slowness and model…

    Submitted 28 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: ICLR 2023. 19 pages, 6 figures

  9. arXiv:2206.09316  [pdf, other]

    cs.LG stat.ML

    FRAPPE: Fast Rank Approximation with Explainable Features for Tensors

    Authors: William Shiao, Evangelos E. Papalexakis

    Abstract: Tensor decompositions have proven to be effective in analyzing the structure of multidimensional data. However, most of these methods require a key parameter: the number of desired components. In the case of the CANDECOMP/PARAFAC decomposition (CPD), the ideal value for the number of components is known as the canonical rank and greatly affects the quality of the decomposition results. Existing me…

    Submitted 25 May, 2024; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 16 pages, 4 figures

  10. arXiv:2205.12449  [pdf, other]

    cs.LG cs.MA

    MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning

    Authors: Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang

    Abstract: Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. T…

    Submitted 11 July, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: ECML camera-ready version. 23 pages

  11. arXiv:2108.06702  [pdf, other]

    cs.CV cs.AI cs.LG

    Deepfake Representation with Multilinear Regression

    Authors: Sara Abdali, M. Alex O. Vasilescu, Evangelos E. Papalexakis

    Abstract: Generative neural network architectures, such as GANs, may be used to generate synthetic instances to compensate for the lack of real data. However, they may be employed to create media that may cause social, political or economic upheaval. One emerging medium is "Deepfake". Techniques that can discriminate between such media are indispensable. In this paper, we propose a modified multilinear (tenso…

    Submitted 15 August, 2021; originally announced August 2021.

  12. arXiv:2107.01296  [pdf, other]

    cs.LG

    Subspace Clustering Based Analysis of Neural Networks

    Authors: Uday Singh Saini, Pravallika Devineni, Evangelos E. Papalexakis

    Abstract: Tools to analyze the latent space of deep neural networks provide a step towards better understanding them. In this work, we motivate sparse subspace clustering (SSC) with an aim to learn affinity graphs from the latent structure of a given neural network layer trained over a set of inputs. We then use tools from Community Detection to quantify structures present in the input. These experiments re…

    Submitted 2 July, 2021; originally announced July 2021.

  13. arXiv:2102.07857  [pdf, other]

    cs.LG cs.AI cs.CY

    KNH: Multi-View Modeling with K-Nearest Hyperplanes Graph for Misinformation Detection

    Authors: Sara Abdali, Neil Shah, Evangelos E. Papalexakis

    Abstract: Graphs are one of the most efficacious structures for representing datapoints and their relations, and they have been largely exploited for different applications. Previously, the higher-order relations between the nodes have been modeled by a generalization of graphs known as hypergraphs. In hypergraphs, the edges are defined by a set of nodes i.e., hyperedges to demonstrate the higher order rela…

    Submitted 15 February, 2021; originally announced February 2021.

    Journal ref: Second International TrueFact Workshop 2020: Making a Credible Web for Tomorrow

  14. arXiv:2102.07849  [pdf, other]

    cs.LG cs.AI cs.CY cs.SI

    Identifying Misinformation from Website Screenshots

    Authors: Sara Abdali, Rutuja Gurav, Siddharth Menon, Daniel Fonseca, Negin Entezari, Neil Shah, Evangelos E. Papalexakis

    Abstract: Can the look and the feel of a website give information about the trustworthiness of an article? In this paper, we propose to use a promising, yet neglected aspect in detecting the misinformativeness: the overall look of the domain webpage. To capture this overall look, we take screenshots of news articles served by either misinformative or trustworthy web domains and leverage a tensor decompositi…

    Submitted 3 June, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Journal ref: The International AAAI Conference on Web and Social Media (ICWSM) 2021

  15. arXiv:2012.12516  [pdf, other]

    cs.LG cs.CV

    Analyzing Representations inside Convolutional Neural Networks

    Authors: Uday Singh Saini, Evangelos E. Papalexakis

    Abstract: How can we discover and succinctly summarize the concepts that a neural network has learned? Such a task is of great importance in applications of networks in areas of inference that involve classification, like medical diagnosis based on fMRI/x-ray etc. In this work, we propose a framework to categorize the concepts a network learns based on the way it clusters a set of input examples, clusters n…

    Submitted 23 December, 2020; originally announced December 2020.

    Comments: Work in Progress

  16. arXiv:2011.07363  [pdf, other]

    cs.IR

    RecTen: A Recursive Hierarchical Low Rank Tensor Factorization Method to Discover Hierarchical Patterns in Multi-modal Data

    Authors: Risul Islam, Md Omar Faruk Rokon, Evangelos E. Papalexakis, Michalis Faloutsos

    Abstract: How can we expand the tensor decomposition to reveal a hierarchical structure of the multi-modal data in a self-adaptive way? Current tensor decomposition provides only a single layer of clusters. We argue that with the abundance of multimodal data and time-evolving networks nowadays, the ability to identify emerging hierarchies is important. To this effect, we propose RecTen, a recursive hierarch…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: 9 pages, 9 figures, 1 table, 1 algorithm

  17. arXiv:2011.07226  [pdf, other]

    cs.CR cs.IR

    TenFor: A Tensor-Based Tool to Extract Interesting Events from Security Forums

    Authors: Risul Islam, Md Omar Faruk Rokon, Evangelos E. Papalexakis, Michalis Faloutsos

    Abstract: How can we get a security forum to "tell" us its activities and events of interest? We take a unique angle: we want to identify these activities without any a priori knowledge, which is a key difference compared to most of the previous problem formulations. Despite some recent efforts, mining security forums to extract useful information has received relatively little attention, while most of them…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: 8 pages, 5 figures, and 4 tables. In Press of ASONAM'20

  18. arXiv:2008.07672  [pdf, other]

    cs.LG stat.ML

    Ensemble Node Embeddings using Tensor Decomposition: A Case-Study on DeepWalk

    Authors: Jia Chen, Evangelos E. Papalexakis

    Abstract: Node embeddings have been attracting increasing attention during the past years. In this context, we propose a new ensemble node embedding approach, called TenSemble2Vec, by first generating multiple embeddings using the existing techniques and taking them as multiview data input of the state-of-the-art tensor decomposition model, namely PARAFAC2, to learn the shared lower-dimensional representations of…

    Submitted 17 August, 2020; originally announced August 2020.

  19. arXiv:2005.04310  [pdf, other]

    cs.SI cs.LG stat.ML

    Semi-Supervised Multi-aspect Detection of Misinformation using Hierarchical Joint Decomposition

    Authors: Sara Abdali, Neil Shah, Evangelos E. Papalexakis

    Abstract: Distinguishing between misinformation and real information is one of the most challenging problems in today's interconnected world. The vast majority of the state-of-the-art in detecting misinformation is fully supervised, requiring a large number of high-quality human annotations. However, the availability of such annotations cannot be taken for granted, since it is very costly, time-consuming, a…

    Submitted 3 June, 2021; v1 submitted 8 May, 2020; originally announced May 2020.

  20. arXiv:2002.10252  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    TensorShield: Tensor-based Defense Against Adversarial Attacks on Images

    Authors: Negin Entezari, Evangelos E. Papalexakis

    Abstract: Recent studies have demonstrated that machine learning approaches like deep neural networks (DNNs) are easily fooled by adversarial attacks. Subtle and imperceptible perturbations of the data are able to change the result of deep neural networks. Leveraging vulnerable machine learning methods raises many concerns especially in domains where security is an important factor. Therefore, it is crucial…

    Submitted 17 February, 2020; originally announced February 2020.

  21. arXiv:2001.02660  [pdf, other]

    cs.CL cs.IR

    REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums

    Authors: Joobin Gharibshah, Evangelos E. Papalexakis, Michalis Faloutsos

    Abstract: How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offering of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promisi…

    Submitted 30 March, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: Accepted in ICWSM 2020

  22. arXiv:1912.09009  [pdf, other]

    cs.LG stat.ML

    Adaptive Granularity in Tensors: A Quest for Interpretable Structure

    Authors: Ravdeep Pasricha, Ekta Gujral, Evangelos E. Papalexakis

    Abstract: Data collected at very frequent intervals is usually extremely sparse and has no structure that is exploitable by modern tensor decomposition algorithms. Thus the utility of such tensors is low, in terms of the amount of interpretable and exploitable structure that one can extract from them. In this paper, we introduce the problem of finding a tensor of adaptive aggregated granularity that can be…

    Submitted 1 March, 2022; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: 17 Pages. Under Review

  23. arXiv:1811.07428  [pdf, ps, other]

    cs.LG eess.SP stat.ML

    The core consistency of a compressed tensor

    Authors: Georgios Tsitsikas, Evangelos E. Papalexakis

    Abstract: Tensor decomposition on big data has attracted significant attention recently. Among the most popular methods is a class of algorithms that leverages compression in order to reduce the size of the tensor and potentially parallelize computations. A fundamental requirement for such methods to work properly is that the low-rank tensor structure is retained upon compression. In lieu of efficient and r…

    Submitted 18 November, 2018; originally announced November 2018.

    Comments: 5 pages, 4 figures, submitted to International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2019)

  24. arXiv:1811.01557  [pdf, other]

    cs.LG cs.AI stat.ML

    Representation Learning by Reconstructing Neighborhoods

    Authors: Chin-Chia Michael Yeh, Yan Zhu, Evangelos E. Papalexakis, Abdullah Mueen, Eamonn Keogh

    Abstract: Since its introduction, unsupervised representation learning has attracted a lot of attention from the research community, as it is demonstrated to be highly effective and easy-to-apply in tasks such as dimension reduction, clustering, visualization, information retrieval, and semi-supervised learning. In this work, we propose a novel unsupervised representation learning framework called neighbor-…

    Submitted 6 November, 2018; v1 submitted 5 November, 2018; originally announced November 2018.

  25. arXiv:1808.07793  [pdf, other]

    cs.MM cs.CV cs.IR

    Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

    Authors: Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos E. Papalexakis, Amit K. Roy-Chowdhury

    Abstract: Cross-modal retrieval between visual data and natural language description remains a long-standing challenge in multimedia. While recent image-text retrieval methods offer great promise by learning deep representations aligned across modalities, most of these methods are plagued by the issue of training with small-scale datasets covering a limited number of images with ground-truth sentences. More…

    Submitted 23 August, 2018; originally announced August 2018.

    Comments: ACM Multimedia 2018

  26. arXiv:1807.01350  [pdf, other]

    cs.LG stat.ML

    OCTen: Online Compression-based Tensor Decomposition

    Authors: Ekta Gujral, Ravdeep Pasricha, Tianxiong Yang, Evangelos E. Papalexakis

    Abstract: Tensor decompositions are powerful tools for large data analytics as they jointly model multiple aspects of data into one framework and enable the discovery of the latent structures and higher-order correlations within the data. One of the most widely studied and used decompositions, especially in data mining and machine learning, is the Canonical Polyadic or CP decomposition. However, today's dat…

    Submitted 3 July, 2018; originally announced July 2018.

  27. arXiv:1807.00122  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    A Constrained Coupled Matrix-Tensor Factorization for Learning Time-evolving and Emerging Topics

    Authors: Sanaz Bahargam, Evangelos E. Papalexakis

    Abstract: Topic discovery has witnessed a significant growth as a field of data mining at large. In particular, time-evolving topic discovery, where the evolution of a topic is taken into account has been instrumental in understanding the historical context of an emerging topic in a dynamic corpus. Traditionally, time-evolving topic discovery has focused on this notion of time. However, especially in settin…

    Submitted 30 June, 2018; originally announced July 2018.

  28. arXiv:1806.02012  [pdf, other]

    cs.LG cs.CV stat.ML

    A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens

    Authors: Uday Singh Saini, Evangelos E. Papalexakis

    Abstract: Despite their increasing popularity and success in a variety of supervised learning problems, deep neural networks are extremely hard to interpret and debug: Given an already trained Deep Neural Net, and a set of test inputs, how can we gain insight into how those inputs interact with different layers of the neural network? Furthermore, can we characterize a given deep neural network based on it'…

    Submitted 6 June, 2018; originally announced June 2018.

  29. arXiv:1806.01997  [pdf, other]

    cs.SI

    TrollSpot: Detecting misbehavior in commenting platforms

    Authors: Tai Ching Li, Joobin Gharibshah, Evangelos E. Papalexakis, Michalis Faloutsos

    Abstract: Commenting platforms, such as Disqus, have emerged as a major online communication platform with millions of users and posts. Their popularity has also attracted parasitic and malicious behaviors, such as trolling and spamming. There has been relatively little research on modeling and safeguarding these platforms. As our key contribution, we develop a systematic approach to detect malicious user…

    Submitted 5 June, 2018; originally announced June 2018.

    Comments: Accepted in WSDM workshop on Misinformation and Misbehavior Mining on the Web, 2018

  30. arXiv:1805.05476  [pdf, other]

    cs.SI

    MIMiS: Minimally Intrusive Mining of Smartphone User Behaviors

    Authors: Pravallika Devineni, Evangelos E. Papalexakis, Kalina Michalska, Michalis Faloutsos

    Abstract: How intrusive does a life-saving user-monitoring application really need to be? While most previous research was focused on analyzing mental state of users from social media and smartphones, there is little effort towards protecting user privacy in these analyses. A challenge in analyzing user behaviors is that not only is the data multi-dimensional with a myriad of user activities but these activ…

    Submitted 14 May, 2018; originally announced May 2018.

    Comments: 8 pages, IEEE conference format

  31. arXiv:1805.01889  [pdf, other]

    cs.LG stat.ML

    t-PINE: Tensor-based Predictable and Interpretable Node Embeddings

    Authors: Saba A. Al-Sayouri, Ekta Gujral, Danai Koutra, Evangelos E. Papalexakis, Sarah S. Lam

    Abstract: Graph representations have increasingly grown in popularity during the last years. Existing representation learning approaches explicitly encode network structure. Despite their good performance in downstream processes (e.g., node classification, link prediction), there is still room for improvement in different aspects, like efficacy, visualization, and interpretability. In this paper, we propose…

    Submitted 3 May, 2018; originally announced May 2018.

  32. arXiv:1805.01509  [pdf, other]

    cs.LG cs.AI cs.SI stat.ML

    RECS: Robust Graph Embedding Using Connection Subgraphs

    Authors: Saba A. Al-Sayouri, Danai Koutra, Evangelos E. Papalexakis, Sarah S. Lam

    Abstract: The success of graph embeddings or node representation learning in a variety of downstream tasks, such as node classification, link prediction, and recommendation systems, has led to their popularity in recent years. Representation learning algorithms aim to preserve local and global network structure by identifying node neighborhood notions. However, many existing algorithms generate embeddings t…

    Submitted 5 September, 2018; v1 submitted 3 May, 2018; originally announced May 2018.

  33. arXiv:1804.09619  [pdf, ps, other]

    cs.LG stat.ML

    Identifying and Alleviating Concept Drift in Streaming Tensor Decomposition

    Authors: Ravdeep Pasricha, Ekta Gujral, Evangelos E. Papalexakis

    Abstract: Tensor decompositions are used in various data mining applications from social network to medical applications and are extremely useful in discovering latent structures or concepts in the data. Many real-world applications are dynamic in nature and so are their data. To deal with this dynamic nature of data, there exist a variety of online tensor decomposition algorithms. A central assumption in a…

    Submitted 9 November, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: 16 Pages, Accepted at ECML-PKDD 2018

  34. arXiv:1804.09088  [pdf, other]

    cs.LG cs.SI stat.AP stat.ML

    Semi-supervised Content-based Detection of Misinformation via Tensor Embeddings

    Authors: Gisel Bastidas Guacho, Sara Abdali, Neil Shah, Evangelos E. Papalexakis

    Abstract: Fake news may be intentionally created to promote economic, political and social interests, and can lead to negative impacts on human beliefs and decisions. Hence, detection of fake news is an emerging problem that has become extremely prevalent during the last few years. Most existing works on this topic focus on manual feature extraction and supervised classification models leveraging a large n…

    Submitted 24 April, 2018; originally announced April 2018.

  35. arXiv:1804.04800  [pdf, ps, other]

    cs.SI

    Mining actionable information from security forums: the case of malicious IP addresses

    Authors: Joobin Gharibshah, Tai Ching Li, Andre Castro, Konstantinos Pelechrinis, Evangelos E. Papalexakis, Michalis Faloutsos

    Abstract: The goal of this work is to systematically extract information from hacker forums, whose information would be in general described as unstructured: the text of a post is not necessarily following any writing rules. By contrast, many security initiatives and commercial entities are harnessing the readily public information, but they seem to focus on structured sources of information. Here, we focus…

    Submitted 13 April, 2018; originally announced April 2018.

    Comments: 10 pages

  36. arXiv:1804.04760  [pdf, ps, other]

    cs.IR cs.LG

    RIPEx: Extracting malicious IP addresses from security forums using cross-forum learning

    Authors: Joobin Gharibshah, Evangelos E. Papalexakis, Michalis Faloutsos

    Abstract: Is it possible to extract malicious IP addresses reported in security forums in an automatic way? This is the question at the heart of our work. We focus on security forums, where security professionals and hackers share knowledge and information, and often report misbehaving IP addresses. So far, there have only been a few efforts to extract information from such security forums. We propose RIPEx…

    Submitted 12 April, 2018; originally announced April 2018.

    Comments: 12 pages, Accepted in the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2018

  37. arXiv:1803.05473  [pdf, other]

    cs.LG

    SUSTain: Scalable Unsupervised Scoring for Tensors and its Application to Phenotyping

    Authors: Ioakeim Perros, Evangelos E. Papalexakis, Haesun Park, Richard Vuduc, Xiaowei Yan, Christopher Defilippi, Walter F. Stewart, Jimeng Sun

    Abstract: This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics…

    Submitted 14 March, 2018; originally announced March 2018.

  38. arXiv:1803.04572  [pdf, other]

    cs.LG stat.ML

    COPA: Constrained PARAFAC2 for Sparse & Large Datasets

    Authors: Ardavan Afshar, Ioakeim Perros, Evangelos E. Papalexakis, Elizabeth Searles, Joyce Ho, Jimeng Sun

    Abstract: PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with the varying number of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise which limits their interpretability. A…

    Submitted 27 August, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: 17 pages

  39. arXiv:1710.11306  [pdf, other]

    cs.DS cs.LG stat.ML

    The Exact Solution to Rank-1 L1-norm TUCKER2 Decomposition

    Authors: Panos P. Markopoulos, Dimitris G. Chachlakis, Evangelos E. Papalexakis

    Abstract: We study rank-1 L1-norm-based TUCKER2 (L1-TUCKER2) decomposition of 3-way tensors, treated as a collection of $N$ $D \times M$ matrices that are to be jointly decomposed. Our contributions are as follows. i) We prove that the problem is equivalent to combinatorial optimization over $N$ antipodal-binary variables. ii) We derive the first two algorithms in the literature for its exact solution. Th…

    Submitted 30 October, 2017; originally announced October 2017.

    Comments: This is a preprint; an edited/finalized version of this manuscript has been submitted for publication to IEEE Signal Processing Letters

  40. arXiv:1709.01147  [pdf, other]

    stat.ML cs.LG

    Balancing Interpretability and Predictive Accuracy for Unsupervised Tensor Mining

    Authors: Ishmam Zabir, Evangelos E. Papalexakis

    Abstract: The PARAFAC tensor decomposition has enjoyed an increasing success in exploratory multi-aspect data mining scenarios. A major challenge remains the estimation of the number of latent factors (i.e., the rank) of the decomposition, which yields high-quality, interpretable results. Previously, we have proposed an automated tensor mining method which leverages a well-known quality heuristic from the f…

    Submitted 4 September, 2017; originally announced September 2017.

  41. arXiv:1709.00668  [pdf, other]

    stat.ML cs.AI cs.LG

    SamBaTen: Sampling-based Batch Incremental Tensor Decomposition

    Authors: Ekta Gujral, Ravdeep Pasricha, Evangelos E. Papalexakis

    Abstract: Tensor decompositions are invaluable tools in analyzing multimodal datasets. In many real-world scenarios, such datasets are far from being static; to the contrary, they tend to grow over time. For instance, in an online social network setting, as we observe new interactions over time, our dataset gets updated in its "time" mode. How can we maintain a valid and accurate tensor decomposition of such…

    Submitted 18 September, 2017; v1 submitted 3 September, 2017; originally announced September 2017.

  42. arXiv:1703.04219  [pdf, other]

    cs.LG math.NA

    SPARTan: Scalable PARAFAC2 for Large & Sparse Data

    Authors: Ioakeim Perros, Evangelos E. Papalexakis, Fei Wang, Richard Vuduc, Elizabeth Searles, Michael Thompson, Jimeng Sun

    Abstract: In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely in time, meaning there is no meaningful way to align their clinical records across time points for analysis purposes. To han…

    Submitted 12 March, 2017; originally announced March 2017.

  43. arXiv:1607.01668  [pdf, other]

    stat.ML cs.LG math.NA

    Tensor Decomposition for Signal Processing and Machine Learning

    Authors: Nicholas D. Sidiropoulos, Lieven De Lathauwer, Xiao Fu, Kejun Huang, Evangelos E. Papalexakis, Christos Faloutsos

    Abstract: Tensors or multi-way arrays are functions of three or more indices $(i,j,k,\cdots)$ -- similar to matrices (two-way arrays), which are functions of two indices $(r,c)$ for (row,column). Tensors have a rich history, stretching over almost a century, and touching upon numerous disciplines; but they have only recently become ubiquitous in signal and data analytics at the confluence of signal pr…

    Submitted 14 December, 2016; v1 submitted 6 July, 2016; originally announced July 2016.

    Comments: revised version, overview article

  44. arXiv:1503.03355  [pdf, ps, other]

    stat.ML cs.LG math.NA stat.AP

    Automatic Unsupervised Tensor Mining with Quality Assessment

    Authors: Evangelos E. Papalexakis

    Abstract: A popular tool for unsupervised modelling and mining multi-aspect data is tensor decomposition. In an exploratory setting, where no labels or ground truth are available, how can we automatically decide how many components to extract? How can we assess the quality of our results, so that a domain expert can factor this quality measure in the interpretation of our results? In this paper, we intro…

    Submitted 11 March, 2015; originally announced March 2015.

  45. arXiv:1302.7043  [pdf, ps, other]

    stat.ML cs.LG

    Scoup-SMT: Scalable Coupled Sparse Matrix-Tensor Factorization

    Authors: Evangelos E. Papalexakis, Tom M. Mitchell, Nicholas D. Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy

    Abstract: How can we correlate neural activity in the human brain as it responds to words, with behavioral data expressed as answers to questions about these same words? In short, we want to find latent variables, that explain both the brain activity, as well as the behavioral responses. We show that this is an instance of the Coupled Matrix-Tensor Factorization (CMTF) problem. We propose Scoup-SMT, a novel…

    Submitted 27 February, 2013; originally announced February 2013.

    Comments: 9 pages