Showing 1–50 of 83 results for author: Faloutsos, C

  1. arXiv:2406.11884

    cs.SI cs.AI

    Hierarchical Compression of Text-Rich Graphs via Large Language Models

    Authors: Shichang Zhang, Da Zheng, Jiani Zhang, Qi Zhu, Xiang Song, Soji Adeshina, Christos Faloutsos, George Karypis, Yizhou Sun

    Abstract: Text-rich graphs, prevalent in data mining contexts like e-commerce and academic graphs, consist of nodes with textual features linked by various relations. Traditional graph machine learning models, such as Graph Neural Networks (GNNs), excel in encoding the graph structural information, but have limited capability in handling rich text on graph nodes. Large Language Models (LLMs), noted for thei…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2406.09534

    cs.DB cs.LG

    FeatNavigator: Automatic Feature Augmentation on Tabular Data

    Authors: Jiaming Liang, Chuan Lei, Xiao Qin, Jiani Zhang, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala

    Abstract: Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and gene…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 15 pages, 41 figures

  3. arXiv:2406.06022

    cs.LG cs.DC

    GraphStorm: all-in-one graph machine learning framework for industry applications

    Authors: Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis

    Abstract: Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perfor…

    Submitted 10 June, 2024; originally announced June 2024.

    Journal ref: KDD 2024

  4. arXiv:2406.03728

    cs.CV

    Evaluating Durability: Benchmark Insights into Multimodal Watermarking

    Authors: Jielin Qiu, William Han, Xuandong Zhao, Shangbang Long, Christos Faloutsos, Lei Li

    Abstract: With the development of large models, watermarks are increasingly employed to assert copyright, verify authenticity, or monitor content distribution. As applications become more multimodal, the utility of watermarking techniques becomes even more critical. The effectiveness and reliability of these watermarks largely depend on their robustness to various disturbances. However, the robustness of th…

    Submitted 5 June, 2024; originally announced June 2024.

  5. arXiv:2404.18209

    cs.LG cs.DB

    4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

    Authors: Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning Li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

    Abstract: Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and eva…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Under review

  6. arXiv:2404.01578

    cs.LG cs.SI

    GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection

    Authors: Namyong Park, Ryan Rossi, Xing Wang, Antoine Simoulin, Nesreen Ahmed, Christos Faloutsos

    Abstract: The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perfor…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: NeurIPS 2023

  7. arXiv:2403.12339

    cs.CV

    Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

    Authors: Jielin Qiu, William Han, Winfred Wang, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Christos Faloutsos, Lei Li, Lijuan Wang

    Abstract: Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments. The lack of a suitable evaluation dataset has been a major obstacle in this field due to the vast number of entities and the extensive human effort required for data curation. We introduce Entity6K, a comprehensive dataset for real-world entity recognition, featur…

    Submitted 18 March, 2024; originally announced March 2024.

  8. arXiv:2403.08027

    cs.LG

    McCatch: Scalable Microcluster Detection in Dimensional and Nondimensional Datasets

    Authors: Braulio V. Sánchez Vinces, Robson L. F. Cordeiro, Christos Faloutsos

    Abstract: How could we have an outlier detector that works even with nondimensional data, and ranks together both singleton microclusters ('one-off' outliers) and nonsingleton microclusters by their anomaly scores? How to obtain scores that are principled in one scalable and 'hands-off' manner? Microclusters of outliers indicate coalition or repetition in fraud activities, etc.; their identification is thus…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  9. arXiv:2403.07653

    cs.DB

    OmniMatch: Effective Self-Supervised Any-Join Discovery in Tabular Data Repositories

    Authors: Christos Koutras, Jiani Zhang, Xiao Qin, Chuan Lei, Vasileios Ioannidis, Christos Faloutsos, George Karypis, Asterios Katsifodimos

    Abstract: How can we discover join relationships among columns of tabular data in a data repository? Can this be done effectively when metadata is missing? Traditional column matching works mainly rely on similarity measures based on exact value overlaps, hence missing important semantics or failing to handle noise in the data. At the same time, recent dataset discovery methods focusing on deep table repres…

    Submitted 12 March, 2024; originally announced March 2024.

  10. arXiv:2403.04735

    cs.CV

    SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

    Authors: Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon

    Abstract: Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric V…

    Submitted 7 March, 2024; originally announced March 2024.

  11. arXiv:2403.01382

    cs.CL

    Automatic Question-Answer Generation for Long-Tail Knowledge

    Authors: Rohan Kumar, Youngmin Kim, Sunitha Ravi, Haitian Sun, Christos Faloutsos, Ruslan Salakhutdinov, Minji Yoon

    Abstract: Pretrained Large Language Models (LLMs) have gained significant attention for addressing open-domain Question Answering (QA). While they exhibit high accuracy in answering questions related to common knowledge, LLMs encounter difficulties in learning about uncommon long-tail knowledge (tail entities). Since manually constructing QA datasets demands substantial human resources, the types of existin…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: Accepted at KDD 2023 KnowledgeNLP

  12. arXiv:2402.17944

    cs.CL

    Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey

    Authors: Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, Christos Faloutsos

    Abstract: Recent breakthroughs in large language modeling have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. However, there is currently a lack of comprehensive review that summarizes and compares the key t…

    Submitted 21 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 41 pages, 4 figures, 8 tables

    MSC Class: 68T50 ACM Class: I.2.7

    Journal ref: TMLR 2024

  13. arXiv:2402.14361

    cs.LG

    OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

    Authors: Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

    Abstract: Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks requiring knowledge that has not been trained on previously. One solution is to use a retriever that fetches relevant information to expand LLM's knowledge scope. However, existing textual-oriented retrieval-based LLMs are not ideal on structured table data due to div…

    Submitted 12 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  14. arXiv:2402.07999

    cs.LG cs.SI

    NetInfoF Framework: Measuring and Exploiting Network Usable Information

    Authors: Meng-Chieh Lee, Haiyang Yu, Jian Zhang, Vassilis N. Ioannidis, Xiang Song, Soji Adeshina, Da Zheng, Christos Faloutsos

    Abstract: Given a node-attributed graph, and a graph task (link prediction or node classification), can we tell if a graph neural network (GNN) will perform well? More specifically, do the graph structure and the node features carry enough usable information for the task? Our goals are (1) to develop a fast tool to measure how much information is in the graph structure and in the node features, and (2) to e…

    Submitted 20 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024 (Spotlight)

  15. EBV: Electronic Bee-Veterinarian for Principled Mining and Forecasting of Honeybee Time Series

    Authors: Mst. Shamima Hossain, Christos Faloutsos, Boris Baer, Hyoseung Kim, Vassilis J. Tsotras

    Abstract: Honeybees are vital for pollination and food production. Among many factors, extreme temperature (e.g., due to climate change) is particularly dangerous for bee health. Anticipating such extremities would allow beekeepers to take early preventive action. Thus, given sensor (temperature) time series data from beehives, how can we find patterns and do forecasting? Forecasting is crucial as it helps…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 9 pages, 7 figures, Accepted at 2024 SIAM International Conference on Data Mining (SDM'24)

  16. arXiv:2310.20046

    cs.CL

    Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

    Authors: Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, George Karypis

    Abstract: Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only a few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, t…

    Submitted 30 October, 2023; originally announced October 2023.

  17. arXiv:2310.09656

    cs.LG

    Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

    Authors: Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, George Karypis

    Abstract: Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces Tabsyn, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted late…

    Submitted 11 May, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024 (Oral Presentation). Code is available at: https://github.com/amazon-science/tabsyn

  18. arXiv:2309.13885

    cs.LG cs.AI cs.CL cs.CV cs.SI

    TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning

    Authors: Jing Zhu, Xiang Song, Vassilis N. Ioannidis, Danai Koutra, Christos Faloutsos

    Abstract: How can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have become the state-of-the-art approach for many high-impact, real-world graph applications. For feature-rich graphs, a prevalent practice involves utilizing a PM directly to generate features, without incorporating any domain adaptation techniqu…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: preprint, ongoing work

  19. arXiv:2307.01933

    cs.AI cs.CG cs.CL cs.SC

    Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs

    Authors: Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, Wei Wang

    Abstract: Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric repres…

    Submitted 4 July, 2023; originally announced July 2023.

    Journal ref: ACL 2023

  20. arXiv:2304.10668

    cs.LG

    Train Your Own GNN Teacher: Graph-Aware Distillation on Textual Graphs

    Authors: Costas Mavromatis, Vassilis N. Ioannidis, Shen Wang, Da Zheng, Soji Adeshina, Jun Ma, Han Zhao, Christos Faloutsos, George Karypis

    Abstract: How can we learn effective node representations on textual graphs? Graph Neural Networks (GNNs) that use Language Models (LMs) to encode textual information of graphs achieve state-of-the-art performance in many node classification tasks. Yet, combining GNNs with LMs has not been widely explored for practical deployments due to its scalability issues. In this work, we tackle this challenge by deve…

    Submitted 20 April, 2023; originally announced April 2023.

  21. arXiv:2303.10408

    cs.CV cs.AI cs.LG

    ExplainFix: Explainable Spatially Fixed Deep Networks

    Authors: Alex Gaudio, Christos Faloutsos, Asim Smailagic, Pedro Costa, Aurelio Campilho

    Abstract: Is there an initialization for deep networks that requires no learning? ExplainFix adopts two design principles: the "fixed filters" principle that all spatial filter weights of convolutional neural networks can be fixed at initialization and never learned, and the "nimbleness" principle that only few network parameters suffice. We contribute (a) visual model-based explanations, (b) speed and accu…

    Submitted 18 March, 2023; originally announced March 2023.

    Comments: Recently Published in Wiley WIREs Journal of Data Mining and Knowledge Discovery. This version has minor formatting differences and includes the supplementary appendix with the main document. Source code: https://github.com/adgaudio/ExplainFix/

  22. arXiv:2302.12465

    cs.LG cs.SI

    PaGE-Link: Path-based Graph Neural Network Explanation for Heterogeneous Link Prediction

    Authors: Shichang Zhang, Jiani Zhang, Xiang Song, Soji Adeshina, Da Zheng, Christos Faloutsos, Yizhou Sun

    Abstract: Transparency and accountability have become major concerns for black-box machine learning (ML) models. Proper explanations for the model behavior increase model transparency and help researchers develop more accountable models. Graph neural networks (GNN) have recently shown superior performance in many graph ML problems than traditional methods, and explaining them has attracted increased interes…

    Submitted 8 May, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

  23. arXiv:2302.00109

    cs.LG

    OrthoReg: Improving Graph-regularized MLPs via Orthogonality Regularization

    Authors: Hengrui Zhang, Shen Wang, Vassilis N. Ioannidis, Soji Adeshina, Jiani Zhang, Xiao Qin, Christos Faloutsos, Da Zheng, George Karypis, Philip S. Yu

    Abstract: Graph Neural Networks (GNNs) are currently dominating in modeling graph-structure data, while their high reliance on graph structure for inference significantly impedes them from widespread applications. By contrast, Graph-regularized MLPs (GR-MLPs) implicitly inject the graph structure information into model weights, while their performance can hardly match that of GNNs in most tasks. This motiva…

    Submitted 31 January, 2023; originally announced February 2023.

  24. arXiv:2301.00270

    cs.SI cs.LG

    NetEffect: Discovery and Exploitation of Generalized Network Effects

    Authors: Meng-Chieh Lee, Shubhranshu Shekhar, Jaemin Yoo, Christos Faloutsos

    Abstract: Given a large graph with few node labels, how can we (a) identify whether there are generalized network-effects (GNE) or not, (b) estimate GNE to explain the interrelations among node classes, and (c) exploit GNE efficiently to improve the performance on downstream tasks? The knowledge of GNE is valuable for various tasks like node classification, and targeted advertising. However, identifying GNE…

    Submitted 12 February, 2024; v1 submitted 31 December, 2022; originally announced January 2023.

    Comments: Accepted to PAKDD 2024

  25. arXiv:2210.04081

    cs.LG cs.SI

    Less is More: SlimG for Accurate, Robust, and Interpretable Graph Mining

    Authors: Jaemin Yoo, Meng-Chieh Lee, Shubhranshu Shekhar, Christos Faloutsos

    Abstract: How can we solve semi-supervised node classification in various graphs possibly with noisy features and structures? Graph neural networks (GNNs) have succeeded in many graph mining tasks, but their generalizability to various graph scenarios is limited due to the difficulty of training, hyperparameter tuning, and the selection of a model itself. Einstein said that we should "make everything as sim…

    Submitted 16 June, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Accepted to KDD 2023

  26. arXiv:2210.02241

    eess.IV cs.CR cs.CV cs.LG

    HeartSpot: Privatized and Explainable Data Compression for Cardiomegaly Detection

    Authors: Elvin Johnson, Shreshta Mohan, Alex Gaudio, Asim Smailagic, Christos Faloutsos, Aurélio Campilho

    Abstract: Advances in data-driven deep learning for chest X-ray image analysis underscore the need for explainability, privacy, large datasets and significant computational resources. We frame privacy and explainability as a lossy single-image compression problem to reduce both computational and data requirements without training. For Cardiomegaly detection in chest X-ray images, we propose HeartSpot and fo…

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE-EMBS International Conference on Biomedical and Health Informatics 2022. IEEE copyrights may apply

  27. arXiv:2206.09280

    cs.LG cs.SI

    MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning

    Authors: Namyong Park, Ryan Rossi, Nesreen Ahmed, Christos Faloutsos

    Abstract: Given a graph learning task, such as link prediction, on a new graph, how can we select the best method as well as its hyperparameters (collectively called a model) without having to train or evaluate any model on the new graph? Model selection for graph learning has been largely ad hoc. A typical approach has been to apply popular methods to new datasets, but this is often suboptimal. On the othe…

    Submitted 8 June, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: ICLR 2023

  28. arXiv:2206.04255

    cs.LG

    ScatterSample: Diversified Label Sampling for Data Efficient Graph Neural Network Learning

    Authors: Zhenwei Dai, Vasileios Ioannidis, Soji Adeshina, Zak Jost, Christos Faloutsos, George Karypis

    Abstract: What target labels are most effective for graph neural network (GNN) training? In some applications where GNNs excel, like drug design or fraud detection, labeling new instances is expensive. We develop a data-efficient active sampling framework, ScatterSample, to train GNNs under an active learning setting. ScatterSample employs a sampling module termed DiverseUncertainty to collect instances with…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: 13 pages

  29. arXiv:2205.12318

    cs.LG cs.AI

    ColdGuess: A General and Effective Relational Graph Convolutional Network to Tackle Cold Start Cases

    Authors: Bo He, Xiang Song, Vincent Gao, Christos Faloutsos

    Abstract: Low-quality listings and bad actor behavior in online retail websites threaten e-commerce business as these result in sub-optimal buying experience and erode customer trust. When a new listing is created, how can we tell whether it is of good quality? Is the method effective, fast, and scalable? Previous approaches often have three limitations/challenges: (1) unable to handle cold start problems where new sell…

    Submitted 26 May, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

  30. OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision

    Authors: Xinyang Zhang, Chenwei Zhang, Xian Li, Xin Luna Dong, Jingbo Shang, Christos Faloutsos, Jiawei Han

    Abstract: Automatic extraction of product attributes from their textual descriptions is essential for online shopper experience. One inherent challenge of this task is the emerging nature of e-commerce products -- we see new types of products with their unique set of new attributes constantly. Most prior works on this matter mine new values for a set of known attributes but cannot handle new attributes that…

    Submitted 29 April, 2022; originally announced April 2022.

    Comments: WWW 2022

  31. arXiv:2204.08504

    cs.SI cs.AI cs.LG

    CGC: Contrastive Graph Clustering for Community Detection and Tracking

    Authors: Namyong Park, Ryan Rossi, Eunyee Koh, Iftikhar Ahamath Burhanuddin, Sungchul Kim, Fan Du, Nesreen Ahmed, Christos Faloutsos

    Abstract: Given entities and their interactions in the web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in various domains has been achieved by deep clustering methods. Especially, deep graph clustering (DGC…

    Submitted 27 March, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: TheWebConf 2022 Research Track

  32. arXiv:2204.01839

    cs.IR cs.AI

    Coarse-to-Fine Sparse Sequential Recommendation

    Authors: Jiacheng Li, Tong Zhao, Jin Li, Jim Chan, Christos Faloutsos, George Karypis, Soo-Min Pantel, Julian McAuley

    Abstract: Sequential recommendation aims to model dynamic user behavior from historical interactions. Self-attentive methods have proven effective at capturing short-term dynamics and long-term preferences. Despite their success, these approaches still struggle to model sparse data, on which they struggle to learn high-quality item representations. We propose to model user dynamics from shopping intents and…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted as conference paper at SIGIR 2022

  33. arXiv:2202.07648

    cs.LG cs.AI cs.SI

    EvoKG: Jointly Modeling Event Time and Network Structure for Reasoning over Temporal Knowledge Graphs

    Authors: Namyong Park, Fuchen Liu, Purvanshi Mehta, Dana Cristofor, Christos Faloutsos, Yuxiao Dong

    Abstract: How can we perform knowledge reasoning over temporal knowledge graphs (TKGs)? TKGs represent facts about entities and their relations, where each fact is associated with a timestamp. Reasoning over TKGs, i.e., inferring new facts from time-evolving KGs, is crucial for many applications to provide intelligent services. However, despite the prevalence of real-world data that can be represented as TK…

    Submitted 16 February, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: WSDM 2022

  34. Benefit-aware Early Prediction of Health Outcomes on Multivariate EEG Time Series

    Authors: Shubhranshu Shekhar, Dhivya Eswaran, Bryan Hooi, Jonathan Elmer, Christos Faloutsos, Leman Akoglu

    Abstract: Given a cardiac-arrest patient being monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcomes as early as possible? Early decision-making is critical in many applications, e.g. monitoring patients may assist in early intervention and improved care. On the other hand, early prediction on EEG data poses several challenges: (i) earliness-accuracy trade-o…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: arxiv submission

    Journal ref: Journal of Biomedical Informatics Volume 139, March 2023, 104296

  35. arXiv:2109.02704

    cs.LG

    gen2Out: Detecting and Ranking Generalized Anomalies

    Authors: Meng-Chieh Lee, Shubhranshu Shekhar, Christos Faloutsos, T. Noah Hutson, Leon Iasemidis

    Abstract: In a cloud of m-dimensional data points, how would we spot, as well as rank, both single-point- as well as group- anomalies? We are the first to generalize anomaly detection in two dimensions: The first dimension is that we handle both point-anomalies, as well as group-anomalies, under a unified view -- we shall refer to them as generalized anomalies. The second dimension is that gen2Out not only…

    Submitted 15 November, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: In Proceedings of 2021 IEEE International Conference on Big Data (Big Data)

  36. arXiv:2012.15006

    cs.LG

    Dynamic Graph-Based Anomaly Detection in the Electrical Grid

    Authors: Shimiao Li, Amritanshu Pandey, Bryan Hooi, Christos Faloutsos, Larry Pileggi

    Abstract: Given sensor readings over time from a power grid, how can we accurately detect when an anomaly occurs? A key part of achieving this goal is to use the network of power grid sensors to quickly detect, in real-time, when any unusual events, whether natural faults or malicious, occur on the power grid. Existing bad-data detectors in the industry lack the sophistication to robustly detect broad types…

    Submitted 2 December, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

  37. arXiv:2011.14925

    cs.LG cs.AI

    Autonomous Graph Mining Algorithm Search with Best Speed/Accuracy Trade-off

    Authors: Minji Yoon, Théophile Gervet, Bryan Hooi, Christos Faloutsos

    Abstract: Graph data is ubiquitous in academia and industry, from social networks to bioinformatics. The pervasiveness of graphs today has raised the demand for algorithms that can answer various questions: Which products would a user like to purchase given her order list? Which users are buying fake followers to increase their public reputation? Myriads of new graph mining algorithms are proposed every yea…

    Submitted 25 November, 2020; originally announced November 2020.

  38. arXiv:2011.13085

    cs.SI cs.AI

    Fast and Accurate Anomaly Detection in Dynamic Graphs with a Two-Pronged Approach

    Authors: Minji Yoon, Bryan Hooi, Kijung Shin, Christos Faloutsos

    Abstract: Given a dynamic graph stream, how can we detect the sudden appearance of anomalous patterns, such as link spam, follower boosting, or denial of service attacks? Additionally, can we categorize the types of anomalies that occur in practice, and theoretically analyze the anomalous signs arising from each type? In this work, we propose AnomRank, an online algorithm for anomaly detection in dynamic gr…

    Submitted 25 November, 2020; originally announced November 2020.

  39. arXiv:2011.10616

    cs.LG physics.soc-ph q-bio.PE

    Bridging Physics-based and Data-driven modeling for Learning Dynamical Systems

    Authors: Rui Wang, Danielle Maddix, Christos Faloutsos, Yuyang Wang, Rose Yu

    Abstract: How can we learn a dynamical system to make forecasts, when some variables are unobserved? For instance, in COVID-19, we want to forecast the number of infected and death cases but we do not know the count of susceptible and exposed people. While mechanics compartment models are widely used in epidemic modeling, data-driven models are emerging for disease forecasting. We first formalize the learni…

    Submitted 29 April, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  40. arXiv:2011.05928

    cs.IR cs.AI

    J-Recs: Principled and Scalable Recommendation Justification

    Authors: Namyong Park, Andrey Kan, Christos Faloutsos, Xin Luna Dong

    Abstract: Online recommendation is an essential functionality across a variety of services, including e-commerce and video streaming, where items to buy, watch, or read are suggested to users. Justifying recommendations, i.e., explaining why a user might like the recommended item, has been shown to improve user satisfaction and persuasiveness of the recommendation. In this paper, we develop a method for gen…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: ICDM 2020

  41. arXiv:2011.00447  [pdf, other]

    cs.SI

    AutoAudit: Mining Accounting and Time-Evolving Graphs

    Authors: Meng-Chieh Lee, Yue Zhao, Aluna Wang, Pierre Jinghong Liang, Leman Akoglu, Vincent S. Tseng, Christos Faloutsos

    Abstract: How can we spot money laundering in large-scale graph-like accounting datasets? How to identify the most suspicious period in a time-evolving accounting graph? What kind of accounts and events should practitioners prioritize under time constraints? To tackle these crucial challenges in accounting and auditing tasks, we propose a flexible system called AutoAudit, which can be valuable for auditors…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: In Proceedings of 2020 IEEE International Conference on Big Data (Big Data)

  42. arXiv:2009.08452  [pdf, other]

    cs.LG cs.SI stat.ML

    Real-Time Anomaly Detection in Edge Streams

    Authors: Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos

    Abstract: Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges,…

    Submitted 25 April, 2022; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Published in TKDD. Extended Journal Version of arXiv:1911.04464

  43. AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types

    Authors: Xin Luna Dong, Xiang He, Andrey Kan, Xian Li, Yan Liang, Jun Ma, Yifan Ethan Xu, Chenwei Zhang, Tong Zhao, Gabriel Blanco Saldana, Saurabh Deshpande, Alexandre Michetti Manduca, Jay Ren, Surender Pal Singh, Fan Xiao, Haw-Shiuan Chang, Giannis Karamanolakis, Yuning Mao, Yaqing Wang, Christos Faloutsos, Andrew McCallum, Jiawei Han

    Abstract: Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products p…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: KDD 2020

  44. MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals

    Authors: Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, Christos Faloutsos

    Abstract: Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit a lot of applications including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: KDD 2020 Research Track. 10 pages

  45. Octet: Online Catalog Taxonomy Enrichment with Self-Supervision

    Authors: Yuning Mao, Tong Zhao, Andrey Kan, Chenwei Zhang, Xin Luna Dong, Christos Faloutsos, Jiawei Han

    Abstract: Taxonomies have found wide applications in various domains, especially online for item categorization, browsing, and search. Despite the prevalent use of online catalog taxonomies, most of them in practice are maintained by humans, which is labor-intensive and difficult to scale. While taxonomy construction from scratch is considerably studied in the literature, how to effectively enrich existing…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: KDD 2020

  46. arXiv:2002.07833  [pdf, other]

    cs.SI cs.LG

    Higher-Order Label Homogeneity and Spreading in Graphs

    Authors: Dhivya Eswaran, Srijan Kumar, Christos Faloutsos

    Abstract: Do higher-order network structures aid graph semi-supervised learning? Given a graph and a few labeled vertices, labeling the remaining vertices is a high-impact problem with applications in several tasks, such as recommender systems, fraud detection and protein identification. However, traditional methods rely on edges for spreading labels, which is limited as all edges are not equal. Vertices wi…

    Submitted 18 February, 2020; originally announced February 2020.

    Comments: 7 pages

  47. AutoBlock: A Hands-off Blocking Framework for Entity Matching

    Authors: Wei Zhang, Hao Wei, Bunyamin Sisman, Xin Luna Dong, Christos Faloutsos, David Page

    Abstract: Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human e…

    Submitted 6 December, 2019; originally announced December 2019.

    Comments: In The Thirteenth ACM International Conference on Web Search and Data Mining (WSDM '20), February 3-7, 2020, Houston, TX, USA. ACM, Anchorage, Alaska, USA, 9 pages

  48. arXiv:1911.04464  [pdf, other]

    cs.LG cs.AI

    MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

    Authors: Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos

    Abstract: Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges,…

    Submitted 23 August, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: 8 pages, Accepted at AAAI Conference on Artificial Intelligence (AAAI), 2020 [oral paper]; minor fixes, updated experiments

  49. arXiv:1905.08865  [pdf, other]

    cs.LG cs.IR stat.ML

    Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks

    Authors: Namyong Park, Andrey Kan, Xin Luna Dong, Tong Zhao, Christos Faloutsos

    Abstract: How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. Whi…

    Submitted 16 June, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: KDD 2019 Research Track. 11 pages. Changelog: Type 3 font removed, and minor updates made in the Appendix (v2)

  50. arXiv:1808.00239  [pdf, other]

    cs.IR

    Did We Get It Right? Predicting Query Performance in E-commerce Search

    Authors: Rohan Kumar, Mohit Kumar, Neil Shah, Christos Faloutsos

    Abstract: In this paper, we address the problem of evaluating whether results served by an e-commerce search engine for a query are good or not. This is a critical question in evaluating any e-commerce search engine. While this question is traditionally answered using simple metrics like query click-through rate (CTR), we observe that in e-commerce search, such metrics can be misleading. Upon inspection, we…

    Submitted 1 August, 2018; originally announced August 2018.