Showing 1–50 of 50 results for author: Neville, J

Searching in archive cs.
  1. arXiv:2406.01633

    cs.IR cs.AI cs.CL cs.LG

    On Overcoming Miscalibrated Conversational Priors in LLM-based Chatbots

    Authors: Christine Herlihy, Jennifer Neville, Tobias Schnabel, Adith Swaminathan

    Abstract: We explore the use of Large Language Model (LLM-based) chatbots to power recommender systems. We observe that the chatbots respond poorly when they encounter under-specified requests (e.g., they make incorrect assumptions, hedge with a long response, or refuse to answer). We conjecture that such miscalibrated response tendencies (i.e., conversational priors) can be attributed to LLM fine-tuning us…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Preprint of UAI'24 conference publication

  2. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

    Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, et al. (6 additional authors not shown)

    Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

  3. arXiv:2405.04656

    cs.HC

    Corporate Communication Companion (CCC): An LLM-empowered Writing Assistant for Workplace Social Media

    Authors: Zhuoran Lu, Sheshera Mysore, Tara Safavi, Jennifer Neville, Longqi Yang, Mengting Wan

    Abstract: Workplace social media platforms enable employees to cultivate their professional image and connect with colleagues in a semi-formal environment. While semi-formal corporate communication poses a unique set of challenges, large language models (LLMs) have shown great promise in helping users draft and edit their social media posts. However, LLMs may fail to capture individualized tones and voices…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2404.10148

    cs.SI cs.DS cs.LG math.PR stat.ML

    Node Similarities under Random Projections: Limits and Pathological Cases

    Authors: Tvrtko Tadić, Cassiano Becker, Jennifer Neville

    Abstract: Random Projections have been widely used to generate embeddings for various graph tasks due to their computational efficiency. The majority of applications have been justified through the Johnson-Lindenstrauss Lemma. In this paper, we take a step further and investigate how well dot product and cosine similarity are preserved by Random Projections. Our analysis provides new theoretical results, id…

    Submitted 15 April, 2024; originally announced April 2024.

  5. arXiv:2404.04268

    cs.IR cs.AI cs.CY cs.SI

    The Use of Generative Search Engines for Knowledge Work and Complex Tasks

    Authors: Siddharth Suri, Scott Counts, Leijie Wang, Chacha Chen, Mengting Wan, Tara Safavi, Jennifer Neville, Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Sathish Manivannan, Nagu Rangan, Longqi Yang

    Abstract: Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities such as the ability to generate new digital artifacts like text, images, code etc., resulting in a new tool, a generative search engine, which combines the capabilities of LLMs with a traditional search engine.…

    Submitted 19 March, 2024; originally announced April 2024.

    Comments: 32 pages, 3 figures, 4 tables

    ACM Class: J.4

  6. arXiv:2404.02319

    cs.CL cs.AI cs.LG

    Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization

    Authors: Tobias Schnabel, Jennifer Neville

    Abstract: In many modern LLM applications, such as retrieval augmented generation, prompts have become programs themselves. In these settings, prompt programs are repeatedly called with different user queries or data instances. A big practical challenge is optimizing such prompt programs. Recent work has mostly focused on either simple prompt programs or assumed that the general structure of a prompt progra…

    Submitted 27 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  7. arXiv:2403.12388

    cs.IR cs.AI

    Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

    Authors: Ying-Chun Lin, Jennifer Neville, Jack W. Stokes, Longqi Yang, Tara Safavi, Mengting Wan, Scott Counts, Siddharth Suri, Reid Andersen, Xiaofeng Xu, Deepak Gupta, Sujay Kumar Jauhar, Xia Song, Georg Buscher, Saurabh Tiwary, Brent Hecht, Jaime Teevan

    Abstract: Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featur…

    Submitted 8 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  8. arXiv:2403.12173

    cs.CL cs.AI cs.IR

    TnT-LLM: Text Mining at Scale with Large Language Models

    Authors: Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan

    Abstract: Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. Thi…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 9 pages main content, 8 pages references and appendix

  9. arXiv:2402.17896

    cs.CL cs.AI

    Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents

    Authors: Corby Rosset, Ho-Lam Chung, Guanghui Qin, Ethan C. Chau, Zhuo Feng, Ahmed Awadallah, Jennifer Neville, Nikhil Rao

    Abstract: Existing question answering (QA) datasets are no longer challenging to most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5 and HotpotQA mainly study "known unknowns" with clear indications of both what information is missing, and how to find it to answer the question. Hence, good performance on these benchmarks provides a false sense of sec…

    Submitted 27 February, 2024; originally announced February 2024.

  10. arXiv:2402.14833

    cs.CL cs.AI cs.LG

    CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness

    Authors: Jiayi Liu, Tinghan Yang, Jennifer Neville

    Abstract: Large language models (LLMs) have become pivotal in recent research. However, during the inference process, LLMs still require substantial resources. In this paper, we propose CliqueParcel, a method designed to improve the efficiency of LLMs via prompt batching. Existing strategies to optimize inference efficiency often compromise on output quality, leading to a discounted output problem. This iss…

    Submitted 17 February, 2024; originally announced February 2024.

  11. arXiv:2311.09180

    cs.CL cs.HC cs.IR

    PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers

    Authors: Sheshera Mysore, Zhuoran Lu, Mengting Wan, Longqi Yang, Steve Menezes, Tina Baghaee, Emmanuel Barajas Gonzalez, Jennifer Neville, Tara Safavi

    Abstract: Powerful large language models have facilitated the development of writing assistants that promise to significantly improve the quality and efficiency of composition and communication. However, a barrier to effective assistance is the lack of personalization in LLM outputs to the author's communication style and specialized knowledge. In this paper, we address this challenge by proposing PEARL, a…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Pre-print, work in progress

  12. arXiv:2310.02263

    cs.CL cs.AI cs.LG

    Automatic Pair Construction for Contrastive Post-training

    Authors: Canwen Xu, Corby Rosset, Ethan C. Chau, Luciano Del Corro, Shweti Mahajan, Julian McAuley, Jennifer Neville, Ahmed Hassan Awadallah, Nikhil Rao

    Abstract: Alignment serves as an important step to steer large language models (LLMs) towards human preferences. In this paper, we propose an automatic way to construct contrastive data for LLM, using preference pairs from multiple models of varying strengths (e.g., InstructGPT, ChatGPT and GPT-4). We compare the contrastive techniques of SLiC and DPO to SFT baselines and find that DPO provides a step-funct…

    Submitted 2 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: NAACL 2024 (Findings)

  13. arXiv:2309.13063

    cs.IR cs.AI cs.CL

    Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies

    Authors: Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni, Nagu Rangan, Tara Safavi, Siddharth Suri, Mengting Wan, Leijie Wang, Longqi Yang

    Abstract: Log data can reveal valuable information about how users interact with Web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for emerging forms of Web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics.…

    Submitted 9 May, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Report number: MSR-TR-2023-32

  14. arXiv:2309.08827

    cs.CL cs.AI

    S3-DST: Structured Open-Domain Dialogue Segmentation and State Tracking in the Era of LLMs

    Authors: Sarkar Snigdha Sarathi Das, Chirag Shah, Mengting Wan, Jennifer Neville, Longqi Yang, Reid Andersen, Georg Buscher, Tara Safavi

    Abstract: The traditional Dialogue State Tracking (DST) problem aims to track user preferences and intents in user-agent conversations. While sufficient for task-oriented dialogue systems supporting narrow domain applications, the advent of Large Language Model (LLM)-based chat systems has introduced many real-world intricacies in open-domain dialogues. These intricacies manifest in the form of increased co…

    Submitted 15 September, 2023; originally announced September 2023.

  15. Stationary Algorithmic Balancing For Dynamic Email Re-Ranking Problem

    Authors: Jiayi Liu, Jennifer Neville

    Abstract: Email platforms need to generate personalized rankings of emails that satisfy user preferences, which may vary over time. We approach this as a recommendation problem based on three criteria: closeness (how relevant the sender and topic are to the user), timeliness (how recent the email is), and conciseness (how brief the email is). We propose MOSR (Multi-Objective Stationary Recommender), a novel…

    Submitted 12 August, 2023; originally announced August 2023.

    Comments: Published in KDD'23

  16. DYMOND: DYnamic MOtif-NoDes Network Generative Model

    Authors: Giselle Zeno, Timothy La Fond, Jennifer Neville

    Abstract: Motifs, which have been established as building blocks for network structure, move beyond pair-wise connections to capture longer-range correlations in connections and activity. In spite of this, there are few generative graph models that consider higher-order network structures and even fewer that focus on using motifs in models of dynamic graphs. Most existing generative models for temporal grap…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: In Proceedings of the Web Conference 2021 (WWW '21)

    Journal ref: Proceedings of the Web Conference 2021, Pages 718-729

  17. Generating Post-hoc Explanations for Skip-gram-based Node Embeddings by Identifying Important Nodes with Bridgeness

    Authors: Hogun Park, Jennifer Neville

    Abstract: Node representation learning in a network is an important machine learning technique for encoding relational information in a continuous vector space while preserving the inherent properties and structures of the network. Recently, unsupervised node embedding methods such as DeepWalk, LINE, struc2vec, PTE, UserItem2vec, and RWJBG have emerged from the Skip-gram model and perform better performance…

    Submitted 15 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: Accepted to Neural Networks Journal, 2023

    Journal ref: Neural Networks, 2023, ISSN 0893-6080

  18. arXiv:2302.08895

    cs.LG

    Creating generalizable downstream graph models with random projections

    Authors: Anton Amirov, Chris Quirk, Jennifer Neville

    Abstract: We investigate graph representation learning approaches that enable models to generalize across graphs: given a model trained using the representations from one graph, our goal is to apply inference using those same model parameters when given representations computed over a new graph, unseen during model training, with minimal degradation in inference accuracy. This is in contrast to the more com…

    Submitted 17 February, 2023; originally announced February 2023.

  19. arXiv:2210.15120

    cs.LG

    Federated Graph Representation Learning using Self-Supervision

    Authors: Susheel Suresh, Danny Godbout, Arko Mukherjee, Mayank Shrivastava, Jennifer Neville, Pan Li

    Abstract: Federated graph representation learning (FedGRL) brings the benefits of distributed training to graph structured data while simultaneously addressing some privacy and compliance concerns related to data curation. However, several interesting real-world graph data characteristics viz. label deficiency and downstream task heterogeneity are not taken into consideration in current FedGRL setups. In th…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: FedGraph'22 workshop (non archival) version. (https://sites.google.com/view/fedgraph2022/accepted-papers)

  20. arXiv:2207.06272

    cs.LG stat.ML

    Hindsight Learning for MDPs with Exogenous Inputs

    Authors: Sean R. Sinclair, Felipe Frujeri, Ching-An Cheng, Luke Marshall, Hugo Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

    Abstract: Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algo…

    Submitted 23 October, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: 52 pages, 6 figures

    MSC Class: 68Q32 ACM Class: I.2.6

  21. arXiv:2202.02427

    cs.LG

    Lightweight Compositional Embeddings for Incremental Streaming Recommendation

    Authors: Mengyue Hang, Tobias Schnabel, Longqi Yang, Jennifer Neville

    Abstract: Most work in graph-based recommender systems considers a static setting where all information about test nodes (i.e., users and items) is available upfront at training time. However, this static setting makes little sense for many real-world applications where data comes in continuously as a stream of new edges and nodes, and one has to update model predictions incrementally to reflect the l…

    Submitted 4 February, 2022; originally announced February 2022.

  22. arXiv:2201.06505

    cs.AI cs.CV

    Data Harmonisation for Information Fusion in Digital Healthcare: A State-of-the-Art Systematic Review, Meta-Analysis and Future Research Directions

    Authors: Yang Nan, Javier Del Ser, Simon Walsh, Carola Schönlieb, Michael Roberts, Ian Selby, Kit Howard, John Owen, Jon Neville, Julien Guiot, Benoit Ernst, Ana Pastor, Angel Alberich-Bayarri, Marion I. Menzel, Sean Walsh, Wim Vos, Nina Flerin, Jean-Paul Charbonnier, Eva van Rikxoort, Avishek Chatterjee, Henry Woodruff, Philippe Lambin, Leonor Cerdá-Alberich, Luis Martí-Bonmatí, Francisco Herrera, et al. (1 additional author not shown)

    Abstract: Removing the bias and variance of multicentre data has always been a challenge in large scale digital healthcare studies, which requires the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single modality multicentre datasets. However…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 54 pages, 14 figures, accepted by the Information Fusion journal

  23. Breaking the Limit of Graph Neural Networks by Improving the Assortativity of Graphs with Local Mixing Patterns

    Authors: Susheel Suresh, Vinith Budde, Jennifer Neville, Pan Li, Jianzhu Ma

    Abstract: Graph neural networks (GNNs) have achieved tremendous success on multiple graph-based learning tasks by fusing network structure and node features. Modern GNN models are built upon iterative aggregation of neighbor's/proximity features by message passing. Its prediction performance has been shown to be strongly bounded by assortative mixing in the graph, a key property wherein nodes with similar a…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: published in KDD 2021; 11 pages;

  24. arXiv:2106.05819

    cs.LG cs.AI

    Adversarial Graph Augmentation to Improve Graph Contrastive Learning

    Authors: Susheel Suresh, Pan Li, Cong Hao, Jennifer Neville

    Abstract: Self-supervised learning of graph neural networks (GNN) is in great need because of the widespread label scarcity issue in real-world graph/network data. Graph contrastive learning (GCL), by training GNNs to maximize the correspondence between the representations of the same graph in its different augmented forms, may yield robust and transferable GNNs even without using labels. However, GNNs trai…

    Submitted 2 November, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2021

  25. ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis

    Authors: Changping Meng, Muhao Chen, Jie Mao, Jennifer Neville

    Abstract: Analyzing the readability of articles has been an important sociolinguistic task. Addressing this task is necessary to the automatic recommendation of appropriate articles to readers with different comprehension abilities, and it further benefits education systems, web information systems, and digital libraries. Current methods for assessing readability employ empirical measures or statistical lea…

    Submitted 6 March, 2021; originally announced March 2021.

    Comments: ECIR 2020

  26. arXiv:2009.10800

    cs.AI cs.LG

    A Hybrid Model for Learning Embeddings and Logical Rules Simultaneously from Knowledge Graphs

    Authors: Susheel Suresh, Jennifer Neville

    Abstract: The problem of knowledge graph (KG) reasoning has been widely explored by traditional rule-based systems and more recently by knowledge graph embedding methods. While logical rules can capture deterministic behavior in a KG they are brittle and mining ones that infer facts beyond the known KG is challenging. Probabilistic embedding methods are effective in capturing global soft statistical tendenc…

    Submitted 22 September, 2020; originally announced September 2020.

    Comments: 10 page extended version

  27. arXiv:2003.12169

    cs.LG stat.ML

    A Collective Learning Framework to Boost GNN Expressiveness

    Authors: Mengyue Hang, Jennifer Neville, Bruno Ribeiro

    Abstract: Graph Neural Networks (GNNs) have recently been used for node and graph classification tasks with great success, but GNNs model dependencies among the attributes of nearby neighboring nodes rather than dependencies among observed node labels. In this work, we consider the task of inductive node classification using GNNs in supervised and semi-supervised settings, with the goal of incorporating lab…

    Submitted 28 September, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

  28. arXiv:2003.00627

    cs.LG stat.ML

    Cluster-Based Social Reinforcement Learning

    Authors: Mahak Goindani, Jennifer Neville

    Abstract: Social Reinforcement Learning methods, which model agents in large networks, are useful for fake news mitigation, personalized teaching/healthcare, and viral marketing, but it is challenging to incorporate inter-agent dependencies into the models effectively due to network size and sparse interaction data. Previous social RL approaches either ignore agents dependencies or model them in a computati…

    Submitted 23 March, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

  29. arXiv:1910.00547

    cs.LG stat.AP stat.ML

    Deep Lifetime Clustering

    Authors: S Chandra Mouli, Leonardo Teixeira, Jennifer Neville, Bruno Ribeiro

    Abstract: The goal of lifetime clustering is to develop an inductive model that maps subjects into $K$ clusters according to their underlying (unobserved) lifetime distribution. We introduce a neural-network based lifetime clustering model that can find cluster assignments by directly maximizing the divergence between the empirical lifetime distributions of the clusters. Accordingly, we define a novel clust…

    Submitted 1 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

  30. arXiv:1904.05332

    cs.SI cs.LG stat.ML

    Community detection over a heterogeneous population of non-aligned networks

    Authors: Guilherme Gomes, Vinayak Rao, Jennifer Neville

    Abstract: Clustering and community detection with multiple graphs have typically focused on aligned graphs, where there is a mapping between nodes across the graphs (e.g., multi-view, multi-layer, temporal graphs). However, there are numerous application areas with multiple graphs that are only partially aligned, or even unaligned. These graphs are often drawn from the same population, with communities of p…

    Submitted 4 April, 2019; originally announced April 2019.

  31. Exploring Student Check-In Behavior for Improved Point-of-Interest Prediction

    Authors: Mengyue Hang, Ian Pytlarz, Jennifer Neville

    Abstract: With the availability of vast amounts of user visitation history on location-based social networks (LBSN), the problem of Point-of-Interest (POI) prediction has been extensively studied. However, much of the research has been conducted solely on voluntary checkin datasets collected from social apps such as Foursquare or Yelp. While these data contain rich information about recreational activities…

    Submitted 25 September, 2018; originally announced November 2018.

    Comments: published in KDD'18

  32. arXiv:1809.02512

    stat.ML cs.LG cs.SI stat.AP

    Multi-level hypothesis testing for populations of heterogeneous networks

    Authors: Guilherme Gomes, Vinayak Rao, Jennifer Neville

    Abstract: In this work, we consider hypothesis testing and anomaly detection on datasets where each observation is a weighted network. Examples of such data include brain connectivity networks from fMRI flow data, or word co-occurrence counts for populations of individuals. Current approaches to hypothesis testing for weighted networks typically requires thresholding the edge-weights, to transform the data…

    Submitted 7 September, 2018; originally announced September 2018.

  33. arXiv:1707.07716

    stat.ML cs.LG

    Stochastic Gradient Descent for Relational Logistic Regression via Partial Network Crawls

    Authors: Jiasen Yang, Bruno Ribeiro, Jennifer Neville

    Abstract: Research in statistical relational learning has produced a number of methods for learning relational models from large-scale network data. While these methods have been successfully applied in various domains, they have been developed under the unrealistic assumption of full data access. In practice, however, the data are often collected by crawling the network, due to proprietary access, limited…

    Submitted 20 August, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

    Comments: 7 pages, 3 figures, Proceedings of the Seventh International Workshop on Statistical Relational AI (StarAI 2017)

  34. arXiv:1703.03401

    cs.SI

    Identifying User Survival Types via Clustering of Censored Social Network Data

    Authors: S Chandra Mouli, Abhishek Naik, Bruno Ribeiro, Jennifer Neville

    Abstract: The goal of cluster analysis in survival data is to identify clusters that are decidedly associated with the survival outcome. Previous research has explored this problem primarily in the medical domain with relatively small datasets, but the need for such a clustering methodology could arise in other domains with large datasets, such as social networks. Concretely, we wish to identify different s…

    Submitted 9 March, 2017; originally announced March 2017.

  35. arXiv:1608.00712

    cs.LG

    Size-Consistent Statistics for Anomaly Detection in Dynamic Networks

    Authors: Timothy La Fond, Jennifer Neville, Brian Gallagher

    Abstract: An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process.…

    Submitted 2 August, 2016; originally announced August 2016.

  36. arXiv:1607.00110

    cs.LG stat.ML

    Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values

    Authors: Iman Alodah, Jennifer Neville

    Abstract: Gradient boosting of regression trees is a competitive procedure for learning predictive models of continuous data that fits the data with an additive non-parametric model. The classic version of gradient boosting assumes that the data is independent and identically distributed. However, relational data with interdependent, linked instances is now common and the dependencies in such data can be ex…

    Submitted 1 July, 2016; originally announced July 2016.

    Comments: 7 pages, 3 Figures, Sixth International Workshop on Statistical Relational AI

  37. arXiv:1507.03168

    cs.AI

    Using Bayesian Network Representations for Effective Sampling from Generative Network Models

    Authors: Pablo Robles-Granda, Sebastian Moreno, Jennifer Neville

    Abstract: Bayesian networks (BNs) are used for inference and sampling by exploiting conditional independence among random variables. Context specific independence (CSI) is a property of graphical models where additional independence relations arise in the context of particular values of random variables (RVs). Identifying and exploiting CSI properties can simplify inference. Some generative network models (…

    Submitted 11 July, 2015; originally announced July 2015.

  38. arXiv:1506.04322

    cs.SI cs.DC cs.IR stat.ML

    Graphlet Decomposition: Framework, Algorithms, and Applications

    Authors: Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, Nick Duffield, Theodore L. Willke

    Abstract: From social science to biology, numerous applications often rely on graphlets for intuitive and meaningful characterization of networks at both the global macro-level as well as the local micro-level. While graphlets have witnessed a tremendous success and impact in a variety of domains, there has yet to be a fast and efficient approach for computing the frequencies of these subgraph patterns. How…

    Submitted 15 February, 2016; v1 submitted 13 June, 2015; originally announced June 2015.

  39. arXiv:1411.3749

    cs.SI physics.soc-ph

    Anomaly Detection in Dynamic Networks of Varying Size

    Authors: Timothy La Fond, Jennifer Neville, Brian Gallagher

    Abstract: Dynamic networks, also called network streams, are an important data representation that applies to many real-world domains. Many sets of network data such as e-mail networks, social networks, or internet traffic networks are best represented by a dynamic network due to the temporal component of the data. One important application in the domain of dynamic network analysis is anomaly detection. Her…

    Submitted 13 November, 2014; originally announced November 2014.

  40. arXiv:1403.3909

    cs.SI cs.DB physics.soc-ph stat.AP

    Graph Sample and Hold: A Framework for Big-Graph Analytics

    Authors: Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, Ramana Kompella

    Abstract: Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g. web graphs, social networks etc), where an underlying network…

    Submitted 16 March, 2014; originally announced March 2014.

  41. arXiv:1403.3707

    cs.SI cs.LG physics.soc-ph stat.ML

    Learning the Latent State Space of Time-Varying Graphs

    Authors: Nesreen K. Ahmed, Christopher Cole, Jennifer Neville

    Abstract: From social networks to Internet applications, a wide variety of electronic communication tools are producing streams of graph data; where the nodes represent users and the edges represent the contacts between them over time. This has led to an increased interest in mechanisms to model the dynamic structure of time-varying graphs. In this work, we develop a framework for learning the latent state…

    Submitted 14 March, 2014; originally announced March 2014.

  42. arXiv:1211.3412

    cs.SI cs.DS cs.LG physics.soc-ph stat.ML

    Network Sampling: From Static to Streaming Graphs

    Authors: Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella

    Abstract: Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network scienc…

    Submitted 13 November, 2012; originally announced November 2012.

  43. arXiv:1206.4952  [pdf, other]

    cs.SI cs.DB physics.soc-ph stat.AP

    Space-Efficient Sampling from Social Activity Streams

    Authors: Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella

    Abstract: In order to efficiently study the characteristics of network domains and support development of network systems (e.g. algorithms, protocols that operate on networks), it is often necessary to sample a representative subgraph from a large complex network. Although recent subgraph sampling methods have been shown to work well, they focus on sampling from memory-resident graphs and assume that the sa…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: BigMine 2012

  44. arXiv:1205.2056  [pdf, other]

    cs.SI cs.LG physics.soc-ph stat.ML

    Dynamic Behavioral Mixed-Membership Model for Large Evolving Networks

    Authors: Ryan Rossi, Brian Gallagher, Jennifer Neville, Keith Henderson

    Abstract: The majority of real-world networks are dynamic and extremely large (e.g., Internet Traffic, Twitter, Facebook, ...). To understand the structural behavior of nodes in these large dynamic networks, it may be necessary to model the dynamics of behavioral roles representing the main connectivity patterns over time. In this paper, we propose a dynamic behavioral mixed-membership model (DBMM) that cap…

    Submitted 9 May, 2012; originally announced May 2012.

  45. arXiv:1204.0033  [pdf, other]

    stat.ML cs.AI cs.LG cs.SI

    Transforming Graph Representations for Statistical Relational Learning

    Authors: Ryan A. Rossi, Luke K. McDowell, David W. Aha, Jennifer Neville

    Abstract: Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the c…

    Submitted 30 March, 2012; originally announced April 2012.

    ACM Class: I.2; I.2.6; H.2.8; H.3.3

  46. arXiv:1203.2200  [pdf, other]

    cs.SI cs.AI cs.LG stat.ML

    Role-Dynamics: Fast Mining of Large Dynamic Networks

    Authors: Ryan Rossi, Brian Gallagher, Jennifer Neville, Keith Henderson

    Abstract: To understand the structural dynamics of a large-scale social, biological or technological network, it may be useful to discover behavioral roles representing the main connectivity patterns present over time. In this paper, we propose a scalable non-parametric approach to automatically learn the structural dynamics of the network and individual nodes. Roles may represent structural or behavioral p…

    Submitted 9 March, 2012; originally announced March 2012.

    ACM Class: H.2.8; G.2.2

  47. arXiv:1202.4805  [pdf, other]

    cs.SI physics.soc-ph

    Fast Generation of Large Scale Social Networks with Clustering

    Authors: Joseph J. Pfeiffer III, Timothy La Fond, Sebastian Moreno, Jennifer Neville

    Abstract: A key challenge within the social network literature is the problem of network generation - that is, how can we create synthetic networks that match characteristics traditionally found in most real world networks? Important characteristics that are present in social networks include a power law degree distribution, small diameter and large amounts of clustering; however, most current network gener…

    Submitted 21 February, 2012; originally announced February 2012.

    Comments: 11 pages

    ACM Class: G.2.2; G.3

  48. arXiv:1111.5312  [pdf, other]

    cs.AI cs.SI physics.soc-ph stat.ML

    Representations and Ensemble Methods for Dynamic Relational Classification

    Authors: Ryan A. Rossi, Jennifer Neville

    Abstract: Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to incr…

    Submitted 22 November, 2011; originally announced November 2011.

    MSC Class: 68T01 ACM Class: I.2.6; H.2.8

  49. arXiv:1104.0319  [pdf, ps, other]

    cs.SI physics.soc-ph

    Methods to Determine Node Centrality and Clustering in Graphs with Uncertain Structure

    Authors: Joseph J. Pfeiffer III, Jennifer Neville

    Abstract: Much of the past work in network analysis has focused on analyzing discrete graphs, where binary edges represent the "presence" or "absence" of a relationship. Since traditional network measures (e.g., betweenness centrality) utilize a discrete link structure, complex systems must be transformed to this representation in order to investigate network properties. However, in many domains there may b…

    Submitted 2 April, 2011; originally announced April 2011.

    Comments: Longer version of paper appearing in Fifth International AAAI Conference on Weblogs and Social Media. 9 pages, 4 Figures

    MSC Class: 91D30 ACM Class: H.3.4

  50. arXiv:1103.3103  [pdf]

    cs.DB

    Guided Data Repair

    Authors: Mohamed Yakout, Ahmed K. Elmagarmid, Jennifer Neville, Mourad Ouzzani, Ihab F. Ilyas

    Abstract: In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates di…

    Submitted 16 March, 2011; originally announced March 2011.

    Comments: VLDB2011

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 5, pp. 279-289 (2011)