Search | arXiv e-print repository

A Neuro-Symbolic Approach to Monitoring Salt Content in Food

Authors: Anuja Tayal, Barbara Di Eugenio, Devika Salunke, Andrew D. Boyd, Carolyn A Dickens, Eulalia P Abril, Olga Garcia-Bedoya, Paula G Allen-Meares

Abstract: We propose a dialogue system that enables heart failure patients to inquire about salt content in foods and helps them monitor and reduce salt intake. Addressing the lack of specific datasets for food-based salt content inquiries, we develop a template-based conversational dataset. The dataset is structured to ask clarification questions to identify food items and their salt content. Our findings indicate that while fine-tuning transformer-based models on the dataset yields limited performance, the integration of Neuro-Symbolic Rules significantly enhances the system's performance. Our experiments show that by integrating neuro-symbolic rules, our system achieves an improvement in joint goal accuracy of over 20% across different data sizes compared to naively fine-tuning transformer-based models.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Accepted in CL4Health workshop in LREC-COLING'24
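
Note on the metric: joint goal accuracy, the measure quoted in the abstract above, is conventionally the fraction of dialogue turns whose entire predicted state matches the gold state. The following is a minimal sketch of that conventional definition, not the authors' evaluation code; the slot names (`food`, `type`) and the function name are hypothetical.

```python
from typing import Dict, List


def joint_goal_accuracy(
    predicted: List[Dict[str, str]], gold: List[Dict[str, str]]
) -> float:
    """Fraction of turns whose full predicted state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    # A turn counts only if every slot matches; partial matches score zero.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical two-turn dialogue: the second turn gets one slot wrong.
gold = [{"food": "soup"}, {"food": "soup", "type": "canned"}]
pred = [{"food": "soup"}, {"food": "soup", "type": "homemade"}]
print(joint_goal_accuracy(pred, gold))  # 0.5
```

Because a single wrong slot fails the whole turn, this metric is stricter than per-slot accuracy.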

arXiv:2401.09651 [pdf, other]

Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

Authors: Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor

Abstract: We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate our learning framework achieves up to a 16% point prediction performance improvement over alternative learning methods.

Submitted 3 June, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

arXiv:2312.15520 [pdf, other]

Graph Coarsening via Convolution Matching for Scalable Graph Neural Network Training

Authors: Charles Dickens, Eddie Huang, Aishwarya Reganti, Jiong Zhu, Karthik Subbian, Danai Koutra

Abstract: Graph summarization as a preprocessing step is an effective and complementary technique for scalable graph neural network (GNN) training. In this work, we propose the Coarsening Via Convolution Matching (CONVMATCH) algorithm and a highly scalable variant, A-CONVMATCH, for creating summarized graphs that preserve the output of graph convolution. We evaluate CONVMATCH on six real-world link prediction and node classification graph datasets, and show it is efficient and preserves prediction performance while significantly reducing the graph size. Notably, CONVMATCH achieves up to 95% of the prediction performance of GNNs on node classification while trained on graphs summarized down to 1% the size of the original graph. Furthermore, on link prediction tasks, CONVMATCH consistently outperforms all baselines, achieving up to a 2x improvement.

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.08981 [pdf, other]

Matching Noisy Keys for Obfuscation

Authors: Charlie Dickens, Eric Bax

Abstract: Data sketching has emerged as a key infrastructure for large-scale data analysis on streaming and distributed data. Merging sketches enables efficient estimation of cardinalities and frequency histograms over distributed data. However, merging sketches can require that each sketch stores hash codes for identifiers in different data sets or partitions, in order to perform effective matching. This can reveal identifiers during merging or across different data set or partition owners. This paper presents a framework to use noisy hash codes, with the noise level selected to obfuscate identifiers while allowing matching, with high probability. We give probabilistic error bounds on simultaneous obfuscation and matching, concluding that this is a viable approach.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2305.09887 [pdf, other]

Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Authors: Jiong Zhu, Aishwarya Reganti, Edward Huang, Charles Dickens, Nikhil Rao, Karthik Subbian, Danai Koutra

Abstract: Distributed training of GNNs enables learning on massive graphs (e.g., social and e-commerce networks) that exceed the storage and computational capacity of a single machine. To reach performance comparable to centralized training, distributed frameworks focus on maximally recovering cross-instance node dependencies with either communication across instances or periodic fallback to centralized training, which create overhead and limit the framework scalability. In this work, we present a simplified framework for distributed GNN training that does not rely on the aforementioned costly operations, and has improved scalability, convergence speed and performance over the state-of-the-art approaches. Specifically, our framework (1) assembles independent trainers, each of which asynchronously learns a local model on locally-available parts of the training graph, and (2) only conducts periodic (time-based) model aggregation to synchronize the local models. Backed by our theoretical analysis, instead of maximizing the recovery of cross-instance node dependencies (which has been considered the key behind closing the performance gap between model aggregation and centralized training), our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to partition the training graph such that it improves data uniformity and minimizes the discrepancy of gradient and loss function across instances. In our experiments on social and e-commerce networks with up to 1.3 billion edges, our proposed RandomTMA and SuperTMA approaches, despite using less training data, achieve state-of-the-art performance and 2.31x speedup compared to the fastest baseline, and show better robustness to trainer failures.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 14 pages, 3 figures

arXiv:2207.07238 [pdf, other]

Emotion Recognition in Conversation using Probabilistic Soft Logic

Authors: Eriq Augustine, Pegah Jandaghi, Alon Albalak, Connor Pryor, Charles Dickens, William Wang, Lise Getoor

Abstract: Creating agents that can both appropriately respond to conversations and understand complex human linguistic tendencies and social cues has been a long-standing challenge in the NLP community. A recent pillar of research revolves around emotion recognition in conversation (ERC), a sub-field of emotion recognition that focuses on conversations or dialogues that contain two or more utterances. In this work, we explore an approach to ERC that exploits the use of neural embeddings along with complex structures in dialogues. We implement our approach in a framework called Probabilistic Soft Logic (PSL), a declarative templating language that uses first-order like logical rules, that when combined with data, define a particular class of graphical model. Additionally, PSL provides functionality for the incorporation of results from neural models into PSL models. This allows our model to take advantage of advanced neural methods, such as sentence embeddings, and logical reasoning over the structure of a dialogue. We compare our method with state-of-the-art purely neural ERC systems, and see almost a 20% improvement. With these results, we provide an extensive qualitative and quantitative analysis over the DailyDialog conversation dataset.

Submitted 14 July, 2022; originally announced July 2022.
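
Note on PSL rules: in the standard PSL formulation, atoms take soft truth values in [0, 1], conjunction uses the Łukasiewicz relaxation, and each grounded rule contributes a weighted hinge-loss potential equal to its distance to satisfaction. The sketch below illustrates only that general semantics and is not the model from the paper above; the predicate names in the comments and the numeric truth values are hypothetical.

```python
from typing import Sequence


def lukasiewicz_and(truths: Sequence[float]) -> float:
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """How far the rule `body -> head` is from being satisfied (0 = satisfied)."""
    return max(0.0, lukasiewicz_and(body) - head)


def hinge_loss_potential(
    weight: float, body: Sequence[float], head: float, squared: bool = False
) -> float:
    """Weighted (optionally squared) hinge-loss potential for one grounded rule."""
    d = distance_to_satisfaction(body, head)
    return weight * (d * d if squared else d)


# Hypothetical grounded rule:
#   SameSpeaker(u1, u2) & Emotion(u1, "joy") -> Emotion(u2, "joy")
body_truths = [0.9, 0.8]  # assumed soft truth values of the two body atoms
head_truth = 0.3          # assumed soft truth value of the head atom
penalty = hinge_loss_potential(2.0, body_truths, head_truth)
print(penalty)  # roughly 0.8, i.e. weight 2.0 times a distance of about 0.4
```

Inference then looks for the assignment of unobserved truth values that minimizes the sum of these potentials, which is a convex problem because every potential is a hinge.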

arXiv:2205.14268 [pdf, other]

NeuPSL: Neural Probabilistic Soft Logic

Authors: Connor Pryor, Charles Dickens, Eriq Augustine, Alon Albalak, William Wang, Lise Getoor

Abstract: In this paper, we introduce Neural Probabilistic Soft Logic (NeuPSL), a novel neuro-symbolic (NeSy) framework that unites state-of-the-art symbolic reasoning with the low-level perception of deep neural networks. To model the boundary between neural and symbolic representations, we propose a family of energy-based models, NeSy Energy-Based Models, and show that they are general enough to include NeuPSL and many other NeSy approaches. Using this framework, we show how to seamlessly integrate neural and symbolic parameter learning and inference in NeuPSL. Through an extensive empirical evaluation, we demonstrate the benefits of using NeSy methods, achieving upwards of 30% improvement over independent neural network models. On a well-established NeSy task, MNIST-Addition, NeuPSL demonstrates its joint reasoning capabilities by outperforming existing NeSy approaches by up to 10% in low-data settings. Furthermore, NeuPSL achieves a 5% boost in performance over state-of-the-art NeSy methods in a canonical citation network task with up to a 40 times speed up.

Submitted 23 May, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

arXiv:2203.15400 [pdf, ps, other]

Order-Invariant Cardinality Estimators Are Differentially Private

Authors: Charlie Dickens, Justin Thaler, Daniel Ting

Abstract: We consider privacy in the context of streaming algorithms for cardinality estimation. We show that a large class of algorithms all satisfy $ε$-differential privacy, so long as (a) the algorithm is combined with a simple down-sampling procedure, and (b) the cardinality of the input stream is $Ω(k/ε)$. Here, $k$ is a certain parameter of the sketch that is always at most the sketch size in bits, but is typically much smaller. We also show that, even with no modification, algorithms in our class satisfy $(ε, δ)$-differential privacy, where $δ$ falls exponentially with the stream cardinality. Our analysis applies to essentially all popular cardinality estimation algorithms, and substantially generalizes and tightens privacy bounds from earlier works.

Submitted 3 February, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: Changed title and updated with camera ready version from conference

arXiv:2101.07546 [pdf, other]

Subspace exploration: Bounds on Projected Frequency Estimation

Authors: Graham Cormode, Charlie Dickens, David P. Woodruff

Abstract: Given an $n \times d$ dimensional dataset $A$, a projection query specifies a subset $C \subseteq [d]$ of columns which yields a new $n \times |C|$ array. We study the space complexity of computing data analysis functions over such subspaces, including heavy hitters and norms, when the subspaces are revealed only after observing the data. We show that this important class of problems is typically hard: for many problems, we show $2^{Ω(d)}$ lower bounds. However, we present upper bounds which demonstrate space dependency better than $2^d$. That is, for $c,c' \in (0,1)$ and a parameter $N=2^d$ an $N^c$-approximation can be obtained in space $\min(N^{c'},n)$, showing that it is possible to improve on the naïve approach of keeping information for all $2^d$ subsets of $d$ columns. Our results are based on careful constructions of instances using coding theory and novel combinatorial reductions that exhibit such space-approximation tradeoffs.

Submitted 19 January, 2021; originally announced January 2021.

arXiv:2011.03607 [pdf, ps, other]

Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives

Authors: Charlie Dickens

Abstract: Despite its impressive theory & practical performance, Frequent Directions (FD) has not been widely adopted for large-scale regression tasks. Prior work has shown randomized sketches (i) perform worse in estimating the covariance matrix of the data than FD; (ii) incur high error when estimating the bias and/or variance on sketched ridge regression. We give the first constant factor relative error bounds on the bias & variance for sketched ridge regression using FD. We complement these statistical results by showing that FD can be used in the optimization setting through an iterative scheme which yields high-accuracy solutions. This improves on randomized approaches which need to compromise the need for a new sketch every iteration with speed of convergence. In both settings, we also show using Robust Frequent Directions further enhances performance.

Submitted 6 November, 2020; originally announced November 2020.
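
Note on Frequent Directions: the abstract above builds on the Frequent Directions sketch, a deterministic streaming summary of a matrix. Below is a minimal sketch of one common textbook variant (shrinking by the smallest squared singular value after every row), assuming NumPy and a sketch size `ell` no larger than the number of columns. It illustrates the basic sketch only, not the ridge regression method or the Robust Frequent Directions variant from the paper, and it recomputes an SVD per row for clarity rather than speed.

```python
import numpy as np


def frequent_directions(A: np.ndarray, ell: int) -> np.ndarray:
    """Return an ell x d sketch B with ||A^T A - B^T B||_2 <= ||A||_F^2 / ell."""
    n, d = A.shape
    if not 1 <= ell <= d:
        raise ValueError("this sketch assumes 1 <= ell <= number of columns")
    B = np.zeros((ell, d))
    for row in A:
        # The last row of B is always zero here, so it can hold the new row.
        B[-1] = row
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # Shrink every squared singular value by the smallest one,
        # which zeroes out the last row again.
        delta = s[-1] ** 2
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt
    return B


# Illustrative usage on random data (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
ell = 10
B = frequent_directions(A, ell)
cov_error = np.linalg.norm(A.T @ A - B.T @ B, ord=2)
bound = np.linalg.norm(A, ord="fro") ** 2 / ell
print(cov_error <= bound)
```

The bound holds deterministically for this variant: each shrink step removes `ell * delta` from the squared Frobenius norm of the sketch while perturbing the covariance estimate by at most `delta` in spectral norm.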

arXiv:2009.08952 [pdf, other]

HyperFair: A Soft Approach to Integrating Fairness Criteria

Authors: Charles Dickens, Rishika Singh, Lise Getoor

Abstract: Recommender systems are being employed across an increasingly diverse set of domains that can potentially make a significant social and individual impact. For this reason, considering fairness is a critical step in the design and evaluation of such systems. In this paper, we introduce HyperFair, a general framework for enforcing soft fairness constraints in a hybrid recommender system. HyperFair models integrate variations of fairness metrics as a regularization of a joint inference objective function. We implement our approach using probabilistic soft logic and show that it is particularly well-suited for this task as it is expressive and structural constraints can be added to the system in a concise and interpretable manner. We propose two ways to employ the methods we introduce: first as an extension of a probabilistic soft logic recommender system template; second as a fair retrofitting technique that can be used to improve the fairness of predictions from a black-box model. We empirically validate our approach by implementing multiple HyperFair hybrid recommenders and compare them to a state-of-the-art fair recommender. We also run experiments showing the effectiveness of our methods for the task of retrofitting a black-box model and the trade-off between the amount of fairness enforced and the prediction performance.

Submitted 5 September, 2020; originally announced September 2020.

arXiv:2008.01505 [pdf, other]

Interpretable Anomaly Detection with Mondrian Pólya Forests on Data Streams

Authors: Charlie Dickens, Eric Meissner, Pablo G. Moreno, Tom Diethe

Abstract: Anomaly detection at scale is an extremely challenging problem of great practicality. When data is large and high-dimensional, it can be difficult to detect which observations do not fit the expected behaviour. Recent work has coalesced on variations of (random) $k$d-trees to summarise data for anomaly detection. However, these methods rely on ad-hoc score functions that are not easy to interpret, making it difficult to assess the severity of the detected anomalies or select a reasonable threshold in the absence of labelled anomalies. To solve these issues, we contextualise these methods in a probabilistic framework which we call the Mondrian Pólya Forest for estimating the underlying probability density function generating the data and enabling greater interpretability than prior work. In addition, we develop a memory-efficient variant able to operate in modern streaming environments. Our experiments show that these methods achieve state-of-the-art performance while providing statistically interpretable anomaly scores.

Submitted 4 August, 2020; originally announced August 2020.

arXiv:2003.11498 [pdf, other]

Similarity of Neural Networks with Gradients

Authors: Shuai Tang, Wesley J. Maddox, Charlie Dickens, Tom Diethe, Andreas Damianou

Abstract: A suitable similarity index for comparing learnt neural networks plays an important role in understanding the behaviour of the highly-nonlinear functions, and can provide insights on further theoretical analysis and empirical studies. We define two key steps when comparing models: firstly, the representation abstracted from the learnt model, where we propose to leverage both feature vectors and gradient ones (which are largely ignored in prior work) into designing the representation of a neural network. Secondly, we define the employed similarity index which gives desired invariance properties, and we facilitate the chosen ones with sketching techniques for comparing various datasets efficiently. Empirically, we show that the proposed approach provides a state-of-the-art method for computing similarity of neural networks that are trained independently on different datasets and the tasks defined by the datasets.

Submitted 25 March, 2020; originally announced March 2020.

arXiv:1910.14166 [pdf, other]

Iterative Hessian Sketch in Input Sparsity Time

Authors: Graham Cormode, Charlie Dickens

Abstract: Scalable algorithms to solve optimization and regression tasks, even approximately, are needed to work with large datasets. In this paper we study efficient techniques from matrix sketching to solve a variety of convex constrained regression problems. We adopt "Iterative Hessian Sketching" (IHS) and show that the fast CountSketch and sparse Johnson-Lindenstrauss Transforms yield state-of-the-art accuracy guarantees under IHS, while drastically improving the time cost. As a result, we obtain significantly faster algorithms for constrained regression, for both sparse and dense inputs. Our empirical results show that we can summarize data roughly 100x faster for sparse data, and, surprisingly, 10x faster on dense data! Consequently, solutions accurate to within machine precision of the optimal solution can be found much faster than the previous state of the art.

Submitted 30 October, 2019; originally announced October 2019.
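
Note on CountSketch: the speedups described above rest on sketches that can be applied in time proportional to the number of nonzeros of the input. The following is a minimal sketch of the CountSketch transform applied to the rows of a matrix, assuming NumPy and dense input; the function name, matrix sizes, and return convention are illustrative assumptions, and the Iterative Hessian Sketch loop itself is not shown.

```python
import numpy as np


def countsketch(A: np.ndarray, m: int, rng: np.random.Generator):
    """Compute S @ A where S is an m x n CountSketch matrix, without forming S.

    Each input row is sent to one uniformly random bucket with a random sign,
    so the cost is one pass over the entries of A.
    """
    n, d = A.shape
    buckets = rng.integers(0, m, size=n)       # bucket (output row) per input row
    signs = rng.choice([-1.0, 1.0], size=n)    # random sign per input row
    SA = np.zeros((m, d))
    # Unbuffered scatter-add so rows sharing a bucket accumulate correctly.
    np.add.at(SA, buckets, signs[:, None] * A)
    return SA, buckets, signs


# Illustrative usage: compress 1000 rows down to 50.
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 8))
SA, buckets, signs = countsketch(A, 50, rng)
print(SA.shape)  # (50, 8)
```

The returned `buckets` and `signs` fully describe the sketching matrix, so the same transform can be reapplied to a response vector when sketching a regression problem.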

arXiv:1807.02571 [pdf, other]

Leveraging Well-Conditioned Bases: Streaming & Distributed Summaries in Minkowski $p$-Norms

Authors: Graham Cormode, Charlie Dickens, David P. Woodruff

Abstract: Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study other $\ell_p$ norms, which are more robust for $p < 2$, and can be used to find outliers for $p > 2$. Unlike previous algorithms for such norms, we give algorithms that are (1) deterministic, (2) work simultaneously for every $p \geq 1$, including $p = \infty$, and (3) can be implemented in both distributed and streaming environments. We apply our results to $\ell_p$-regression, entrywise $\ell_1$-low rank approximation, and approximate matrix multiplication.

Submitted 6 July, 2018; originally announced July 2018.

Showing 1–15 of 15 results for author: Dickens, C