-
Using Combinatorial Optimization to Design a High quality LLM Solution
Authors:
Samuel Ackerman,
Eitan Farchi,
Rami Katan,
Orna Raz
Abstract:
We introduce a novel LLM-based solution design approach that utilizes combinatorial optimization and sampling. Specifically, a set of factors that influence the quality of the solution is identified. They typically include factors that represent prompt types, LLM input alternatives, and parameters governing the generation and design alternatives. Identifying the factors that govern the LLM solution quality enables the infusion of subject matter expert knowledge. Next, a set of interactions between the factors is defined, and combinatorial optimization is used to create a small subset $P$ that ensures all desired interactions occur in $P$. Each element $p \in P$ is then developed into an appropriate benchmark. Applying the alternative solutions to each combination $p \in P$ and evaluating the results facilitates the design of a high-quality LLM solution pipeline. The approach is especially applicable when the design and evaluation of each benchmark in $P$ is time-consuming and involves manual steps and human evaluation. Given its efficiency, the approach can also be used as a baseline to compare and validate an autoML approach that searches over the factors governing the solution.
Submitted 15 May, 2024;
originally announced May 2024.
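The abstract above describes choosing a small subset $P$ of factor combinations in which every desired interaction occurs. As a minimal sketch only, assuming the desired interactions are all pairs of factor values and using a simple greedy heuristic (the abstract does not say which optimization method the authors use), this could look as follows. The factor names and values are hypothetical.

```python
from itertools import combinations, product

def pairs_of(combo):
    """All pairwise (factor, value) interactions present in one combination."""
    return set(combinations(sorted(combo.items()), 2))

def all_pairs(factors):
    """Every pairwise interaction that the subset P must cover at least once."""
    names = sorted(factors)
    return {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va, vb in product(factors[a], factors[b])
    }

def greedy_pairwise(factors):
    """Greedily pick combinations until every pairwise interaction is covered."""
    names = sorted(factors)
    candidates = [dict(zip(names, values))
                  for values in product(*(factors[n] for n in names))]
    remaining = all_pairs(factors)
    chosen = []
    while remaining:
        # Take the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & remaining))
        chosen.append(best)
        remaining -= pairs_of(best)
    return chosen

# Hypothetical factors for illustration; not taken from the paper.
factors = {
    "prompt": ["zero_shot", "few_shot"],
    "temperature": [0.0, 0.7],
    "context": ["none", "retrieved"],
}
plan = greedy_pairwise(factors)
```

The sketch enumerates the full Cartesian product, so it is only practical for a handful of factors. Larger designs would need a dedicated covering-array tool.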
-
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Authors:
Swapnaja Achintalwar,
Ioana Baldini,
Djallel Bouneffouf,
Joan Byamugisha,
Maria Chang,
Pierre Dognin,
Eitan Farchi,
Ndivhuwo Makondo,
Aleksandra Mojsilovic,
Manish Nagireddy,
Karthikeyan Natesan Ramamurthy,
Inkit Padhi,
Orna Raz,
Jesus Rios,
Prasanna Sattigeri,
Moninder Singh,
Siphiwe Thwala,
Rosario A. Uceda-Sosa,
Kush R. Varshney
Abstract:
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.
Submitted 8 March, 2024;
originally announced March 2024.
-
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Authors:
Swapnaja Achintalwar,
Adriana Alvarado Garcia,
Ateret Anaby-Tavor,
Ioana Baldini,
Sara E. Berger,
Bishwaranjan Bhattacharjee,
Djallel Bouneffouf,
Subhajit Chaudhury,
Pin-Yu Chen,
Lamogha Chiazor,
Elizabeth M. Daly,
Kirushikesh DB,
Rogério Abreu de Paula,
Pierre Dognin,
Eitan Farchi,
Soumya Ghosh,
Michael Hind,
Raya Horesh,
George Kour,
Ja Young Lee,
Nishtha Madaan,
Sameep Mehta,
Erik Miehling,
Keerthiram Murugesan,
Manish Nagireddy, et al. (13 additional authors not shown)
Abstract:
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we present our ongoing efforts to create and deploy a library of detectors: compact and easy-to-build classification models that provide labels for various harms. In addition to the detectors themselves, we discuss a wide range of uses for these detector models - from acting as guardrails to enabling effective AI governance. We also deep dive into inherent challenges in their development and discuss future work aimed at making the detectors more reliable and broadening their scope.
Submitted 13 June, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Unveiling Safety Vulnerabilities of Large Language Models
Authors:
George Kour,
Marcel Zalmanovici,
Naama Zwerdling,
Esther Goldbraich,
Ora Nova Fandina,
Ateret Anaby-Tavor,
Orna Raz,
Eitan Farchi
Abstract:
As large language models become more prevalent, their possible harmful or inappropriate responses are a cause for concern. This paper introduces a unique dataset containing adversarial examples in the form of questions, which we call AttaQ, designed to provoke such harmful or inappropriate responses. We assess the efficacy of our dataset by analyzing the vulnerabilities of various models when subjected to it. Additionally, we introduce a novel automatic approach for identifying and naming vulnerable semantic regions - input semantic areas for which the model is likely to produce harmful outputs. This is achieved through the application of specialized clustering techniques that consider both the semantic similarity of the input attacks and the harmfulness of the model's responses. Automatically identifying vulnerable semantic regions enhances the evaluation of model weaknesses, facilitating targeted improvements to its safety mechanisms and overall reliability.
Submitted 7 November, 2023;
originally announced November 2023.
-
Predicting Question-Answering Performance of Large Language Models through Semantic Consistency
Authors:
Ella Rabinovich,
Samuel Ackerman,
Orna Raz,
Eitan Farchi,
Ateret Anaby-Tavor
Abstract:
Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the dataset to the community.
We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy, to build and evaluate a framework for reference-less performance prediction in factual QA, that is, predicting the likelihood that a language model will answer a question accurately. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging results that significantly outperform baselines.
Submitted 2 November, 2023;
originally announced November 2023.
-
Automatic Generation of Attention Rules For Containment of Machine Learning Model Errors
Authors:
Samuel Ackerman,
Axel Bendavid,
Eitan Farchi,
Orna Raz
Abstract:
Machine learning (ML) solutions are prevalent in many applications. However, many challenges exist in making these solutions business-grade, for instance, maintaining the error rate of the underlying ML models at an acceptably low level. Typically, the true relationship between feature inputs and the target feature to be predicted is uncertain, and hence statistical in nature. The approach we propose is to separate the observations that are the most likely to be predicted incorrectly into 'attention sets'. These can directly aid model diagnosis and improvement, and be used to decide on alternative courses of action for these problematic observations. We present several algorithms ('strategies') for determining optimal rules to separate these observations. In particular, we prefer strategies that use feature-based slicing because they are human-interpretable, model-agnostic, and require minimal supplementary inputs or knowledge. In addition, we show that these strategies outperform several common baselines, such as selecting observations with prediction confidence below a threshold. To evaluate strategies, we introduce metrics to measure various desired qualities, such as their performance, stability, and generalizability to unseen data; the strategies are evaluated on several publicly-available datasets. We use TOPSIS, a Multiple Criteria Decision Making method, to aggregate these metrics into a single quality score for each strategy, to allow comparison.
Submitted 14 May, 2023;
originally announced May 2023.
-
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
Authors:
George Kour,
Samuel Ackerman,
Orna Raz,
Eitan Farchi,
Boaz Carmeli,
Ateret Anaby-Tavor
Abstract:
The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating these metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by evaluating them on a collection of classical and state-of-the-art metrics. Our measures revealed that recently-developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to perturbations at the surface text level.
Submitted 29 November, 2022;
originally announced November 2022.
-
High-quality Conversational Systems
Authors:
Samuel Ackerman,
Ateret Anaby-Tavor,
Eitan Farchi,
Esther Goldbraich,
George Kour,
Ella Rabinovich,
Orna Raz,
Saritha Route,
Marcel Zalmanovici,
Naama Zwerdling
Abstract:
Conversational systems or chatbots are an example of AI-Infused Applications (AIIA). Chatbots are especially important as they are often the first interaction of clients with a business and are the entry point of a business into the AI (Artificial Intelligence) world. The quality of the chatbot is, therefore, key. However, as is the case in general with AIIAs, it is especially challenging to assess and control the quality of chatbot systems. Beyond the inherent statistical nature of these systems, where occasional failure is acceptable, we identify two major challenges. The first is to release an initial system that is of sufficient quality such that humans will interact with it. The second is to maintain the quality, enhance its capabilities, improve it and make necessary adjustments based on changing user requests or drift. These challenges exist because it is impossible to predict the real distribution of user requests and the natural language they will use to express these requests. Moreover, any empirical distribution of requests is likely to change over time. This may be due to periodicity, changing usage, and drift of topics.
We provide a methodology and set of technologies to address these challenges and to provide automated assistance through a human-in-the-loop approach. We note that it is crucial to connect the different phases in the lifecycle of chatbot development and to make sure the chatbot provides its expected business value, for example, that it frees human agents to deal with tasks other than answering human users. Our methodology and technologies apply during chatbot training in the pre-production phase, through to chatbot usage in the field in the post-production phase. They implement the 'test first' paradigm by assisting in agile design, and support continuous integration through actionable insights.
Submitted 28 April, 2022; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Theory and Practice of Quality Assurance for Machine Learning Systems An Experiment Driven Approach
Authors:
Samuel Ackerman,
Guy Barash,
Eitan Farchi,
Orna Raz,
Onn Shehory
Abstract:
The crafting of machine learning (ML) based systems requires statistical control throughout its life cycle. Careful quantification of business requirements and identification of key factors that impact the business requirements reduces the risk of a project failure. The quantification of business requirements results in the definition of random variables representing the system key performance indicators that need to be analyzed through statistical experiments. In addition, available data for training and experiment results impact the design of the system. Once the system is developed, it is tested and continually monitored to ensure it meets its business requirements. This is done through the continued application of statistical experiments to analyze and control the key performance indicators. This book teaches the art of crafting and developing ML based systems. It advocates an "experiment first" approach stressing the need to define statistical experiments from the beginning of the project life cycle. It also discusses in detail how to apply statistical control on the ML based system throughout its lifecycle.
Submitted 12 April, 2022; v1 submitted 2 January, 2022;
originally announced January 2022.
-
Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation
Authors:
George Kour,
Marcel Zalmanovici,
Orna Raz,
Samuel Ackerman,
Ateret Anaby-Tavor
Abstract:
Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or systems that contain ML models, is highly challenging. In addition to the challenges of testing classical software, it is acceptable and expected that statistical ML models sometimes output incorrect results. A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifiers, is acceptable and when it is not. In addition to business requirements that should provide a threshold, it is a best practice to require any proposed ML solution to out-perform simple baseline models, such as a decision tree.
We have developed complexity measures, which quantify how difficult given observations are to assign to their true class label; these measures can then be used to automatically determine a baseline performance threshold. These measures are superior to the best practice baseline in that, for a linear computation cost, they also quantify each observation's classification complexity in an explainable form, regardless of the classifier model used. Our experiments with both numeric synthetic data and real natural language chatbot data demonstrate that the complexity measures effectively highlight data regions and observations that are likely to be misclassified.
Submitted 27 October, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Automatically detecting data drift in machine learning classifiers
Authors:
Samuel Ackerman,
Orna Raz,
Marcel Zalmanovici,
Aviad Zlotnick
Abstract:
Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML resulting in theoretical or empirical performance guarantees is that the distribution of the training data is representative of the production data distribution. This assumption often breaks; for instance, statistical distributions of the data may change. We term changes that affect ML performance `data drift' or `drift'.
Many classification techniques compute a measure of confidence in their results. This measure might not reflect the actual ML performance. A famous example is the Panda picture that is correctly classified as such with a confidence of about 60\%, but when noise is added it is incorrectly classified as a Gibbon with a confidence of above 99\%. However, the work we report on here suggests that a classifier's measure of confidence can be used for the purpose of detecting data drift.
We propose an approach based solely on the classifier's suggested labels and its confidence in them, for alerting on data distribution or feature space changes that are likely to cause data drift. Our approach identifies degradation in model performance and does not require labeling of data in production, which is often lacking or delayed. Our experiments with three different data sets and classifiers demonstrate the effectiveness of this approach in detecting data drift. This is especially encouraging as the classification itself may or may not be correct and no model input data is required. We further explore the statistical approach of sequential change-point tests to automatically determine the amount of data needed in order to identify drift while controlling the false positive rate (Type-1 error).
Submitted 10 November, 2021;
originally announced November 2021.
-
Detecting model drift using polynomial relations
Authors:
Eliran Roffe,
Samuel Ackerman,
Orna Raz,
Eitan Farchi
Abstract:
Machine learning models serve critical functions, such as classifying loan applicants as good or bad risks. Each model is trained under the assumption that the data used in training and in the field come from the same underlying unknown distribution. Often, this assumption is broken in practice. It is desirable to identify when this occurs, to minimize the impact on model performance.
We suggest a new approach to detecting change in the data distribution by identifying polynomial relations between the data features. We measure the strength of each identified relation using its R-square value. A strong polynomial relation captures a significant trait of the data which should remain stable if the data distribution does not change. We thus use a set of learned strong polynomial relations to identify drift. For a set of polynomial relations that are stronger than a given threshold, we calculate the amount of drift observed for that relation. The amount of drift is measured by calculating the Bayes Factor for the polynomial relation likelihood of the baseline data versus field data. We empirically validate the approach by simulating a range of changes, and identify drift using the Bayes Factor of the polynomial relation likelihood change.
Submitted 22 December, 2021; v1 submitted 24 October, 2021;
originally announced October 2021.
-
Density-based interpretable hypercube region partitioning for mixed numeric and categorical data
Authors:
Samuel Ackerman,
Eitan Farchi,
Orna Raz,
Marcel Zalmanovici,
Maya Zohar
Abstract:
Consider a structured dataset of features, such as $\{\textrm{SEX}, \textrm{INCOME}, \textrm{RACE}, \textrm{EXPERIENCE}\}$. A user may want to know where in the feature space observations are concentrated, and where it is sparse or empty. The existence of large sparse or empty regions can provide domain knowledge of soft or hard feature constraints (e.g., what is the typical income range, or that it may be unlikely to have a high income with few years of work experience). Also, these can suggest to the user that machine learning (ML) model predictions for data inputs in sparse or empty regions may be unreliable.
An interpretable region is a hyper-rectangle, such as $\{\textrm{RACE} \in\{\textrm{Black}, \textrm{White}\}\}\:\&$ $\{10 \leq \:\textrm{EXPERIENCE} \:\leq 13\}$, containing all observations satisfying the constraints; typically, such regions are defined by a small number of features. Our method constructs an observation density-based partition of the observed feature space in the dataset into such regions. It has a number of advantages over others in that it works on features of mixed type (numeric or categorical) in the original domain, and can separate out empty regions as well.
As can be seen from visualizations, the resulting partitions accord with spatial groupings that a human eye might identify; the results should thus extend to higher dimensions. We also show some applications of the partition to other data analysis tasks, such as inferring about ML model error, measuring high-dimensional density variability, and causal inference for treatment effect. Many of these applications are made possible by the hyper-rectangular form of the partition regions.
Submitted 8 November, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
FreaAI: Automated extraction of data slices to test machine learning models
Authors:
Samuel Ackerman,
Orna Raz,
Marcel Zalmanovici
Abstract:
Machine learning (ML) solutions are prevalent. However, many challenges exist in making these solutions business-grade. One major challenge is to ensure that the ML solution provides its expected business value. In order to do that, one has to bridge the gap between the way ML model performance is measured and the solution requirements. In previous work (Barash et al, "Bridging the gap...") we demonstrated the effectiveness of utilizing feature models in bridging this gap. Whereas ML performance metrics, such as the accuracy or F1-score of a classifier, typically measure the average ML performance, feature models shed light on explainable data slices that are too far from that average, and therefore might indicate unsatisfied requirements. For example, the overall accuracy of a bank text terms classifier may be very high, say $98\% \pm 2\%$, yet it might perform poorly for terms that include short descriptions and originate from commercial accounts. A business requirement, which may be implicit in the training data, may be to perform well regardless of the type of account and length of the description. Therefore, the under-performing data slice that includes short descriptions and commercial accounts suggests poorly-met requirements. In this paper we show the feasibility of automatically extracting feature models that result in explainable data slices over which the ML solution under-performs. Our novel technique, IBM FreaAI aka FreaAI, extracts such slices from structured ML test data or any other labeled data. We demonstrate that FreaAI can automatically produce explainable and statistically-significant data slices over seven open datasets.
Submitted 12 August, 2021;
originally announced August 2021.
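To illustrate the idea of an under-performing data slice from the abstract above, here is a minimal sketch. It is not the FreaAI algorithm: it scans single categorical feature values only and, as a simplifying assumption, replaces the paper's statistical-significance requirement with a minimum support count and a fixed accuracy margin. The function name, parameters, and toy data are hypothetical.

```python
from collections import defaultdict

def weak_slices(rows, correct, min_support=5, margin=0.1):
    """Return (feature, value, accuracy, size) for slices well below overall accuracy."""
    overall = sum(correct) / len(correct)
    stats = defaultdict(lambda: [0, 0])  # (feature, value) -> [n_correct, n_total]
    for row, ok in zip(rows, correct):
        for feature, value in row.items():
            stats[(feature, value)][0] += int(ok)
            stats[(feature, value)][1] += 1
    flagged = []
    for (feature, value), (n_ok, n) in stats.items():
        accuracy = n_ok / n
        # Keep slices that are large enough and clearly worse than average.
        if n >= min_support and accuracy < overall - margin:
            flagged.append((feature, value, accuracy, n))
    return sorted(flagged, key=lambda s: s[2])  # weakest slice first

# Toy labeled test data: the classifier struggles on commercial accounts
# and on short descriptions.
rows = [
    {"account": "commercial" if i < 10 else "personal",
     "length": "short" if i % 2 == 0 else "long"}
    for i in range(20)
]
correct = [i >= 10 or i in (1, 3, 5, 7) for i in range(20)]
flagged = weak_slices(rows, correct)
```

Here the overall accuracy is 70%, while the commercial-account slice and the short-description slice fall well below it, so both are flagged.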
-
Machine Learning Model Drift Detection Via Weak Data Slices
Authors:
Samuel Ackerman,
Parijat Dube,
Eitan Farchi,
Orna Raz,
Marcel Zalmanovici
Abstract:
Detecting drift in performance of Machine Learning (ML) models is an acknowledged challenge. For ML models to become an integral part of business applications it is essential to detect when an ML model drifts away from acceptable operation. However, it is often the case that actual labels are difficult and expensive to get, for example, because they require expert judgment. Therefore, there is a need for methods that detect likely degradation in ML operation without labels. We propose a method that utilizes feature space rules, called data slices, for drift detection. We provide experimental indications that our method is likely to identify that the ML model will likely change in performance, based on changes in the underlying data.
Submitted 11 August, 2021;
originally announced August 2021.
-
The power of reciprocal knowledge sharing relationships for startup success
Authors:
T. J. Allen,
P. Gloor,
A. Fronzetti Colladon,
S. L. Woerner,
O. Raz
Abstract:
Purpose: The purpose of this paper is to examine the innovative capabilities of biotech start-ups in relation to geographic proximity and knowledge sharing interaction in the R&D network of a major high-tech cluster.
Design-methodology-approach: This study compares longitudinal informal communication networks of researchers at biotech start-ups with company patent applications in subsequent years. For a year, senior R&D staff members from over 70 biotech firms located in the Boston biotech cluster were polled and communication information about interaction with peers, universities and big pharmaceutical companies was collected, as well as their geolocation tags.
Findings: Location influences the amount of communication between firms, but not their innovation success. Rather, what matters is communication intensity and recollection by others. In particular, there is evidence that rotating leadership - changing between a more active and passive communication style - is a predictor of innovative performance.
Practical implications: Expensive real-estate investments can be replaced by maintaining social ties. A more dynamic communication style and more diverse social ties are beneficial to innovation.
Originality-value: Compared to earlier work that has shown a connection between location, network and firm performance, this paper offers a more differentiated view; including a novel measure of communication style, using a unique data set and providing new insights for firms who want to shape their communication patterns to improve innovation, independently of their location.
Submitted 20 May, 2021;
originally announced May 2021.
-
Detection of data drift and outliers affecting machine learning model performance over time
Authors:
Samuel Ackerman,
Eitan Farchi,
Orna Raz,
Marcel Zalmanovici,
Parijat Dube
Abstract:
A trained ML model is deployed on another `test' dataset where target feature values (labels) are unknown. Drift is distribution change between the training and deployment data, which is concerning if model performance changes. For a cat/dog image classifier, for instance, drift during deployment could be rabbit images (new class) or cat/dog images with changed characteristics (change in distribution). We wish to detect these changes but can't measure accuracy without deployment data labels. We instead detect drift indirectly by nonparametrically testing the distribution of model prediction confidence for changes. This generalizes our method and sidesteps domain-specific feature representation.
We address important statistical issues, particularly Type-1 error control in sequential testing, using Change Point Models (CPMs; see Adams and Ross 2012). We also use nonparametric outlier methods to show the user suspicious observations for model diagnosis, since the before/after change confidence distributions overlap significantly. In experiments to demonstrate robustness, we train on a subset of MNIST digit classes, then insert drift (e.g., unseen digit class) in deployment data in various settings (gradual/sudden changes in the drift proportion). A novel loss function is introduced to compare the performance (detection delay, Type-1 and 2 errors) of a drift detector under different levels of drift class contamination.
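As a rough illustration of testing prediction confidence for distribution change, the sketch below compares a reference sample of confidence scores with a deployment sample using a two-sample Kolmogorov-Smirnov statistic. This is a simplified stand-in, not the Change Point Model procedure the abstract describes: it does no sequential Type-1 error control, and the function names and the 0.5 threshold are hypothetical choices made for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, deployed):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    dep = sorted(deployed)
    gap = 0.0
    # The maximum gap is attained at one of the observed values.
    for x in ref + dep:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_dep = bisect_right(dep, x) / len(dep)
        gap = max(gap, abs(cdf_ref - cdf_dep))
    return gap


def drift_suspected(reference, deployed, threshold=0.5):
    """Flag drift when the KS gap exceeds a threshold (assumed, not calibrated)."""
    return ks_statistic(reference, deployed) > threshold
```

In practice the threshold would be calibrated to a target false-alarm rate, and repeated testing over time would need the kind of sequential correction the abstract attributes to CPMs.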
Submitted 6 September, 2022; v1 submitted 16 December, 2020;
originally announced December 2020.
-
On rich lenses in planar arrangements of circles and related problems
Authors:
Esther Ezra,
Orit E. Raz,
Micha Sharir,
Joshua Zahl
Abstract:
We show that the maximum number of pairwise non-overlapping $k$-rich lenses (lenses formed by at least $k$ circles) in an arrangement of $n$ circles in the plane is $O\left(\frac{n^{3/2}\log{(n/k^3)}}{k^{5/2}} + \frac{n}{k} \right)$, and the sum of the degrees of the lenses of such a family (where the degree of a lens is the number of circles that form it) is $O\left(\frac{n^{3/2}\log{(n/k^3)}}{k^{3/2}} + n\right)$. Two independent proofs of these bounds are given, each interesting in its own right (so we believe). We then show that these bounds lead to the known bound of Agarwal et al. (JACM 2004) and Marcus and Tardos (JCTA 2006) on the number of point-circle incidences in the plane. Extensions to families of more general algebraic curves and some other related problems are also considered.
Submitted 7 December, 2020;
originally announced December 2020.
-
Subspace arrangements, graph rigidity and derandomization through submodular optimization
Authors:
Orit E. Raz,
Avi Wigderson
Abstract:
This paper presents a deterministic, strongly polynomial time algorithm for computing the matrix rank for a class of symbolic matrices (whose entries are polynomials over a field). This class was introduced, in a different language, by Lovász [Lov] in his study of flats in matroids, who proved a duality theorem putting this problem in $NP \cap coNP$. As such, our result is another demonstration where ``good characterization'' in the sense of Edmonds leads to an efficient algorithm. In a different paper Lovász [Lov79] proved that all such symbolic rank problems have efficient probabilistic algorithms, namely are in $BPP$. As such, our algorithm may be interpreted as a derandomization result, in the long sequence of special cases of the PIT (Polynomial Identity Testing) problem. Finally, Lovász and Yemini [LoYe] showed how the same problem generalizes the graph rigidity problem in two dimensions. As such, our algorithm may be seen as a generalization of the well-known deterministic algorithm for the latter problem.
There are two somewhat unusual technical features in this paper. The first is the translation of Lovász' flats problem into a symbolic rank one. The second is the use of submodular optimization for derandomization. We hope that the tools developed for both will be useful for related problems, in particular for better understanding of graph rigidity in higher dimensions.
Submitted 27 January, 2019;
originally announced January 2019.
-
An o-minimal Szemerédi-Trotter theorem
Authors:
Saugata Basu,
Orit E. Raz
Abstract:
We prove an analog of the Szemerédi-Trotter theorem in the plane for definable curves and points in any o-minimal structure over an arbitrary real closed field $\mathrm{R}$. One new ingredient in the proof is an extension of the well known crossing number inequality for graphs to the case of embeddings in any o-minimal structure over an arbitrary real closed field.
Submitted 12 July, 2017; v1 submitted 22 November, 2016;
originally announced November 2016.
-
Configurations of lines in space and combinatorial rigidity
Authors:
Orit E. Raz
Abstract:
Let $L$ be a sequence $(\ell_1,\ell_2,\ldots,\ell_n)$ of $n$ lines in $\mathbb{C}^3$. We define the {\it intersection graph} $G_L=([n],E)$ of $L$, where $[n]:=\{1,\ldots, n\}$, and with $\{i,j\}\in E$ if and only if $i\neq j$ and the corresponding lines $\ell_i$ and $\ell_j$ intersect, or are parallel (or coincide). For a graph $G=([n],E)$, we say that a sequence $L$ is a {\it realization} of $G$ if $G\subset G_L$. One of the main results of this paper is to provide a combinatorial characterization of graphs $G=([n],E)$ that have the following property: For every {\it generic} realization $L$ of $G$ that consists of $n$ pairwise distinct lines, we have $G_L=K_n$, in which case the lines of $L$ are either all concurrent or all coplanar.
The general statements that we obtain about lines, apart from their independent interest, turn out to be closely related to the notion of graph rigidity. The connection is established via the so-called Elekes--Sharir framework, which allows us to transform the problem into an incidence problem involving lines in three dimensions. By exploiting the geometry of contacts between lines in 3D, we can obtain alternative, simpler, and more precise characterizations of the rigidity of graphs.
Submitted 14 July, 2016;
originally announced July 2016.
-
The Elekes-Szabó Theorem in four dimensions
Authors:
Orit E. Raz,
Micha Sharir,
Frank de Zeeuw
Abstract:
Let $F\in\mathbb{C}[x,y,s,t]$ be an irreducible constant-degree polynomial, and let $A,B,C,D\subset\mathbb{C}$ be finite sets of size $n$. We show that $F$ vanishes on at most $O(n^{8/3})$ points of the Cartesian product $A\times B\times C\times D$, unless $F$ has a special group-related form. A similar statement holds for $A,B,C,D$ of unequal sizes. This is a four-dimensional extension of our recent improved analysis of the original Elekes-Szabó theorem in three dimensions. We give three applications: an expansion bound for three-variable real polynomials that do not have a special form, a bound on the number of coplanar quadruples on a space curve that is neither planar nor quartic, and a bound on the number of four-point circles on a plane curve that has degree at least five.
Submitted 1 November, 2016; v1 submitted 13 July, 2016;
originally announced July 2016.
-
A note on distinct distances
Authors:
Orit E. Raz
Abstract:
We show that, for a constant-degree algebraic curve $γ$ in $\mathbb{R}^D$, every set of $n$ points on $γ$ spans at least $Ω(n^{4/3})$ distinct distances, unless $γ$ is an {\it algebraic helix} (see Definition 1.1). This improves the earlier bound $Ω(n^{5/4})$ of Charalambides [Discrete Comput. Geom. (2014)].
We also show that, for every set $P$ of $n$ points that lie on a $d$-dimensional constant-degree algebraic variety $V$ in $\mathbb{R}^D$, there exists a subset $S\subset P$ of size at least $Ω(n^{\frac{4}{9+12(d-1)}})$, such that $S$ spans $\binom{|S|}{2}$ distinct distances. This improves the earlier bound of $Ω(n^{\frac{1}{3d}})$ of Conlon et al. [SIAM J. Discrete Math. (2015)].
Both results are consequences of a common technical tool, given in Lemma 2.7 below.
Submitted 14 April, 2020; v1 submitted 29 February, 2016;
originally announced March 2016.
-
The number of unit-area triangles in the plane: Theme and variations
Authors:
Orit E. Raz,
Micha Sharir
Abstract:
We show that the number of unit-area triangles determined by a set $S$ of $n$ points in the plane is $O(n^{20/9})$, improving the earlier bound $O(n^{9/4})$ of Apfelbaum and Sharir [Discrete Comput. Geom., 2010]. We also consider two special cases of this problem: (i) We show, using a somewhat subtle construction, that if $S$ consists of points on three lines, the number of unit-area triangles that $S$ spans can be $Ω(n^2)$, for any triple of lines (it is always $O(n^2)$ in this case). (ii) We show that if $S$ is a {\em convex grid} of the form $A\times B$, where $A$, $B$ are {\em convex} sets of $n^{1/2}$ real numbers each (i.e., the sequences of differences of consecutive elements of $A$ and of $B$ are both strictly increasing), then $S$ determines $O(n^{31/14})$ unit-area triangles.
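To make the counted quantity concrete, here is a brute-force sketch that counts the unit-area triangles spanned by a small point set. It checks all triples in $O(n^3)$ time and is only an illustration of the definition, not a method from the paper; the function name is hypothetical, and integer coordinates are assumed so that the area test is exact.

```python
from itertools import combinations


def count_unit_area_triangles(points):
    """Count triples of points spanning a triangle of area exactly 1.

    Assumes integer coordinates, so twice the area is an exact integer.
    """
    count = 0
    for (ax, ay), (bx, by), (cx, cy) in combinations(points, 3):
        # The cross product of two edge vectors equals twice the signed area.
        twice_area = abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax))
        if twice_area == 2:
            count += 1
    return count
```

For example, the four corners of a 2-by-1 rectangle span four triangles, each of area 1, while three collinear points span none.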
Submitted 11 April, 2015; v1 submitted 2 January, 2015;
originally announced January 2015.
-
Partial-Matching and Hausdorff RMS Distance Under Translation: Combinatorics and Algorithms
Authors:
Rinat Ben-Avraham,
Matthias Henze,
Rafel Jaume,
Balázs Keszegh,
Orit E. Raz,
Micha Sharir,
Igor Tubis
Abstract:
We consider the RMS distance (sum of squared distances between pairs of points) under translation between two point sets in the plane, in two different setups. In the partial-matching setup, each point in the smaller set is matched to a distinct point in the bigger set. Although the problem is not known to be polynomial, we establish several structural properties of the underlying subdivision of the plane and derive improved bounds on its complexity. These results lead to the best known algorithm for finding a translation for which the partial-matching RMS distance between the point sets is minimized. In addition, we show how to compute a local minimum of the partial-matching RMS distance under translation, in polynomial time. In the Hausdorff setup, each point is paired to its nearest neighbor in the other set. We develop algorithms for finding a local minimum of the Hausdorff RMS distance in nearly linear time on the line, and in nearly quadratic time in the plane. These improve substantially the worst-case behavior of the popular ICP heuristics for solving this problem.
Submitted 26 November, 2014;
originally announced November 2014.
-
Polynomials vanishing on grids: The Elekes-Rónyai problem revisited
Authors:
Orit E. Raz,
Micha Sharir,
József Solymosi
Abstract:
In this paper we characterize real bivariate polynomials which have a small range over large Cartesian products. We show that for every constant-degree bivariate real polynomial $f$, either $|f(A,B)|=Ω(n^{4/3})$, for every pair of finite sets $A,B\subset{\mathbb R}$, with $|A|=|B|=n$ (where the constant of proportionality depends on ${\rm deg} f$), or else $f$ must be of one of the special forms $f(u,v)=h(\varphi(u)+ψ(v))$, or $f(u,v)=h(\varphi(u)\cdotψ(v))$, for some univariate polynomials $\varphi,ψ,h$ over ${\mathbb R}$. This significantly improves a result of Elekes and Rónyai (2000).
Our results are cast in a more general form, in which we give an upper bound for the number of zeros of $z=f(x,y)$ on a triple Cartesian product $A\times B\times C$, when the sizes $|A|$, $|B|$, $|C|$ need not be the same; the upper bound is $O(n^{11/6})$ when $|A|=|B|=|C|=n$, where the constant of proportionality depends on ${\rm deg} f$, unless $f$ has one of the aforementioned special forms.
This result provides a unified tool for improving bounds in various Erdős-type problems in geometry and additive combinatorics. Several applications of our results to problems of these kinds are presented. For example, we show that the number of distinct distances between $n$ points lying on a constant-degree parametric algebraic curve which does not contain a line, in any dimension, is $Ω(n^{4/3})$, extending the result of Pach and de Zeeuw (2013) and improving the bound of Charalambides (2012), for the special case where the curve under consideration has a polynomial parameterization. We also derive improved lower bounds for several variants of the sum-product problem in additive combinatorics.
Submitted 19 March, 2014; v1 submitted 29 January, 2014;
originally announced January 2014.
-
On the zone of the boundary of a convex body
Authors:
Orit Esther Raz
Abstract:
We consider an arrangement $\mathcal{A}$ of $n$ hyperplanes in $\mathbb{R}^d$ and the zone $\mathcal{Z}$ in $\mathcal{A}$ of the boundary of an arbitrary convex set in $\mathbb{R}^d$. We show that, whereas the combinatorial complexity of $\mathcal{Z}$ is known only to be $O(n^{d-1}\log n)$ [APS], the outer part of the zone has complexity $O(n^{d-1})$ (without the logarithmic factor). Whether this bound also holds for the complexity of the inner part of the zone is still an open question (even for $d=2$).
Submitted 10 June, 2013;
originally announced June 2013.