Showing 1–27 of 27 results for author: Raz, O

Searching in archive cs.
  1. arXiv:2405.13020  [pdf, other]

    cs.CL cs.AI

    Using Combinatorial Optimization to Design a High quality LLM Solution

    Authors: Samuel Ackerman, Eitan Farchi, Rami Katan, Orna Raz

    Abstract: We introduce a novel LLM based solution design approach that utilizes combinatorial optimization and sampling. Specifically, a set of factors that influence the quality of the solution are identified. They typically include factors that represent prompt types, LLM inputs alternatives, and parameters governing the generation and design alternatives. Identifying the factors that govern the LLM solut…

    Submitted 15 May, 2024; originally announced May 2024.

  2. arXiv:2403.09704  [pdf, other]

    cs.CL cs.AI cs.LG

    Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

    Authors: Swapnaja Achintalwar, Ioana Baldini, Djallel Bouneffouf, Joan Byamugisha, Maria Chang, Pierre Dognin, Eitan Farchi, Ndivhuwo Makondo, Aleksandra Mojsilovic, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Inkit Padhi, Orna Raz, Jesus Rios, Prasanna Sattigeri, Moninder Singh, Siphiwe Thwala, Rosario A. Uceda-Sosa, Kush R. Varshney

    Abstract: The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentia…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures

  3. arXiv:2403.06009  [pdf, other]

    cs.LG

    Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations

    Authors: Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, Bishwaranjan Bhattacharjee, Djallel Bouneffouf, Subhajit Chaudhury, Pin-Yu Chen, Lamogha Chiazor, Elizabeth M. Daly, Kirushikesh DB, Rogério Abreu de Paula, Pierre Dognin, Eitan Farchi, Soumya Ghosh, Michael Hind, Raya Horesh, George Kour, Ja Young Lee, Nishtha Madaan, Sameep Mehta, Erik Miehling, Keerthiram Murugesan, Manish Nagireddy, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be feasible to impose direct safety constraints on a deployed model. Therefore, an efficient and reliable alternative is required. To this end, we presen…

    Submitted 13 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  4. arXiv:2311.04124  [pdf, other]

    cs.CL cs.AI cs.LG

    Unveiling Safety Vulnerabilities of Large Language Models

    Authors: George Kour, Marcel Zalmanovici, Naama Zwerdling, Esther Goldbraich, Ora Nova Fandina, Ateret Anaby-Tavor, Orna Raz, Eitan Farchi

    Abstract: As large language models become more prevalent, their possible harmful or inappropriate responses are a cause for concern. This paper introduces a unique dataset containing adversarial examples in the form of questions, which we call AttaQ, designed to provoke such harmful or inappropriate responses. We assess the efficacy of our dataset by analyzing the vulnerabilities of various models when subj…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: To be published in GEM workshop. Conference on Empirical Methods in Natural Language Processing (EMNLP). 2023

    ACM Class: I.2.7

  5. arXiv:2311.01152  [pdf, other]

    cs.CL

    Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

    Authors: Ella Rabinovich, Samuel Ackerman, Orna Raz, Eitan Farchi, Ateret Anaby-Tavor

    Abstract: Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the da…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: EMNLP2023 GEM workshop, 17 pages

  6. arXiv:2305.08115  [pdf, other]

    cs.LG stat.AP

    Automatic Generation of Attention Rules For Containment of Machine Learning Model Errors

    Authors: Samuel Ackerman, Axel Bendavid, Eitan Farchi, Orna Raz

    Abstract: Machine learning (ML) solutions are prevalent in many applications. However, many challenges exist in making these solutions business-grade. For instance, maintaining the error rate of the underlying ML models at an acceptably low level. Typically, the true relationship between feature inputs and the target feature to be predicted is uncertain, and hence statistical in nature. The approach we prop…

    Submitted 14 May, 2023; originally announced May 2023.

  7. arXiv:2211.16259  [pdf, other]

    cs.CL

    Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora

    Authors: George Kour, Samuel Ackerman, Orna Raz, Eitan Farchi, Boaz Carmeli, Ateret Anaby-Tavor

    Abstract: The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating these metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their beha…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Published at GEM (https://gem-benchmark.com/workshop) workshop at the Empirical Methods in Natural Language Processing (EMNLP) conference in 2022

  8. arXiv:2204.13043  [pdf, other]

    cs.HC stat.AP

    High-quality Conversational Systems

    Authors: Samuel Ackerman, Ateret Anaby-Tavor, Eitan Farchi, Esther Goldbraich, George Kour, Ella Rabinovich, Orna Raz, Saritha Route, Marcel Zalmanovici, Naama Zwerdling

    Abstract: Conversational systems or chatbots are an example of AI-Infused Applications (AIIA). Chatbots are especially important as they are often the first interaction of clients with a business and are the entry point of a business into the AI (Artificial Intelligence) world. The quality of the chatbot is, therefore, key. However, as is the case in general with AIIAs, it is especially challenging to asses…

    Submitted 28 April, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

  9. arXiv:2201.00355  [pdf, other]

    cs.LG cs.SE

    Theory and Practice of Quality Assurance for Machine Learning Systems An Experiment Driven Approach

    Authors: Samuel Ackerman, Guy Barash, Eitan Farchi, Orna Raz, Onn Shehory

    Abstract: The crafting of machine learning (ML) based systems requires statistical control throughout its life cycle. Careful quantification of business requirements and identification of key factors that impact the business requirements reduces the risk of a project failure. The quantification of business requirements results in the definition of random variables representing the system key performance ind…

    Submitted 12 April, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

  10. arXiv:2112.11832  [pdf, other]

    cs.LG

    Classifier Data Quality: A Geometric Complexity Based Method for Automated Baseline And Insights Generation

    Authors: George Kour, Marcel Zalmanovici, Orna Raz, Samuel Ackerman, Ateret Anaby-Tavor

    Abstract: Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or systems that contain ML models, is highly challenging. In addition to the challenges of testing classical software, it is acceptable and expected that statistical ML models sometimes output incorrect results. A major challenge is to determine when the level of incorrectness, e.g., model accuracy or F1 score for classifier…

    Submitted 27 October, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Accepted to EDSMLS workshop at AAAI conference

  11. arXiv:2111.05672  [pdf, other]

    cs.LG

    Automatically detecting data drift in machine learning classifiers

    Authors: Samuel Ackerman, Orna Raz, Marcel Zalmanovici, Aviad Zlotnick

    Abstract: Classifiers and other statistics-based machine learning (ML) techniques generalize, or learn, based on various statistical properties of the training data. The assumption underlying statistical ML resulting in theoretical or empirical performance guarantees is that the distribution of the training data is representative of the production data distribution. This assumption often breaks; for instanc…

    Submitted 10 November, 2021; originally announced November 2021.

    Journal ref: Originally published in proceedings of Engineering Dependable and Secure Machine Learning Systems (EDSMLS) workshop at AAAI 2019 conference

  12. arXiv:2110.12506  [pdf, other]

    cs.LG cs.AI

    Detecting model drift using polynomial relations

    Authors: Eliran Roffe, Samuel Ackerman, Orna Raz, Eitan Farchi

    Abstract: Machine learning models serve critical functions, such as classifying loan applicants as good or bad risks. Each model is trained under the assumption that the data used in training and in the field come from the same underlying unknown distribution. Often, this assumption is broken in practice. It is desirable to identify when this occurs, to minimize the impact on model performance. We suggest…

    Submitted 22 December, 2021; v1 submitted 24 October, 2021; originally announced October 2021.

  13. arXiv:2110.05430  [pdf, other]

    cs.LG stat.AP

    Density-based interpretable hypercube region partitioning for mixed numeric and categorical data

    Authors: Samuel Ackerman, Eitan Farchi, Orna Raz, Marcel Zalmanovici, Maya Zohar

    Abstract: Consider a structured dataset of features, such as $\{\textrm{SEX}, \textrm{INCOME}, \textrm{RACE}, \textrm{EXPERIENCE}\}$. A user may want to know where in the feature space observations are concentrated, and where it is sparse or empty. The existence of large sparse or empty regions can provide domain knowledge of soft or hard feature constraints (e.g., what is the typical income range, or that…

    Submitted 8 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

  14. FreaAI: Automated extraction of data slices to test machine learning models

    Authors: Samuel Ackerman, Orna Raz, Marcel Zalmanovici

    Abstract: Machine learning (ML) solutions are prevalent. However, many challenges exist in making these solutions business-grade. One major challenge is to ensure that the ML solution provides its expected business value. In order to do that, one has to bridge the gap between the way ML model performance is measured and the solution requirements. In previous work (Barash et al, "Bridging the gap...") we dem…

    Submitted 12 August, 2021; originally announced August 2021.

    Journal ref: International Workshop on Engineering Dependable and Secure Machine Learning Systems, at EDSMLS 2020

  15. arXiv:2108.05319  [pdf, other]

    cs.LG stat.AP

    Machine Learning Model Drift Detection Via Weak Data Slices

    Authors: Samuel Ackerman, Parijat Dube, Eitan Farchi, Orna Raz, Marcel Zalmanovici

    Abstract: Detecting drift in performance of Machine Learning (ML) models is an acknowledged challenge. For ML models to become an integral part of business applications it is essential to detect when an ML model drifts away from acceptable operation. However, it is often the case that actual labels are difficult and expensive to get, for example, because they require expert judgment. Therefore, there is a n…

    Submitted 11 August, 2021; originally announced August 2021.

    Journal ref: DeepTest workshop of ICSE, 2021

  16. arXiv:2105.11538  [pdf]

    cs.SI physics.soc-ph

    The power of reciprocal knowledge sharing relationships for startup success

    Authors: T. J. Allen, P. Gloor, A. Fronzetti Colladon, S. L. Woerner, O. Raz

    Abstract: Purpose: The purpose of this paper is to examine the innovative capabilities of biotech start-ups in relation to geographic proximity and knowledge sharing interaction in the R&D network of a major high-tech cluster. Design-methodology-approach: This study compares longitudinal informal communication networks of researchers at biotech start-ups with company patent applications in subsequent year…

    Submitted 20 May, 2021; originally announced May 2021.

    ACM Class: J.4

    Journal ref: Journal of Small Business and Enterprise Development 23(3), 636-651 (2016)

  17. arXiv:2012.09258  [pdf, other]

    stat.AP cs.LG stat.ML

    Detection of data drift and outliers affecting machine learning model performance over time

    Authors: Samuel Ackerman, Eitan Farchi, Orna Raz, Marcel Zalmanovici, Parijat Dube

    Abstract: A trained ML model is deployed on another 'test' dataset where target feature values (labels) are unknown. Drift is distribution change between the training and deployment data, which is concerning if model performance changes. For a cat/dog image classifier, for instance, drift during deployment could be rabbit images (new class) or cat/dog images with changed characteristics (change in distribut…

    Submitted 6 September, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: In: JSM Proceedings, Nonparametric Statistics Section, 2020. Philadelphia, PA: American Statistical Association. 144-160

  18. arXiv:2012.04204  [pdf, other]

    math.CO cs.CG

    On rich lenses in planar arrangements of circles and related problems

    Authors: Esther Ezra, Orit E. Raz, Micha Sharir, Joshua Zahl

    Abstract: We show that the maximum number of pairwise non-overlapping $k$-rich lenses (lenses formed by at least $k$ circles) in an arrangement of $n$ circles in the plane is $O\left(\frac{n^{3/2}\log{(n/k^3)}}{k^{5/2}} + \frac{n}{k} \right)$, and the sum of the degrees of the lenses of such a family (where the degree of a lens is the number of circles that form it) is…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: 15 pages, 3 figures

    MSC Class: 05D99; 52C10; 52C45; 68R05

  19. arXiv:1901.09423  [pdf, ps, other]

    cs.CC math.CO

    Subspace arrangements, graph rigidity and derandomization through submodular optimization

    Authors: Orit E. Raz, Avi Wigderson

    Abstract: This paper presents a deterministic, strongly polynomial time algorithm for computing the matrix rank for a class of symbolic matrices (whose entries are polynomials over a field). This class was introduced, in a different language, by Lovász [Lov] in his study of flats in matroids, and proved a duality theorem putting this problem in $NP \cap coNP$. As such, our result is another demonstration wh…

    Submitted 27 January, 2019; originally announced January 2019.

  20. arXiv:1611.07362  [pdf, other]

    math.LO cs.CG math.CO

    An o-minimal Szemerédi-Trotter theorem

    Authors: Saugata Basu, Orit E. Raz

    Abstract: We prove an analog of the Szemerédi-Trotter theorem in the plane for definable curves and points in any o-minimal structure over an arbitrary real closed field $\mathrm{R}$. One new ingredient in the proof is an extension of the well known crossing number inequality for graphs to the case of embeddings in any o-minimal structure over an arbitrary real closed field.

    Submitted 12 July, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: 15 pages. Final version to appear in The Quarterly Journal of Mathematics

    MSC Class: 03C64; 05D40

  21. arXiv:1607.04083  [pdf, ps, other]

    math.CO cs.CG cs.DM

    Configurations of lines in space and combinatorial rigidity

    Authors: Orit E. Raz

    Abstract: Let $L$ be a sequence $(\ell_1,\ell_2,\ldots,\ell_n)$ of $n$ lines in $\mathbb{C}^3$. We define the intersection graph $G_L=([n],E)$ of $L$, where $[n]:=\{1,\ldots, n\}$, and with $\{i,j\}\in E$ if and only if $i\neq j$ and the corresponding lines $\ell_i$ and $\ell_j$ intersect, or are parallel (or coincide). For a graph $G=([n],E)$, we say that a sequence $L$ is a realization of $G$…

    Submitted 14 July, 2016; originally announced July 2016.

    Comments: 15 pages

  22. arXiv:1607.03600  [pdf, ps, other]

    math.CO cs.DM

    The Elekes-Szabó Theorem in four dimensions

    Authors: Orit E. Raz, Micha Sharir, Frank de Zeeuw

    Abstract: Let $F\in\mathbb{C}[x,y,s,t]$ be an irreducible constant-degree polynomial, and let $A,B,C,D\subset\mathbb{C}$ be finite sets of size $n$. We show that $F$ vanishes on at most $O(n^{8/3})$ points of the Cartesian product $A\times B\times C\times D$, unless $F$ has a special group-related form. A similar statement holds for $A,B,C,D$ of unequal sizes. This is a four-dimensional extension of our rec…

    Submitted 1 November, 2016; v1 submitted 13 July, 2016; originally announced July 2016.

    Comments: 15 pages. v2: We added an application to a problem about coplanar quadruples on space curves

  23. arXiv:1603.00740  [pdf, ps, other]

    math.MG cs.CG math.CO

    A note on distinct distances

    Authors: Orit E. Raz

    Abstract: We show that, for a constant-degree algebraic curve $γ$ in $\mathbb{R}^D$, every set of $n$ points on $γ$ spans at least $Ω(n^{4/3})$ distinct distances, unless $γ$ is an algebraic helix (see Definition 1.1). This improves the earlier bound $Ω(n^{5/4})$ of Charalambides [Discrete Comput. Geom. (2014)]. We also show that, for every set $P$ of $n$ points that lie on a $d$-dimensional constan…

    Submitted 14 April, 2020; v1 submitted 29 February, 2016; originally announced March 2016.

    Comments: 16 pages

    MSC Class: 52C10

    Journal ref: Combinator. Probab. Comp. 29 (2020) 650-663

  24. arXiv:1501.00379  [pdf, other]

    math.CO cs.CG cs.DM math.MG

    The number of unit-area triangles in the plane: Theme and variations

    Authors: Orit E. Raz, Micha Sharir

    Abstract: We show that the number of unit-area triangles determined by a set $S$ of $n$ points in the plane is $O(n^{20/9})$, improving the earlier bound $O(n^{9/4})$ of Apfelbaum and Sharir [Discrete Comput. Geom., 2010]. We also consider two special cases of this problem: (i) We show, using a somewhat subtle construction, that if $S$ consists of points on three lines, the number of unit-area triangles tha…

    Submitted 11 April, 2015; v1 submitted 2 January, 2015; originally announced January 2015.

    MSC Class: 52C10

  25. arXiv:1411.7273  [pdf, other]

    cs.CG math.CO

    Partial-Matching and Hausdorff RMS Distance Under Translation: Combinatorics and Algorithms

    Authors: Rinat Ben-Avraham, Matthias Henze, Rafel Jaume, Balázs Keszegh, Orit E. Raz, Micha Sharir, Igor Tubis

    Abstract: We consider the RMS distance (sum of squared distances between pairs of points) under translation between two point sets in the plane, in two different setups. In the partial-matching setup, each point in the smaller set is matched to a distinct point in the bigger set. Although the problem is not known to be polynomial, we establish several structural properties of the underlying subdivision of t…

    Submitted 26 November, 2014; originally announced November 2014.

    Comments: 31 pages, 6 figures

  26. arXiv:1401.7419  [pdf, ps, other]

    cs.CG math.CO

    Polynomials vanishing on grids: The Elekes-Rónyai problem revisited

    Authors: Orit E. Raz, Micha Sharir, József Solymosi

    Abstract: In this paper we characterize real bivariate polynomials which have a small range over large Cartesian products. We show that for every constant-degree bivariate real polynomial $f$, either $|f(A,B)|=Ω(n^{4/3})$, for every pair of finite sets $A,B\subset{\mathbb R}$, with $|A|=|B|=n$ (where the constant of proportionality depends on ${\rm deg} f$), or else $f$ must be of one of the special forms…

    Submitted 19 March, 2014; v1 submitted 29 January, 2014; originally announced January 2014.

  27. arXiv:1306.2104  [pdf, other]

    cs.CG

    On the zone of the boundary of a convex body

    Authors: Orit Esther Raz

    Abstract: We consider an arrangement $\mathcal{A}$ of $n$ hyperplanes in $\mathbb{R}^d$ and the zone $\mathcal{Z}$ in $\mathcal{A}$ of the boundary of an arbitrary convex set in $\mathbb{R}^d$ in such an arrangement. We show that, whereas the combinatorial complexity of $\mathcal{Z}$ is known only to be $O(n^{d-1}\log n)$ [APS], the outer part of the zone has complexity $O(n^{d-1})$ (without the logarithmic factor). Whether this bound also holds for the…

    Submitted 10 June, 2013; originally announced June 2013.