
Showing 1–50 of 83 results for author: Rubinstein, I

  1. arXiv:2405.13375  [pdf, other]

    cs.LG stat.ML

    Adaptive Data Analysis for Growing Data

    Authors: Neil G. Marchant, Benjamin I. P. Rubinstein

    Abstract: Reuse of data in adaptive workflows poses challenges regarding overfitting and the statistical validity of results. Previous work has demonstrated that interacting with data via differentially private algorithms can mitigate overfitting, achieving worst-case generalization guarantees with asymptotically optimal data requirements. However, such past work assumes data is static and cannot accommodat…

    Submitted 22 May, 2024; originally announced May 2024.

  2. arXiv:2405.11575  [pdf, other]

    cs.CL cs.CR

    SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks

    Authors: Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: Modern NLP models are often trained on public datasets drawn from diverse sources, rendering them vulnerable to data poisoning attacks. These attacks can manipulate the model's behavior in ways engineered by the attacker. One such tactic involves the implantation of backdoors, achieved by poisoning specific training instances with a textual trigger and a target class label. Several strategies have…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: accepted to TACL

  3. arXiv:2405.08892  [pdf, other]

    cs.LG

    RS-Reg: Probabilistic and Robust Certified Regression Through Randomized Smoothing

    Authors: Aref Miri Rekavandi, Olga Ohrimenko, Benjamin I. P. Rubinstein

    Abstract: Randomized smoothing has shown promising certified robustness against adversaries in classification tasks. Despite such success with only zeroth-order access to base models, randomized smoothing has not been extended to a general form of regression. By defining robustness in regression tasks flexibly through probabilities, we demonstrate how to establish upper bounds on input data point perturbati…

    Submitted 14 May, 2024; originally announced May 2024.

  4. arXiv:2404.19597  [pdf, other]

    cs.CL cs.CR

    Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning

    Authors: Xuanli He, Jun Wang, Qiongkai Xu, Pasquale Minervini, Pontus Stenetorp, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: The implications of backdoor attacks on English-centric large language models (LLMs) have been widely examined - such attacks can be achieved by embedding malicious behaviors during training and activated under specific conditions that trigger malicious outputs. However, the impact of backdoor attacks on multilingual models remains under-explored. Our research focuses on cross-lingual backdoor att…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: work in progress

  5. arXiv:2404.02393  [pdf, other]

    cs.CL

    Backdoor Attack on Multilingual Machine Translation

    Authors: Jun Wang, Qiongkai Xu, Xuanli He, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: While multilingual machine translation (MNMT) systems hold substantial promise, they also have security vulnerabilities. Our research highlights that MNMT systems can be susceptible to a particularly devious style of backdoor attack, whereby an attacker injects poisoned data into a low-resource language pair to cause malicious translations in other languages, including high-resource languages. Our…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: NAACL main long paper

  6. arXiv:2401.17628  [pdf, other]

    cs.CR

    Elephants Do Not Forget: Differential Privacy with State Continuity for Privacy Budget

    Authors: Jiankai Jin, Chitchanok Chuengsatiansup, Toby Murray, Benjamin I. P. Rubinstein, Yuval Yarom, Olga Ohrimenko

    Abstract: Current implementations of differentially-private (DP) systems either lack support to track the global privacy budget consumed on a dataset, or fail to faithfully maintain the state continuity of this budget. We show that failure to maintain a privacy budget enables an adversary to mount replay, rollback and fork attacks - obtaining answers to many more queries than what a secure system would allo…

    Submitted 31 January, 2024; originally announced January 2024.

  7. arXiv:2309.11005  [pdf, other]

    cs.LG cs.CR

    It's Simplex! Disaggregating Measures to Improve Certified Robustness

    Authors: Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah M. Erfani, Benjamin I. P. Rubinstein

    Abstract: Certified robustness circumvents the fragility of defences against adversarial attacks, by endowing model predictions with guarantees of class invariance for attacks up to a calculated size. While there is value in these certifications, the techniques through which we assess their performance do not present a proper accounting of their strengths and weaknesses, as their analysis has eschewed consi…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: IEEE S&P 2024, IEEE Security & Privacy 2024, 14 pages

  8. Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks

    Authors: Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein

    Abstract: Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they in general do not provide any guarantees, leaving them potentially countered by novel attacks. In contrast, by examining worst-case behaviours Certified Defences make it possible to provide guarantees of the robustness o…

    Submitted 18 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of the 2023 AAAI Conference on Artificial Intelligence, 37(7), 8861-8869

  9. arXiv:2305.07156  [pdf, ps, other]

    cs.IT

    Improved Upper and Lower Bounds on the Capacity of the Binary Deletion Channel

    Authors: Ittai Rubinstein, Roni Con

    Abstract: The binary deletion channel with deletion probability $d$ ($\text{BDC}_d$) is a random channel that deletes each bit of the input message i.i.d. with probability $d$. It has been studied extensively as a canonical example of a channel with synchronization errors. Perhaps the most important question regarding the BDC is determining its capacity. Mitzenmacher and Drinea (ITIT 2006) and Kirsch…

    Submitted 11 May, 2023; originally announced May 2023.

    MSC Class: 94B65 ACM Class: E.4

  10. arXiv:2302.04379  [pdf, other]

    cs.LG cs.CR

    Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples

    Authors: Andrew C. Cullen, Shijie Liu, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein

    Abstract: In guaranteeing the absence of adversarial examples in an instance's neighbourhood, certification mechanisms play an important role in demonstrating neural net robustness. In this paper, we ask if these certifications can compromise the very models they help to protect? Our new Certification Aware Attack exploits certifications to produce computationally efficient norm-minimising adversaria…

    Submitted 11 June, 2024; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 17 pages, 8 figures

    ACM Class: I.2.6; I.4.9

  11. arXiv:2302.01757  [pdf, other]

    cs.CR cs.LG stat.ML

    RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion

    Authors: Zhuoqun Huang, Neil G. Marchant, Keane Lucas, Lujo Bauer, Olga Ohrimenko, Benjamin I. P. Rubinstein

    Abstract: Randomized smoothing is a leading approach for constructing classifiers that are certifiably robust against adversarial examples. Existing work on randomized smoothing has focused on classifiers with continuous inputs, such as images, where $\ell_p$-norm bounded adversaries are commonly studied. However, there has been limited work for classifiers with discrete or variable-size inputs, such as for…

    Submitted 24 January, 2024; v1 submitted 30 January, 2023; originally announced February 2023.

    Comments: Final camera-ready version for NeurIPS 2023. 36 pages, 7 figures, 12 tables. Includes 20 pages of appendices. Code available at https://github.com/Dovermore/randomized-deletion

  12. Bayesian Graphical Entity Resolution Using Exchangeable Random Partition Priors

    Authors: Neil G. Marchant, Benjamin I. P. Rubinstein, Rebecca C. Steorts

    Abstract: Entity resolution (record linkage or deduplication) is the process of identifying and linking duplicate records in databases. In this paper, we propose a Bayesian graphical approach for entity resolution that links records to latent entities, where the prior representation on the linkage structure is exchangeable. First, we adopt a flexible and tractable set of priors for the linkage structure, wh…

    Submitted 7 January, 2023; originally announced January 2023.

    Comments: 27 pages, 4 figures, 3 tables. Includes 37 pages of appendices. This is an accepted manuscript to be published in the Journal of Survey Statistics and Methodology

  13. arXiv:2210.06077  [pdf, other]

    cs.LG

    Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity

    Authors: Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah M. Erfani, Benjamin I. P. Rubinstein

    Abstract: In response to subtle adversarial examples flipping classifications of neural network models, recent research has promoted certified robustness as a solution. There, invariance of predictions to all norm-bounded attacks is achieved through randomised smoothing of network inputs. Today's state-of-the-art certifications make optimal use of the class output scores at the input instance under test: no…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted for NeurIPS'22, 19 pages, 14 figures, for associated code see https://github.com/andrew-cullen/DoubleBubble

    ACM Class: I.2.6; I.4.9

  14. arXiv:2210.05455  [pdf, ps, other]

    cs.LG cs.DM

    Unlabelled Sample Compression Schemes for Intersection-Closed Classes and Extremal Classes

    Authors: J. Hyam Rubinstein, Benjamin I. P. Rubinstein

    Abstract: The sample compressibility of concept classes plays an important role in learning theory, as a sufficient condition for PAC learnability, and more recently as an avenue for robust generalisation in adaptive data analysis. Whether compression schemes of size $O(d)$ must necessarily exist for all classes of VC dimension $d$ is unknown, but conjectured to be true by Warmuth. Recently Chalopin, Chepoi…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Appearing at NeurIPS2022

  15. arXiv:2207.11575  [pdf, other]

    cs.DB cs.CR cs.LG

    Testing the Robustness of Learned Index Structures

    Authors: Matthias Bachfischer, Renata Borovica-Gajic, Benjamin I. P. Rubinstein

    Abstract: While early empirical evidence has supported the case for learned index structures as having favourable average-case performance, little is known about their worst-case performance. By contrast, classical structures are known to achieve optimal worst-case behaviour. This work evaluates the robustness of learned index structures in the presence of adversarial workloads. To simulate adversarial work…

    Submitted 23 July, 2022; originally announced July 2022.

  16. arXiv:2207.11489  [pdf, ps, other]

    cs.IT cs.DS math.PR

    Average-Case to (shifted) Worst-Case Reduction for the Trace Reconstruction Problem

    Authors: Ittai Rubinstein

    Abstract: The insertion-deletion channel takes as input a binary string $x \in\{0, 1\}^n$, and outputs a string $\widetilde{x}$ where some of the bits have been deleted and others inserted independently at random. In the trace reconstruction problem, one is given many outputs (called traces) of the insertion-deletion channel on the same input message $x$, and asked to recover the input mes…

    Submitted 12 August, 2022; v1 submitted 23 July, 2022; originally announced July 2022.

  17. arXiv:2205.10159  [pdf, other]

    cs.CR

    Getting a-Round Guarantees: Floating-Point Attacks on Certified Robustness

    Authors: Jiankai Jin, Olga Ohrimenko, Benjamin I. P. Rubinstein

    Abstract: Adversarial examples pose a security risk as they can alter decisions of a machine learning classifier through slight input perturbations. Certified robustness has been proposed as a mitigation where given an input $\mathbf{x}$, a classifier returns a prediction and a certified radius $R$ with a provable guarantee that any perturbation to $\mathbf{x}$ with $R$-bounded norm will not alter the class…

    Submitted 4 October, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

  18. arXiv:2112.15498  [pdf, other]

    cs.SE cs.CR cs.LG

    State Selection Algorithms and Their Impact on The Performance of Stateful Network Protocol Fuzzing

    Authors: Dongge Liu, Van-Thuan Pham, Gidon Ernst, Toby Murray, Benjamin I. P. Rubinstein

    Abstract: The statefulness property of network protocol implementations poses a unique challenge for testing and verification techniques, including Fuzzing. Stateful fuzzers tackle this challenge by leveraging state models to partition the state space and assist the test generation process. Since not all states are equally important and fuzzing campaigns have time limits, fuzzers need effective state select…

    Submitted 7 January, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: 10 pages, 8 figures, coloured, conference

  19. arXiv:2112.05307  [pdf, other]

    cs.CR

    Are We There Yet? Timing and Floating-Point Attacks on Differential Privacy Systems

    Authors: Jiankai Jin, Eleanor McMurtry, Benjamin I. P. Rubinstein, Olga Ohrimenko

    Abstract: Differential privacy is a de facto privacy framework that has seen adoption in practice via a number of mature software platforms. Implementation of differentially private (DP) mechanisms has to be done carefully to ensure end-to-end security guarantees. In this paper we study two implementation flaws in the noise generation commonly used in DP systems. First we examine the Gaussian mechanism's su…

    Submitted 15 June, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: In Proceedings of the 43rd IEEE Symposium on Security and Privacy (IEEE S&P 2022)

    Journal ref: https://www.computer.org/csdl/proceedings-article/sp/2022/131600b547/1CIO7Ty2xr2

  20. arXiv:2111.00261  [pdf, other]

    cs.IT

    Explicit and Efficient Construction of (nearly) Optimal Rate Codes for Binary Deletion Channel and the Poisson Repeat Channel

    Authors: Ittai Rubinstein

    Abstract: Two of the most common models for channels with synchronisation errors are the Binary Deletion Channel with parameter $p$ ($\text{BDC}_p$) -- a channel where every bit of the codeword is deleted i.i.d. with probability $p$, and the Poisson Repeat Channel with parameter $λ$ ($\text{PRC}_λ$) -- a channel where every bit of the codeword is repeated $\text{Poisson}(λ)$ times. Previous constructions b…

    Submitted 17 June, 2022; v1 submitted 30 October, 2021; originally announced November 2021.

  21. arXiv:2109.14208  [pdf, other]

    cs.GT cs.CR

    A Communication Security Game on Switched Systems for Autonomous Vehicle Platoons

    Authors: Guoxin Sun, Tansu Alpcan, Benjamin I. P. Rubinstein, Seyit Camtepe

    Abstract: Vehicle-to-vehicle communication enables autonomous platoons to boost traffic efficiency and safety, while ensuring string stability with a constant spacing policy. However, communication-based controllers are susceptible to a range of cyber-attacks. In this paper, we propose a distributed attack mitigation defense framework with a dual-mode control system reconfiguration scheme to prevent a compr…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: 9 pages, 5 figures; full version of paper accepted to CDC2021

  22. arXiv:2109.11803  [pdf, other]

    cs.LG

    Local Intrinsic Dimensionality Signals Adversarial Perturbations

    Authors: Sandamal Weerasinghe, Tansu Alpcan, Sarah M. Erfani, Christopher Leckie, Benjamin I. P. Rubinstein

    Abstract: The vulnerability of machine learning models to adversarial perturbations has motivated a significant amount of research under the broad umbrella of adversarial machine learning. Sophisticated attacks may cause learning algorithms to learn decision functions or make decisions with poor predictive performance. In this context, there is a growing body of literature that uses local intrinsic dimensio…

    Submitted 24 September, 2021; originally announced September 2021.

    Comments: 13 pages

  23. arXiv:2109.08266  [pdf, other]

    cs.LG cs.CR

    Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

    Authors: Neil G. Marchant, Benjamin I. P. Rubinstein, Scott Alfeld

    Abstract: The right to erasure requires removal of a user's information from data held by organizations, with rigorous interpretations extending to downstream products such as learned models. Retraining from scratch with the particular user's data omitted fully removes its influence on the resulting model, but comes with a high computational cost. Machine "unlearning" mitigates the cost incurred by full ret…

    Submitted 9 February, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: Align with camera-ready submission to AAAI-22. Changes include: switched to row-wise normalization in Algorithm 3, added link to GitHub repository, added Appendix C with additional results on long-term effectiveness

  24. arXiv:2108.10130  [pdf, other]

    cs.DB cs.LG

    No DBA? No regret! Multi-armed bandits for index tuning of analytical and HTAP workloads with provable guarantees

    Authors: R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic

    Abstract: Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs) who are expected to identify and supply representative training workloads. Even the…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: 25 pages, 20 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:2010.09208

  25. arXiv:2107.08357  [pdf, other]

    cs.CL cs.CR

    As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation

    Authors: Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: Mistranslated numbers have the potential to cause serious effects, such as financial loss or medical misinformation. In this work we develop comprehensive assessments of the robustness of neural machine translation systems to numerical text via behavioural testing. We explore a variety of numerical translation capabilities a system is expected to exhibit and design effective test examples to expos…

    Submitted 18 July, 2021; originally announced July 2021.

    Comments: Findings of ACL, to appear

  26. arXiv:2107.05243  [pdf, other]

    cs.CL cs.CR

    Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning

    Authors: Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Yuqing Tang, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: Neural machine translation systems are known to be vulnerable to adversarial test inputs, however, as we show in this paper, these systems are also vulnerable to training attacks. Specifically, we propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation. This sample is designed to…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: Findings of ACL, to appear

  27. arXiv:2106.12057  [pdf, other]

    cond-mat.stat-mech q-bio.PE

    Multivariate Generating Functions for Information Spread on Multi-Type Random Graphs

    Authors: Yaron Oz, Ittai Rubinstein, Muli Safra

    Abstract: We study the spread of information on multi-type directed random graphs. In such graphs the vertices are partitioned into distinct types (communities) that have different transmission rates between themselves and with other types. We construct multivariate generating functions and use multi-type branching processes to derive an equation for the size of the large out-components in multi-type random…

    Submitted 26 February, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: 27 pages, 4 figures

  28. arXiv:2011.02142  [pdf]

    cs.CR cs.CY

    Not fit for Purpose: A critical analysis of the 'Five Safes'

    Authors: Chris Culnane, Benjamin I. P. Rubinstein, David Watts

    Abstract: Adopted by government agencies in Australia, New Zealand and the UK as policy instrument or as embodied into legislation, the 'Five Safes' framework aims to manage risks of releasing data derived from personal information. Despite its popularity, the Five Safes has undergone little legal or technical critical analysis. We argue that the Five Safes is fundamentally flawed: from being disconnected…

    Submitted 4 November, 2020; originally announced November 2020.

  29. A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning

    Authors: Chang Xu, Jun Wang, Yuqing Tang, Francisco Guzman, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: As modern neural machine translation (NMT) systems have been widely deployed, their security vulnerabilities require close scrutiny. Most recently, NMT systems have been found vulnerable to targeted attacks which cause them to produce specific, unsolicited, and even harmful translations. These attacks are usually exploited in a white-box setting, where adversarial inputs causing targeted translati…

    Submitted 15 February, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: In Proceedings of the 2021 World Wide Web Conference (WWW 2021)

  30. arXiv:2010.09208  [pdf, other]

    cs.DB cs.LG

    DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees

    Authors: R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic

    Abstract: Automating physical database design has remained a long-term interest in database research due to substantial performance gains afforded by optimised structures. Despite significant progress, a majority of today's commercial solutions are highly manual, requiring offline invocation by database administrators (DBAs) who are expected to identify and supply representative training workloads. Unfortun…

    Submitted 19 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: 12 pages, 8 figures

  31. arXiv:2009.01923  [pdf, other]

    q-bio.PE cond-mat.stat-mech physics.soc-ph

    Heterogeneity and Superspreading Effect on Herd Immunity

    Authors: Yaron Oz, Ittai Rubinstein, Muli Safra

    Abstract: We model and calculate the fraction of infected population necessary to reach herd immunity, taking into account the heterogeneity in infectiousness and susceptibility, as well as the correlation between those two parameters. We show that these cause the effective reproduction number to decrease more rapidly, and consequently have a drastic effect on the estimate of the necessary percentage of the…

    Submitted 15 January, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: 16 pages, 5 figures, includes population based simulations

  32. arXiv:2008.07352  [pdf, other]

    q-bio.PE cond-mat.stat-mech physics.soc-ph

    Superspreaders and High Variance Infectious Diseases

    Authors: Yaron Oz, Ittai Rubinstein, Muli Safra

    Abstract: A well-known characteristic of pandemics such as COVID-19 is the high level of transmission heterogeneity in the infection spread: not all infected individuals spread the disease at the same rate and some individuals (superspreaders) are responsible for most of the infections. To quantify this phenomenon requires the analysis of the effect of the variance and higher moments of the infection distri…

    Submitted 12 October, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: 9 pages, 5 figures

  33. Electro-osmotic Instability of Concentration Enrichment in Curved Geometries for an Aqueous Electrolyte

    Authors: Bingrui Xu, Zhibo Gu, Wei Liu, Peng Huo, Yueting Zhou, S. M. Rubinstein, M. Z. Bazant, B. Zaltzman, I. Rubinstein, Daosheng Deng

    Abstract: We report that an electro-osmotic instability of concentration enrichment in curved geometries for an aqueous electrolyte, as opposed to the well-known one, is initiated exclusively at the enriched interface (anode), rather than at the depleted one (cathode). For this instability, the limitation of unrealistically high material Peclet number in planar geometry is eliminated by the strong electric…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: 5 pages, 4 figures

    Journal ref: Phys. Rev. Fluids 5, 091701 (2020)

  34. arXiv:2007.05975  [pdf, ps, other]

    cs.IT cs.CR cs.LG stat.ML

    A Graph Symmetrisation Bound on Channel Information Leakage under Blowfish Privacy

    Authors: Tobias Edwards, Benjamin I. P. Rubinstein, Zuhe Zhang, Sanming Zhou

    Abstract: Blowfish privacy is a recent generalisation of differential privacy that enables improved utility while maintaining privacy policies with semantic guarantees, a factor that has driven the popularity of differential privacy in computer science. This paper relates Blowfish privacy to an important measure of privacy loss of information channels from the communications theory community: min-entropy le…

    Submitted 13 October, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: 11 pages, 5 figures; accepted to IEEE Transactions on Information Theory

  35. arXiv:2006.15417  [pdf, other]

    cs.CV cs.AI cs.LG

    Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors

    Authors: Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, Benjamin I. P. Rubinstein

    Abstract: Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form. This deficiency remains a key challenge when applying CNNs in important domains. Recent work on explanations through feature importance of approximate linear models has moved from input-level features (pixels or segments) to features from mid-layer feature maps in the form o…

    Submitted 17 June, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

  36. arXiv:2006.13120  [pdf, other]

    cs.LG stat.ML

    Discrete Few-Shot Learning for Pan Privacy

    Authors: Roei Gelbhart, Benjamin I. P. Rubinstein

    Abstract: In this paper we present the first baseline results for the task of few-shot learning of discrete embedding vectors for image recognition. Few-shot learning is a highly researched task, commonly leveraged by recognition systems that are resource constrained to train on a small number of images per class. Few-shot systems typically store a continuous embedding vector of each class, posing a risk to…

    Submitted 23 June, 2020; originally announced June 2020.

  37. arXiv:2006.06963  [pdf, other]

    cs.LG cs.IR stat.ML

    Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance

    Authors: Neil G. Marchant, Benjamin I. P. Rubinstein

    Abstract: Important tasks like record linkage and extreme classification demonstrate extreme class imbalance, with 1 minority instance to every 1 million or more majority instances. Obtaining a sufficient sample of all classes, even just to achieve statistically-significant evaluation, is so challenging that most current approaches yield poor estimates or incur impractical cost. Where importance sampling ha…

    Submitted 2 June, 2021; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: 30 pages, 8 figures, updated to match version accepted for publication at KDD'21

    ACM Class: H.3.4; I.5.2

  38. Assessing Centrality Without Knowing Connections

    Authors: Leyla Roohi, Benjamin I. P. Rubinstein, Vanessa Teague

    Abstract: We consider the privacy-preserving computation of node influence in distributed social networks, as measured by egocentric betweenness centrality (EBC). Motivated by modern communication networks spanning multiple providers, we show for the first time how multiple mutually-distrusting parties can successfully compute node EBC while revealing only differentially-private information about their inte…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: Full report of paper appearing in PAKDD2020

    Journal ref: In: Advances in Knowledge Discovery and Data Mining. PAKDD 2020. Lecture Notes in Computer Science, vol 12085. Springer, Cham, pages 152-163 (2020)

  39. Legion: Best-First Concolic Testing

    Authors: Dongge Liu, Gidon Ernst, Toby Murray, Benjamin I. P. Rubinstein

    Abstract: Concolic execution and fuzzing are two complementary coverage-based testing techniques. How to achieve the best of both remains an open challenge. To address this research problem, we propose and evaluate Legion. Legion re-engineers the Monte Carlo tree search (MCTS) framework from the AI literature to treat automated test generation as a problem of sequential decision-making under uncertainty. It…

    Submitted 22 September, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

    Comments: 12 pages, 2 Algorithms, 3 Figures, 2 Tables, ASE2020

  40. arXiv:1911.10326  [pdf]

    physics.optics cs.LG

    Deep learning reconstruction of ultrashort pulses from 2D spatial intensity patterns recorded by an all-in-line system in a single-shot

    Authors: Ron Ziv, Alex Dikopoltsev, Tom Zahavy, Ittai Rubinstein, Pavel Sidorenko, Oren Cohen, Mordechai Segev

    Abstract: We propose a simple all-in-line single-shot scheme for diagnostics of ultrashort laser pulses, consisting of a multi-mode fiber, a nonlinear crystal and a CCD camera. The system records a 2D spatial intensity pattern, from which the pulse shape (amplitude and phase) are recovered, through a fast Deep Learning algorithm. We explore this scheme in simulations and demonstrate the recovery of ultrasho…

    Submitted 23 November, 2019; originally announced November 2019.

  41. arXiv:1909.06039  [pdf, other]

    stat.CO cs.DB cs.LG stat.ML

    d-blink: Distributed End-to-End Bayesian Entity Resolution

    Authors: Neil G. Marchant, Andee Kaplan, Daniel N. Elazar, Benjamin I. P. Rubinstein, Rebecca C. Steorts

    Abstract: Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian generative models, which provide a natural framework for inferring latent entities with rigorous quantification of uncertainty. Despite these advantages, existing mode…

    Submitted 22 September, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: 32 pages, 6 figures, 5 tables. Includes 22 pages of supplementary material. This revision incorporates a case study on the 2010 U.S. Decennial Census

    MSC Class: 62F15; 65C40; 68W15

  42. arXiv:1908.05004  [pdf, other]

    cs.CR

    Stop the Open Data Bus, We Want to Get Off

    Authors: Dr. Chris Culnane, A/Prof. Benjamin I. P. Rubinstein, A/Prof. Vanessa Teague

    Abstract: The subject of this report is the re-identification of individuals in the Myki public transport dataset released as part of the Melbourne Datathon 2018. We demonstrate the ease with which we were able to re-identify ourselves, our co-travellers, and complete strangers; our analysis raises concerns about the nature and granularity of the data released, in particular the ability to identify vulnerab…

    Submitted 14 August, 2019; originally announced August 2019.

  43. arXiv:1902.09062  [pdf, other]

    stat.ML cs.CR cs.LG

    Adversarial Reinforcement Learning under Partial Observability in Autonomous Computer Network Defence

    Authors: Yi Han, David Hubczenko, Paul Montague, Olivier De Vel, Tamas Abraham, Benjamin I. P. Rubinstein, Christopher Leckie, Tansu Alpcan, Sarah Erfani

    Abstract: Recent studies have demonstrated that reinforcement learning (RL) agents are susceptible to adversarial manipulation, similar to vulnerabilities previously demonstrated in the supervised learning setting. While most existing work studies the problem in the context of computer vision or console games, this paper focuses on reinforcement learning in autonomous cyber defence under partial observabili…

    Submitted 16 August, 2020; v1 submitted 24 February, 2019; originally announced February 2019.

    Comments: 8 pages, 4 figures

  44. arXiv:1902.08918  [pdf, other]

    cs.LG cs.IR stat.ML

    Truth Inference at Scale: A Bayesian Model for Adjudicating Highly Redundant Crowd Annotations

    Authors: Yuan Li, Benjamin I. P. Rubinstein, Trevor Cohn

    Abstract: Crowd-sourcing is a cheap and popular means of creating training and evaluation datasets for machine learning, however it poses the problem of `truth inference', as individual workers cannot be wholly trusted to provide reliable annotations. Research into models of annotation aggregation attempts to infer a latent `true' annotation, which has been shown to improve the utility of crowd-sourced data…

    Submitted 24 February, 2019; originally announced February 2019.

    Comments: Accepted at the Web Conference/WWW 2019 (camera ready)

  45. arXiv:1902.07500  [pdf, other]

    cs.LG stat.ML

    A Note on Bounding Regret of the C$^2$UCB Contextual Combinatorial Bandit

    Authors: Bastian Oetomo, Malinga Perera, Renata Borovica-Gajic, Benjamin I. P. Rubinstein

    Abstract: We revisit the proof by Qin et al. (2014) of bounded regret of the C$^2$UCB contextual combinatorial bandit. We demonstrate an error in the proof of volumetric expansion of the moment matrix, used in upper bounding a function of context vector norms. We prove a relaxed inequality that yields the originally-stated regret bound.

    Submitted 20 February, 2019; originally announced February 2019.

    Comments: 3 pages

  46. arXiv:1901.05562  [pdf, other]

    cs.CR cs.LG cs.SI

    Differentially-Private Two-Party Egocentric Betweenness Centrality

    Authors: Leyla Roohi, Benjamin I. P. Rubinstein, Vanessa Teague

    Abstract: We describe a novel protocol for computing the egocentric betweenness centrality of a node when relevant edge information is spread between two mutually distrusting parties such as two telecommunications providers. While each node belongs to one network or the other, its ego network might include edges unknown to its network provider. We develop a protocol of differentially-private mechanisms to h…

    Submitted 16 January, 2019; originally announced January 2019.

    Comments: 10 pages; full report with proofs of paper accepted into INFOCOM'2019

  47. arXiv:1808.05770  [pdf, other]

    cs.CR cs.AI cs.LG stat.ML

    Reinforcement Learning for Autonomous Defence in Software-Defined Networking

    Authors: Yi Han, Benjamin I. P. Rubinstein, Tamas Abraham, Tansu Alpcan, Olivier De Vel, Sarah Erfani, David Hubczenko, Christopher Leckie, Paul Montague

    Abstract: Despite the successful application of machine learning (ML) in a wide range of domains, adaptability---the very property that makes machine learning desirable---can be exploited by adversaries to contaminate training and evade classification. In this paper, we investigate the feasibility of applying a specific class of machine learning algorithms, namely, reinforcement learning (RL) algorithms, fo…

    Submitted 17 August, 2018; originally announced August 2018.

    Comments: 20 pages, 8 figures

  48. arXiv:1802.07975  [pdf, other]

    cs.CR

    Options for encoding names for data linking at the Australian Bureau of Statistics

    Authors: Chris Culnane, Benjamin I. P. Rubinstein, Vanessa Teague

    Abstract: Publicly, ABS has said it would use a cryptographic hash function to convert names collected in the 2016 Census of Population and Housing into an unrecognisable value in a way that is not reversible. In 2016, the ABS engaged the University of Melbourne to provide expert advice on cryptographic hash functions to meet this objective. For complex unit-record level data, including Census data, auxil…

    Submitted 22 February, 2018; originally announced February 2018.

    Comments: University of Melbourne Research Contract 85449779. After receiving a draft of this report, ABS conducted a further assessment of Options 2 and 3, which will be published on their website

  49. arXiv:1712.05627  [pdf]

    cs.CY cs.CR

    Health Data in an Open World

    Authors: Chris Culnane, Benjamin I. P. Rubinstein, Vanessa Teague

    Abstract: With the aim of informing sound policy about data sharing and privacy, we describe successful re-identification of patients in an Australian de-identified open health dataset. As in prior studies of similar datasets, a few mundane facts often suffice to isolate an individual. Some people can be identified by name based on publicly available information. Decreasing the precision of the unit-record…

    Submitted 15 December, 2017; originally announced December 2017.

  50. arXiv:1712.00871  [pdf, other]

    cs.CR

    Vulnerabilities in the use of similarity tables in combination with pseudonymisation to preserve data privacy in the UK Office for National Statistics' Privacy-Preserving Record Linkage

    Authors: Chris Culnane, Benjamin I. P. Rubinstein, Vanessa Teague

    Abstract: In the course of a survey of privacy-preserving record linkage, we reviewed the approach taken by the UK Office for National Statistics (ONS) as described in their series of reports "Beyond 2011". Our review identifies a number of matters of concern. Some of the issues discovered are sufficiently severe to present a risk to privacy.

    Submitted 3 December, 2017; originally announced December 2017.