-
Quantum computing with Qiskit
Authors:
Ali Javadi-Abhari,
Matthew Treinish,
Kevin Krsulich,
Christopher J. Wood,
Jake Lishman,
Julien Gacon,
Simon Martiel,
Paul D. Nation,
Lev S. Bishop,
Andrew W. Cross,
Blake R. Johnson,
Jay M. Gambetta
Abstract:
We describe Qiskit, a software development kit for quantum information science. We discuss the key design decisions that have shaped its development, and examine the software architecture and its core components. We demonstrate an end-to-end workflow for solving a problem in condensed matter physics on a quantum computer that serves to highlight some of Qiskit's capabilities, for example the repre…
▽ More
We describe Qiskit, a software development kit for quantum information science. We discuss the key design decisions that have shaped its development, and examine the software architecture and its core components. We demonstrate an end-to-end workflow for solving a problem in condensed matter physics on a quantum computer that serves to highlight some of Qiskit's capabilities, for example the representation and optimization of circuits at various abstraction levels, its scalability and retargetability to new gates, and the use of quantum-classical computations via dynamic circuits. Lastly, we discuss some of the ecosystem of tools and plugins that extend Qiskit for various tasks, and the future ahead.
△ Less
Submitted 18 June, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
MuTox: Universal MUltilingual Audio-based TOXicity Dataset and Zero-shot Detector
Authors:
Marta R. Costa-jussà,
Mariano Coria Meglioli,
Pierre Andrews,
David Dale,
Prangthip Hansanti,
Elahe Kalbassi,
Alex Mourachko,
Christophe Ropers,
Carleigh Wood
Abstract:
Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English. To address these limitations and lay the groundwork for truly multilingual audio-based toxicity detection, we introduce MuTox, the first highly multilingual audio-based dataset with toxicity labels. The dataset comprises 20,000 audio u…
▽ More
Research in toxicity detection in natural language processing for the speech modality (audio-based) is quite limited, particularly for languages other than English. To address these limitations and lay the groundwork for truly multilingual audio-based toxicity detection, we introduce MuTox, the first highly multilingual audio-based dataset with toxicity labels. The dataset comprises 20,000 audio utterances for English and Spanish, and 4,000 for the other 19 languages. To demonstrate the quality of this dataset, we trained the MuTox audio-based toxicity classifier, which enables zero-shot toxicity detection across a wide range of languages. This classifier outperforms existing text-based trainable classifiers by more than 1% AUC, while expanding the language coverage more than tenfold. When compared to a wordlist-based classifier that covers a similar number of languages, MuTox improves precision and recall by approximately 2.5 times. This significant improvement underscores the potential of MuTox in advancing the field of audio-based toxicity detection.
△ Less
Submitted 27 June, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Audiobox: Unified Audio Generation with Natural Language Prompts
Authors:
Apoorv Vyas,
Bowen Shi,
Matthew Le,
Andros Tjandra,
Yi-Chiao Wu,
Baishan Guo,
Jiemin Zhang,
Xinyue Zhang,
Robert Adkins,
William Ngan,
Jeff Wang,
Ivan Cruz,
Bapi Akula,
Akinniyi Akinyemi,
Brian Ellis,
Rashel Moritz,
Yael Yungster,
Alice Rakotoarison,
Liang Tan,
Chris Summers,
Carleigh Wood,
Joshua Lane,
Mary Williamson,
Wei-Ning Hsu
Abstract:
Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data. However, these models lack controllability in sever…
▽ More
Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data. However, these models lack controllability in several aspects: speech generation models cannot synthesize novel styles based on text description and are limited on domain coverage such as outdoor environments; sound generation models only provide coarse-grained control based on descriptions like "a person speaking" and would only generate mumbling human voices. This paper presents Audiobox, a unified model based on flow-matching that is capable of generating various audio modalities. We design description-based and example-based prompting to enhance controllability and unify speech and sound generation paradigms. We allow transcript, vocal, and other audio styles to be controlled independently when generating speech. To improve model generalization with limited labels, we adapt a self-supervised infilling objective to pre-train on large quantities of unlabeled audio. Audiobox sets new benchmarks on speech and sound generation (0.745 similarity on Librispeech for zero-shot TTS; 0.77 FAD on AudioCaps for text-to-sound) and unlocks new methods for generating audio with novel vocal and acoustic styles. We further integrate Bespoke Solvers, which speeds up generation by over 25 times compared to the default ODE solver for flow-matching, without loss of performance on several tasks. Our demo is available at https://audiobox.metademolab.com/
△ Less
Submitted 25 December, 2023;
originally announced December 2023.
-
Seamless: Multilingual Expressive and Streaming Speech Translation
Authors:
Seamless Communication,
Loïc Barrault,
Yu-An Chung,
Mariano Coria Meglioli,
David Dale,
Ning Dong,
Mark Duppenthaler,
Paul-Ambroise Duquenne,
Brian Ellis,
Hady Elsahar,
Justin Haaheim,
John Hoffman,
Min-Jae Hwang,
Hirofumi Inaguma,
Christopher Klaiber,
Ilia Kulikov,
Pengwei Li,
Daniel Licht,
Jean Maillard,
Ruslan Mavlyutov,
Alice Rakotoarison,
Kaushik Ram Sadagopan,
Abinesh Ramakrishnan,
Tuan Tran,
Guillaume Wenzek
, et al. (40 additional authors not shown)
Abstract:
Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…
▽ More
Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model-SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Authors:
Seamless Communication,
Loïc Barrault,
Yu-An Chung,
Mariano Cora Meglioli,
David Dale,
Ning Dong,
Paul-Ambroise Duquenne,
Hady Elsahar,
Hongyu Gong,
Kevin Heffernan,
John Hoffman,
Christopher Klaiber,
Pengwei Li,
Daniel Licht,
Jean Maillard,
Alice Rakotoarison,
Kaushik Ram Sadagopan,
Guillaume Wenzek,
Ethan Ye,
Bapi Akula,
Peng-Jen Chen,
Naji El Hachem,
Brian Ellis,
Gabriel Mejia Gonzalez,
Justin Haaheim
, et al. (43 additional authors not shown)
Abstract:
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…
▽ More
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication
△ Less
Submitted 24 October, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Efficient techniques to GPU Accelerations of Multi-Shot Quantum Computing Simulations
Authors:
Jun Doi,
Hiroshi Horii,
Christopher Wood
Abstract:
Quantum computers are becoming practical for computing numerous applications. However, simulating quantum computing on classical computers is still demanding yet useful because current quantum computers are limited because of computer resources, hardware limits, instability, and noises. Improving quantum computing simulation performance in classical computers will contribute to the development of…
▽ More
Quantum computers are becoming practical for computing numerous applications. However, simulating quantum computing on classical computers is still demanding yet useful because current quantum computers are limited because of computer resources, hardware limits, instability, and noises. Improving quantum computing simulation performance in classical computers will contribute to the development of quantum computers and their algorithms. Quantum computing simulations on classical computers require long performance times, especially for quantum circuits with a large number of qubits or when simulating a large number of shots for noise simulations or circuits with intermediate measures. Graphical processing units (GPU) are suitable to accelerate quantum computer simulations by exploiting their computational power and high bandwidth memory and they have a large advantage in simulating relatively larger qubits circuits. However, GPUs are inefficient at simulating multi-shots runs with noises because the randomness prevents highly parallelization. In addition, GPUs have a disadvantage in simulating circuits with a small number of qubits because of the large overheads in GPU kernel execution. In this paper, we introduce optimization techniques for multi-shot simulations on GPUs. We gather multiple shots of simulations into a single GPU kernel execution to reduce overheads by scheduling randomness caused by noises. In addition, we introduce shot-branching that reduces calculations and memory usage for multi-shot simulations. By using these techniques, we speed up x10 from previous implementations.
△ Less
Submitted 7 August, 2023;
originally announced August 2023.
-
Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil Demographic Biases in Languages at Scale
Authors:
Marta R. Costa-jussà,
Pierre Andrews,
Eric Smith,
Prangthip Hansanti,
Christophe Ropers,
Elahe Kalbassi,
Cynthia Gao,
Daniel Licht,
Carleigh Wood
Abstract:
We introduce a multilingual extension of the HOLISTICBIAS dataset, the largest English template-based taxonomy of textual people references: MULTILINGUALHOLISTICBIAS. This extension consists of 20,459 sentences in 50 languages distributed across all 13 demographic axes. Source sentences are built from combinations of 118 demographic descriptors and three patterns, excluding nonsensical combination…
▽ More
We introduce a multilingual extension of the HOLISTICBIAS dataset, the largest English template-based taxonomy of textual people references: MULTILINGUALHOLISTICBIAS. This extension consists of 20,459 sentences in 50 languages distributed across all 13 demographic axes. Source sentences are built from combinations of 118 demographic descriptors and three patterns, excluding nonsensical combinations. Multilingual translations include alternatives for gendered languages that cover gendered translations when there is ambiguity in English. Our benchmark is intended to uncover demographic imbalances and be the tool to quantify mitigations towards them.
Our initial findings show that translation quality for EN-to-XX translations is an average of 8 spBLEU better when evaluating with the masculine human reference compared to feminine. In the opposite direction, XX-to-EN, we compare the robustness of the model when the source input only differs in gender (masculine or feminine) and masculine translations are an average of almost 4 spBLEU better than feminine. When embedding sentences to a joint multilingual sentence representations space, we find that for most languages masculine translations are significantly closer to the English neutral sentences when embedded.
△ Less
Submitted 22 May, 2023;
originally announced May 2023.
-
You get PADDING, everybody gets PADDING! You get privacy? Evaluating practical QUIC website fingerprinting protections for the masses
Authors:
Sandra Siby,
Ludovic Barman,
Christopher Wood,
Marwan Fayed,
Nick Sullivan,
Carmela Troncoso
Abstract:
Website fingerprinting (WF) is a well-know threat to users' web privacy. New internet standards, such as QUIC, include padding to support defenses against WF. Previous work only analyzes the effectiveness of defenses when users are behind a VPN. Yet, this is not how most users browse the Internet. In this paper, we provide a comprehensive evaluation of QUIC-padding-based defenses against WF when u…
▽ More
Website fingerprinting (WF) is a well-know threat to users' web privacy. New internet standards, such as QUIC, include padding to support defenses against WF. Previous work only analyzes the effectiveness of defenses when users are behind a VPN. Yet, this is not how most users browse the Internet. In this paper, we provide a comprehensive evaluation of QUIC-padding-based defenses against WF when users directly browse the web. We confirm previous claims that network-layer padding cannot provide good protection against powerful adversaries capable of observing all traffic traces. We further demonstrate that such padding is ineffective even against adversaries with constraints on traffic visibility and processing power. At the application layer, we show that defenses need to be deployed by both first and third parties, and that they can only thwart traffic analysis in limited situations. We identify challenges to deploy effective WF defenses and provide recommendations to address them.
△ Less
Submitted 15 December, 2022; v1 submitted 15 March, 2022;
originally announced March 2022.
-
Might I Get Pwned: A Second Generation Compromised Credential Checking Service
Authors:
Bijeeta Pal,
Mazharul Islam,
Marina Sanusi,
Nick Sullivan,
Luke Valenta,
Tara Whalen,
Christopher Wood,
Thomas Ristenpart,
Rahul Chattejee
Abstract:
Credential stuffing attacks use stolen passwords to log into victim accounts. To defend against these attacks, recently deployed compromised credential checking (C3) services provide APIs that help users and companies check whether a username, password pair is exposed. These services however only check if the exact password is leaked, and therefore do not mitigate credential tweaking attacks - att…
▽ More
Credential stuffing attacks use stolen passwords to log into victim accounts. To defend against these attacks, recently deployed compromised credential checking (C3) services provide APIs that help users and companies check whether a username, password pair is exposed. These services however only check if the exact password is leaked, and therefore do not mitigate credential tweaking attacks - attempts to compromise a user account with variants of a user's leaked passwords. Recent work has shown credential tweaking attacks can compromise accounts quite effectively even when the credential stuffing countermeasures are in place. We initiate work on C3 services that protect users from credential tweaking attacks. The core underlying challenge is how to identify passwords that are similar to their leaked passwords while preserving honest clients' privacy and also preventing malicious clients from extracting breach data from the service. We formalize the problem and explore ways to measure password similarity that balance efficacy, performance, and security. Based on this study, we design "Might I Get Pwned" (MIGP), a new kind of breach alerting service. Our simulations show that MIGP reduces the efficacy of state-of-the-art 1000-guess credential tweaking attacks by 94%. MIGP preserves user privacy and limits potential exposure of sensitive breach entries. We show that the protocol is fast, with response time close to existing C3 services. We worked with Cloudflare to deploy MIGP in practice.
△ Less
Submitted 18 March, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
-
Efficient, Interpretable Graph Neural Network Representation for Angle-dependent Properties and its Application to Optical Spectroscopy
Authors:
Tim Hsu,
Tuan Anh Pham,
Nathan Keilbart,
Stephen Weitzner,
James Chapman,
Penghao Xiao,
S. Roger Qiu,
Xiao Chen,
Brandon C. Wood
Abstract:
Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include d…
▽ More
Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include dihedral angles (ALIGNN-d). This simple extension leads to a memory-efficient graph representation that captures the complete geometry of atomic structures. ALIGNN-d is applied to predict the infrared optical response of dynamically disordered Cu(II) aqua complexes, leveraging the intrinsic interpretability to elucidate the relative contributions of individual structural components. Bond and dihedral angles are found to be critical contributors to the fine structure of the absorption response, with distortions representing transitions between more common geometries exhibiting the strongest absorption intensity. Future directions for further development of ALIGNN-d are discussed.
△ Less
Submitted 15 February, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
PDBench: Evaluating Computational Methods for Protein Sequence Design
Authors:
Leonardo V. Castorina,
Rokas Petrenas,
Kartic Subr,
Christopher W. Wood
Abstract:
Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, as the basis of highly performant materials, sensing and much more. While an incredible range of functionality has been sampled in nature, it accounts for a tiny fraction of the possible protein universe. If we could tap into this pool of unexplored protein structures, we could…
▽ More
Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, as the basis of highly performant materials, sensing and much more. While an incredible range of functionality has been sampled in nature, it accounts for a tiny fraction of the possible protein universe. If we could tap into this pool of unexplored protein structures, we could search for novel proteins with useful properties that we could apply to tackle the environmental and medical challenges facing humanity. This is the purpose of protein design.
Sequence design is an important aspect of protein design, and many successful methods to do this have been developed. Recently, deep-learning methods that frame it as a classification problem have emerged as a powerful approach. Beyond their reported improvement in performance, their primary advantage over physics-based methods is that the computational burden is shifted from the user to the developers, thereby increasing accessibility to the design method. Despite this trend, the tools for assessment and comparison of such models remain quite generic. The goal of this paper is to both address the timely problem of evaluation and to shine a spotlight, within the Machine Learning community, on specific assessment criteria that will accelerate impact.
We present a carefully curated benchmark set of proteins and propose a number of standard tests to assess the performance of deep learning based methods. Our robust benchmark provides biological insight into the behaviour of design methods, which is essential for evaluating their performance and utility. We compare five existing models with two novel models for sequence prediction. Finally, we test the designs produced by these models with AlphaFold2, a state-of-the-art structure-prediction algorithm, to determine if they are likely to fold into the intended 3D shapes.
△ Less
Submitted 28 September, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
A generative adversarial approach to facilitate archival-quality histopathologic diagnoses from frozen tissue sections
Authors:
Kianoush Falahkheirkhah,
Tao Guo,
Michael Hwang,
Pheroze Tamboli,
Christopher G Wood,
Jose A Karam,
Kanishka Sircar,
Rohit Bhargava
Abstract:
In clinical diagnostics and research involving histopathology, formalin fixed paraffin embedded (FFPE) tissue is almost universally favored for its superb image quality. However, tissue processing time (more than 24 hours) can slow decision-making. In contrast, fresh frozen (FF) processing (less than 1 hour) can yield rapid information but diagnostic accuracy is suboptimal due to lack of clearing,…
▽ More
In clinical diagnostics and research involving histopathology, formalin fixed paraffin embedded (FFPE) tissue is almost universally favored for its superb image quality. However, tissue processing time (more than 24 hours) can slow decision-making. In contrast, fresh frozen (FF) processing (less than 1 hour) can yield rapid information but diagnostic accuracy is suboptimal due to lack of clearing, morphologic deformation and more frequent artifacts. Here, we bridge this gap using artificial intelligence. We synthesize FFPE-like images ,virtual FFPE, from FF images using a generative adversarial network (GAN) from 98 paired kidney samples derived from 40 patients. Five board-certified pathologists evaluated the results in a blinded test. Image quality of the virtual FFPE data was assessed to be high and showed a close resemblance to real FFPE images. Clinical assessments of disease on the virtual FFPE images showed a higher inter-observer agreement compared to FF images. The nearly instantaneously generated virtual FFPE images can not only reduce time to information but can facilitate more precise diagnosis from routine FF images without extraneous costs and effort.
△ Less
Submitted 24 August, 2021;
originally announced August 2021.
-
Oblivious DNS over HTTPS (ODoH): A Practical Privacy Enhancement to DNS
Authors:
Sudheesh Singanamalla,
Suphanat Chunhapanya,
Marek Vavruša,
Tanya Verma,
Peter Wu,
Marwan Fayed,
Kurtis Heimerl,
Nick Sullivan,
Christopher Wood
Abstract:
The Domain Name System (DNS) is the foundation of a human-usable Internet, responding to client queries for host-names with corresponding IP addresses and records. Traditional DNS is also unencrypted, and leaks user information to network operators. Recent efforts to secure DNS using DNS over TLS (DoT) and DNS over HTTPS (DoH) have been gaining traction, ostensibly protecting traffic and hiding co…
▽ More
The Domain Name System (DNS) is the foundation of a human-usable Internet, responding to client queries for host-names with corresponding IP addresses and records. Traditional DNS is also unencrypted, and leaks user information to network operators. Recent efforts to secure DNS using DNS over TLS (DoT) and DNS over HTTPS (DoH) have been gaining traction, ostensibly protecting traffic and hiding content from on-lookers. However, one of the criticisms of DoT and DoH is brought to bear by the small number of large-scale deployments (e.g., Comcast, Google, Cloudflare): DNS resolvers can associate query contents with client identities in the form of IP addresses. Oblivious DNS over HTTPS(ODoH) safeguards against this problem. In this paper we ask what it would take to make ODoH practical? We describe ODoH, a practical DNS protocol aimed at resolving this issue by both protecting the client's content and identity. We implement and deploy the protocol, and perform measurements to show that ODoH has comparable performance to protocols like DoH and DoT which are gaining widespread adoption, while improving client privacy, making ODoH a practical privacy enhancing replacement for the usage of DNS.
△ Less
Submitted 19 November, 2020;
originally announced November 2020.
-
Turning Machines: a simple algorithmic model for molecular robotics
Authors:
Irina Kostitsyna,
Cai Wood,
Damien Woods
Abstract:
Molecular robotics is challenging, so it seems best to keep it simple. We consider an abstract molecular robotics model based on simple folding instructions that execute asynchronously. Turning Machines are a simple 1D to 2D folding model, also easily generalisable to 2D to 3D folding. A Turning Machine starts out as a line of connected monomers in the discrete plane, each with an associated turni…
▽ More
Molecular robotics is challenging, so it seems best to keep it simple. We consider an abstract molecular robotics model based on simple folding instructions that execute asynchronously. Turning Machines are a simple 1D to 2D folding model, also easily generalisable to 2D to 3D folding. A Turning Machine starts out as a line of connected monomers in the discrete plane, each with an associated turning number. A monomer turns relative to its neighbours, executing a unit-distance translation that drags other monomers along with it, and through collective motion the initial set of monomers eventually folds into a programmed shape. We provide a suite of tools for reasoning about Turning Machines by fully characterising their ability to execute line rotations: executing an almost-full line rotation of $5π/3$ radians is possible, yet a full $2π$ rotation is impossible. Furthermore, line rotations up to $5π/3$ are executed efficiently, in $O(\log n)$ expected time in our continuous time Markov chain time model. We then show that such line-rotations represent a fundamental primitive in the model, by using them to efficiently and asynchronously fold shapes. In particular, arbitrarily large zig-zag-rastered squares and zig-zag paths are foldable, as are $y$-monotone shapes albeit with error (bounded by perimeter length). Finally, we give shapes that despite having paths that traverse all their points, are in fact impossible to fold, as well as techniques for folding certain classes of (scaled) shapes without error. Our approach relies on careful geometric-based analyses of the feats possible and impossible by a very simple robotic system, and pushes conceptional hardness towards mathematical analysis and away from molecular implementation.
△ Less
Submitted 24 January, 2022; v1 submitted 1 September, 2020;
originally announced September 2020.
-
DeepWeeds: A Multiclass Weed Species Image Dataset for Deep Learning
Authors:
Alex Olsen,
Dmitry A. Konovalov,
Bronson Philippa,
Peter Ridd,
Jake C. Wood,
Jamie Johns,
Wesley Banks,
Benjamin Girgenti,
Owen Kenny,
James Whinney,
Brendan Calvert,
Mostafa Rahimi Azghadi,
Ronald D. White
Abstract:
Robotic weed control has seen increased research of late with its potential for boosting productivity in agriculture. Majority of works focus on develo** robotics for croplands, ignoring the weed management problems facing rangeland stock farmers. Perhaps the greatest obstacle to widespread uptake of robotic weed control is the robust classification of weed species in their natural environment.…
▽ More
Robotic weed control has seen increased research of late with its potential for boosting productivity in agriculture. Majority of works focus on develo** robotics for croplands, ignoring the weed management problems facing rangeland stock farmers. Perhaps the greatest obstacle to widespread uptake of robotic weed control is the robust classification of weed species in their natural environment. The unparalleled successes of deep learning make it an ideal candidate for recognising various weed species in the complex rangeland environment. This work contributes the first large, public, multiclass image dataset of weed species from the Australian rangelands; allowing for the development of robust classification methods to make robotic weed control viable. The DeepWeeds dataset consists of 17,509 labelled images of eight nationally significant weed species native to eight locations across northern Australia. This paper presents a baseline for classification performance on the dataset using the benchmark deep learning models, Inception-v3 and ResNet-50. These models achieved an average classification accuracy of 95.1% and 95.7%, respectively. We also demonstrate real time performance of the ResNet-50 architecture, with an average inference time of 53.4 ms per image. These strong results bode well for future field implementation of robotic weed control methods in the Australian rangelands.
△ Less
Submitted 14 February, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Qiskit Backend Specifications for OpenQASM and OpenPulse Experiments
Authors:
David C. McKay,
Thomas Alexander,
Luciano Bello,
Michael J. Biercuk,
Lev Bishop,
Jiayin Chen,
Jerry M. Chow,
Antonio D. Córcoles,
Daniel Egger,
Stefan Filipp,
Juan Gomez,
Michael Hush,
Ali Javadi-Abhari,
Diego Moreda,
Paul Nation,
Brent Paulovicks,
Erick Winston,
Christopher J. Wood,
James Wootton,
Jay M. Gambetta
Abstract:
As interest in quantum computing grows, there is a pressing need for standardized API's so that algorithm designers, circuit designers, and physicists can be provided a common reference frame for designing, executing, and optimizing experiments. There is also a need for a language specification that goes beyond gates and allows users to specify the time dynamics of a quantum experiment and recover…
▽ More
As interest in quantum computing grows, there is a pressing need for standardized API's so that algorithm designers, circuit designers, and physicists can be provided a common reference frame for designing, executing, and optimizing experiments. There is also a need for a language specification that goes beyond gates and allows users to specify the time dynamics of a quantum experiment and recover the time dynamics of the output. In this document we provide a specification for a common interface to backends (simulators and experiments) and a standarized data structure (Qobj --- quantum object) for sending experiments to those backends via Qiskit. We also introduce OpenPulse, a language for specifying pulse level control (i.e. control of the continuous time dynamics) of a general quantum device independent of the specific hardware implementation.
△ Less
Submitted 10 September, 2018;
originally announced September 2018.
-
Content-Centric Networking - Architectural Overview and Protocol Description
Authors:
Marc Mosko,
Ignacio Solis,
Christopher A. Wood
Abstract:
This document describes the core concepts of the CCNx architecture and presents a minimum network protocol based on two messages: Interests and Content Objects. It specifies the set of mandatory and optional fields within those messages and describes their behavior and interpretation. This architecture and protocol specification is independent of a specific wire encoding.
This document describes the core concepts of the CCNx architecture and presents a minimum network protocol based on two messages: Interests and Content Objects. It specifies the set of mandatory and optional fields within those messages and describes their behavior and interpretation. This architecture and protocol specification is independent of a specific wire encoding.
△ Less
Submitted 22 June, 2017;
originally announced June 2017.
-
Living in a PIT-less World: A Case Against Stateful Forwarding in Content-Centric Networking
Authors:
Cesar Ghali,
Gene Tsudik,
Ersin Uzun,
Christopher A. Wood
Abstract:
Information-Centric Networking (ICN) is a recent paradigm that claims to mitigate some limitations of the current IP-based Internet architecture. The centerpiece of ICN is named and addressable content, rather than hosts or interfaces. Content-Centric Networking (CCN) is a prominent ICN instance that shares the fundamental architectural design with its equally popular academic sibling Named-Data N…
▽ More
Information-Centric Networking (ICN) is a recent paradigm that claims to mitigate some limitations of the current IP-based Internet architecture. The centerpiece of ICN is named and addressable content, rather than hosts or interfaces. Content-Centric Networking (CCN) is a prominent ICN instance that shares the fundamental architectural design with its equally popular academic sibling Named-Data Networking (NDN). CCN eschews source addresses and creates one-time virtual circuits for every content request (called an interest). As an interest is forwarded it creates state in intervening routers and the requested content back is delivered over the reverse path using that state.
Although a stateful forwarding plane might be beneficial in terms of efficiency, and resilience to certain types of attacks, this has not been decisively proven via realistic experiments. Since kee** per-interest state complicates router operations and makes the infrastructure susceptible to router state exhaustion attacks (e.g., there is currently no effective defense against interest flooding attacks), the value of the stateful forwarding plane in CCN should be re-examined.
In this paper, we explore supposed benefits and various problems of the stateful forwarding plane. We then argue that its benefits are uncertain at best and it should not be a mandatory CCN feature. To this end, we propose a new stateless architecture for CCN that provides nearly all functionality of the stateful design without its headaches. We analyze performance and resource requirements of the proposed architecture, via experiments.
△ Less
Submitted 24 December, 2015;
originally announced December 2015.
-
BEAD: Best Effort Autonomous Deletion in Content-Centric Networking
Authors:
Cesar Ghali,
Gene Tsudik,
Christopher A. Wood
Abstract:
A core feature of Content-Centric Networking (CCN) is opportunistic content caching in routers. It enables routers to satisfy content requests with in-network cached copies, thereby reducing bandwidth utilization, decreasing congestion, and improving overall content retrieval latency.
One major drawback of in-network caching is that content producers have no knowledge about where their content i…
▽ More
A core feature of Content-Centric Networking (CCN) is opportunistic content caching in routers. It enables routers to satisfy content requests with in-network cached copies, thereby reducing bandwidth utilization, decreasing congestion, and improving overall content retrieval latency.
One major drawback of in-network caching is that content producers have no knowledge about where their content is stored. This is problematic if a producer wishes to delete its content. In this paper, we show how to address this problem with a protocol called BEAD (Best-Effort Autonomous Deletion). BEAD achieves content deletion via small and secure packets that resemble current CCN messages. We discuss several methods of routing BEAD messages from producers to caching routers with varying levels of network overhead and efficacy. We assess BEAD performance via simulations and provide a detailed analysis of its properties.
△ Less
Submitted 22 December, 2015;
originally announced December 2015.
-
Practical Accounting in Content-Centric Networking (extended version)
Authors:
Cesar Ghali,
Gene Tsudik,
Christopher A. Wood,
Edmund Yeh
Abstract:
Content-Centric Networking (CCN) is a new class of network architectures designed to address some key limitations of the current IP-based Internet. One of its main features is in-network content caching, which allows requests for content to be served by routers. Despite improved bandwidth utilization and lower latency for popular content retrieval, in-network content caching offers producers no me…
▽ More
Content-Centric Networking (CCN) is a new class of network architectures designed to address some key limitations of the current IP-based Internet. One of its main features is in-network content caching, which allows requests for content to be served by routers. Despite improved bandwidth utilization and lower latency for popular content retrieval, in-network content caching offers producers no means of collecting information about content that is requested and later served from network caches. Such information is often needed for accounting purposes. In this paper, we design some secure accounting schemes that vary in the degree of consumer, router, and producer involvement. Next, we identify and analyze performance and security tradeoffs, and show that specific per-consumer accounting is impossible in the presence of router caches and without application-specific support. We then recommend accounting strategies that entail a few simple requirements for CCN architectures. Finally, our experimental results show that forms of native and secure CCN accounting are both more viable and practical than application-specific approaches with little modification to the existing architecture and protocol.
△ Less
Submitted 7 October, 2015;
originally announced October 2015.
-
Interest-Based Access Control for Content Centric Networks (extended version)
Authors:
Cesar Ghali,
Marc A. Schlosberg,
Gene Tsudik,
Christopher A. Wood
Abstract:
Content-Centric Networking (CCN) is an emerging network architecture designed to overcome limitations of the current IP-based Internet. One of the fundamental tenets of CCN is that data, or content, is a named and addressable entity in the network. Consumers request content by issuing interest messages with the desired content name. These interests are forwarded by routers to producers, and the re…
▽ More
Content-Centric Networking (CCN) is an emerging network architecture designed to overcome limitations of the current IP-based Internet. One of the fundamental tenets of CCN is that data, or content, is a named and addressable entity in the network. Consumers request content by issuing interest messages with the desired content name. These interests are forwarded by routers to producers, and the resulting content object is returned and optionally cached at each router along the path. In-network caching makes it difficult to enforce access control policies on sensitive content outside of the producer since routers only use interest information for forwarding decisions. To that end, we propose an Interest-Based Access Control (IBAC) scheme that enables access control enforcement using only information contained in interest messages, i.e., by making sensitive content names unpredictable to unauthorized parties. Our IBAC scheme supports both hash- and encryption-based name obfuscation. We address the problem of interest replay attacks by formulating a mutual trust framework between producers and consumers that enables routers to perform authorization checks when satisfying interests from their cache. We assess the computational, storage, and bandwidth overhead of each IBAC variant. Our design is flexible and allows producers to arbitrarily specify and enforce any type of access control on content, without having to deal with the problems of content encryption and key distribution. This is the first comprehensive design for CCN access control using only information contained in interest messages.
△ Less
Submitted 22 May, 2015;
originally announced May 2015.
-
Secure Fragmentation for Content-Centric Networks (extended version)
Authors:
Cesar Ghali,
Ashok Narayanan,
David Oran,
Gene Tsudik,
Christopher A. Wood
Abstract:
Content-Centric Networking (CCN) is a communication paradigm that emphasizes content distribution. Named-Data Networking (NDN) is an instantiation of CCN, a candidate Future Internet Architecture. NDN supports human-readable content naming and router-based content caching which lends itself to efficient, secure, and scalable content distribution. Because of NDN's fundamental requirement that each…
▽ More
Content-Centric Networking (CCN) is a communication paradigm that emphasizes content distribution. Named-Data Networking (NDN) is an instantiation of CCN, a candidate Future Internet Architecture. NDN supports human-readable content naming and router-based content caching which lends itself to efficient, secure, and scalable content distribution. Because of NDN's fundamental requirement that each content object must be signed by its producer, fragmentation has been considered incompatible with NDN since it precludes authentication of individual content fragments by routers. The alternative is to perform hop-by-hop reassembly, which incurs prohibitive delays. In this paper, we show that secure and efficient content fragmentation is both possible and even advantageous in NDN and similar content-centric network architectures that involve signed content. We design a concrete technique that facilitates efficient and secure content fragmentation in NDN, discuss its security guarantees and assess performance. We also describe a prototype implementation and compare performance of cut-through with hop-by-hop fragmentation and reassembly.
△ Less
Submitted 19 August, 2015; v1 submitted 12 May, 2014;
originally announced May 2014.