-
Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection
Authors:
Shruti Palaskar,
Oggi Rudovic,
Sameer Dharur,
Florian Pesce,
Gautam Krishna,
Aswin Sivaraman,
Jack Berkowitz,
Ahmed Hussen Abdelaziz,
Saurabh Adya,
Ahmed Tewfik
Abstract:
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to…
▽ More
Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves 22% relative reduction in equal error rate (EER) over the text-only approach and attains performance parity with its full fine-tuning (FFT) counterpart while needing to tune only a fraction of its parameters. Furthermore, with the newly introduced adapter dropout, FLoRA is robust to missing data, improving over FFT by 20% lower EER and 56% lower false accept rate. The proposed approach scales well for model sizes from 16M to 3B parameters.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Jasper: Scalable and Fair Multicast for Financial Exchanges in the Cloud
Authors:
Muhammad Haseeb,
**kun Geng,
Ulysses Butler,
Xiyu Hao,
Daniel Duclos-Cavalcanti,
Anirudh Sivaraman
Abstract:
Financial exchanges have recently shown an interest in migrating to the public cloud for scalability, elasticity, and cost savings. However, financial exchanges often have strict network requirements that can be difficult to meet on the cloud. Notably, market participants (MPs) trade based on market data about different activities in the market. Exchanges often use switch multicast to disseminate…
▽ More
Financial exchanges have recently shown an interest in migrating to the public cloud for scalability, elasticity, and cost savings. However, financial exchanges often have strict network requirements that can be difficult to meet on the cloud. Notably, market participants (MPs) trade based on market data about different activities in the market. Exchanges often use switch multicast to disseminate market data to MPs. However, if one MP receives market data earlier than another, that MP would have an unfair advantage. To prevent this, financial exchanges often equalize exchange-to-MP cable lengths to provide near-simultaneous reception of market data at MPs.
As a cloud tenant, however, building a fair multicast service is challenging because of the lack of switch support for multicast, high latency variance, and the lack of native mechanisms for simultaneous data delivery in the cloud. Jasper introduces a solution that creates an overlay multicast tree within a cloud region that minimizes latency and latency variations through hedging, leverages recent advancements in clock synchronization to achieve simultaneous delivery, and addresses various sources of latency through an optimized DPDK/eBPF implementation -- while scaling to a thousand receivers. Jasper outperforms a prior system, CloudEx, and a commercial multicast solution provided by Amazon Web Services. We present different deployment models and their performance impact. A deployment model where MPs and the exchange do not have to trust each other is realized using confidential computing.
△ Less
Submitted 2 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Application-Defined Receive Side Dispatching on the NIC
Authors:
Tao Wang,
**kun Lin,
Gianni Antichi,
Aurojit Panda,
Anirudh Sivaraman
Abstract:
Recently, some application (L7) processing has been moved to the network stack (including proxies) as a way to provide a common and application-agnostic interface for security policies, simplify service management, etc. This paper looks at whether L7 network functionality can be offloaded to SmartNICs to improve performance and reduce overheads. We investigate this question by examining how to off…
▽ More
Recently, some application (L7) processing has been moved to the network stack (including proxies) as a way to provide a common and application-agnostic interface for security policies, simplify service management, etc. This paper looks at whether L7 network functionality can be offloaded to SmartNICs to improve performance and reduce overheads. We investigate this question by examining how to offload to FPGA NICs the task of L7 dispatch: determining which application process must handle a particular application request. We find that existing protocols, e.g., gRPC on TCP or QUIC, are a hindrance for offloading because the data required to make L7 dispatch decisions can be spread out over many packets, requiring the NIC to buffer and parse a large number of packets to reconstruct L7 data. This paper presents QingNiao-an approach that co-designs application libraries, network protocols, and hardware-to make L7 dispatch feasible under NIC memory constraints. We prototype QingNiao on a 100Gbps FPGA NIC. Experiments with real-world applications show that QingNiao supports a variety of dispatch rules and achieves 7.5x to 8x throughput compared to state-of-the-art software L7 dispatcher. We have open-sourced QingNiao at [1].
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
State-Compute Replication: Parallelizing High-Speed Stateful Packet Processing
Authors:
Qiongwen Xu,
Sebastiano Miano,
Xiangyu Gao,
Tao Wang,
Adithya Murugadass,
Songyuan Zhang,
Anirudh Sivaraman,
Gianni Antichi,
Srinivas Narayana
Abstract:
With the slowdown of Moore's law, CPU-oriented packet processing in software will be significantly outpaced by emerging line speeds of network interface cards (NICs). Single-core packet-processing throughput has saturated.
We consider the problem of high-speed packet processing with multiple CPU cores. The key challenge is state--memory that multiple packets must read and update. The prevailing…
▽ More
With the slowdown of Moore's law, CPU-oriented packet processing in software will be significantly outpaced by emerging line speeds of network interface cards (NICs). Single-core packet-processing throughput has saturated.
We consider the problem of high-speed packet processing with multiple CPU cores. The key challenge is state--memory that multiple packets must read and update. The prevailing method to scale throughput with multiple cores involves state sharding, processing all packets that update the same state, i.e., flow, at the same core. However, given the heavy-tailed nature of realistic flow size distributions, this method will be untenable in the near future, since total throughput is severely limited by single core performance.
This paper introduces state-compute replication, a principle to scale the throughput of a single stateful flow across multiple cores using replication. Our design leverages a packet history sequencer running on a NIC or top-of-the-rack switch to enable multiple cores to update state without explicit synchronization. Our experiments with realistic data center and wide-area Internet traces shows that state-compute replication can scale total packet-processing throughput linearly with cores, deterministically and independent of flow size distributions, across a range of realistic packet-processing programs.
△ Less
Submitted 16 June, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
P4Testgen: An Extensible Test Oracle For P4
Authors:
Fabian Ruffy,
Jed Liu,
Prathima Kotikalapudi,
Vojtěch Havel,
Hanneli Tavante,
Rob Sherwood,
Vladyslav Dubina,
Volodymyr Peschanenko,
Anirudh Sivaraman,
Nate Foster
Abstract:
We present P4Testgen, a test oracle for the P4$_{16}$ language. P4Testgen supports automatic test generation for any P4 target and is designed to be extensible to many P4 targets. It models the complete semantics of the target's packet-processing pipeline including the P4 language, architectures and externs, and target-specific extensions. To handle non-deterministic behaviors and complex externs…
▽ More
We present P4Testgen, a test oracle for the P4$_{16}$ language. P4Testgen supports automatic test generation for any P4 target and is designed to be extensible to many P4 targets. It models the complete semantics of the target's packet-processing pipeline including the P4 language, architectures and externs, and target-specific extensions. To handle non-deterministic behaviors and complex externs (e.g., checksums and hash functions), P4Testgen uses taint tracking and concolic execution. It also provides path selection strategies that reduce the number of tests required to achieve full coverage.
We have instantiated P4Testgen for the V1model, eBPF, PNA, and Tofino P4 architectures. Each extension required effort commensurate with the complexity of the target. We validated the tests generated by P4Testgen by running them across the entire P4C test suite as well as the programs supplied with the Tofino P4 Studio. Using the tool, we have also confirmed 25 bugs in mature, production toolchains for BMv2 and Tofino.
△ Less
Submitted 6 August, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement
Authors:
Anastasia Kuznetsova,
Aswin Sivaraman,
Minje Kim
Abstract:
With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their complexity, when they are to be used in a resource-constrained environment. One promising way is personalized speech enhancement (PSE), which is a smalle…
▽ More
With the advances in deep learning, speech enhancement systems benefited from large neural network architectures and achieved state-of-the-art quality. However, speaker-agnostic methods are not always desirable, both in terms of quality and their complexity, when they are to be used in a resource-constrained environment. One promising way is personalized speech enhancement (PSE), which is a smaller and easier speech enhancement problem for small models to solve, because it focuses on a particular test-time user. To achieve the personalization goal, while dealing with the typical lack of personal data, we investigate the effect of data augmentation based on neural speech synthesis (NSS). In the proposed method, we show that the quality of the NSS system's synthetic data matters, and if they are good enough the augmented dataset can be used to improve the PSE system that outperforms the speaker-agnostic baseline. The proposed PSE systems show significant complexity reduction while preserving the enhancement quality.
△ Less
Submitted 14 November, 2022;
originally announced November 2022.
-
High-Level Synthesis for Packet-Processing Pipelines
Authors:
Xiangyu Gao,
Divya Raghunathan,
Ruijie Fang,
Tao Wang,
Xiaotong Zhu,
Anirudh Sivaraman,
Srinivas Narayana,
Aarti Gupta
Abstract:
Compiling high-level programs to target high-speed packet-processing pipelines is a challenging combinatorial optimization problem. The compiler must configure the pipeline's resources to match the high-level semantics of the program, while packing all of the program's computation into the pipeline's limited resources. State of the art approaches tackle individual aspects of this problem. Yet, the…
▽ More
Compiling high-level programs to target high-speed packet-processing pipelines is a challenging combinatorial optimization problem. The compiler must configure the pipeline's resources to match the high-level semantics of the program, while packing all of the program's computation into the pipeline's limited resources. State of the art approaches tackle individual aspects of this problem. Yet, they miss opportunities to efficiently produce globally high-quality outcomes. We argue that High-Level Synthesis (HLS), previously applied to ASIC/FPGA design, is the right framework to decompose the compilation problem for pipelines into smaller pieces with modular solutions. We design an HLS-based compiler that works in three phases. Transformation rewrites programs to use more abundant pipeline resources, avoiding scarce ones. Synthesis breaks complex transactional code into configurations of pipelined compute units. Allocation maps the program's compute and memory to the hardware resources. We prototype these ideas in a compiler, CaT, which targets the Tofino pipeline and a cycle-accurate simulator of a Verilog hardware model of an RMT pipeline. CaT can handle programs that existing compilers cannot currently run on pipelines, generating code faster than existing compilers, while using fewer pipeline resources.
△ Less
Submitted 18 November, 2022; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks
Authors:
**kun Geng,
Anirudh Sivaraman,
Balaji Prabhakar,
Mendel Rosenblum
Abstract:
This paper presents a high-performance consensus protocol, Nezha, which can be deployed by cloud tenants without any support from their cloud provider. Nezha bridges the gap between protocols such as Multi-Paxos and Raft, which can be readily deployed and protocols such as NOPaxos and Speculative Paxos, that provide better performance, but require access to technologies such as programmable switch…
▽ More
This paper presents a high-performance consensus protocol, Nezha, which can be deployed by cloud tenants without any support from their cloud provider. Nezha bridges the gap between protocols such as Multi-Paxos and Raft, which can be readily deployed and protocols such as NOPaxos and Speculative Paxos, that provide better performance, but require access to technologies such as programmable switches and in-network prioritization, which cloud tenants do not have. Nezha uses a new multicast primitive called deadline-ordered multicast (DOM). DOM uses high-accuracy software clock synchronization to synchronize sender and receiver clocks. Senders tag messages with deadlines in synchronized time; receivers process messages in deadline order, on or after their deadline.
We compare Nezha with Multi-Paxos, Fast Paxos, Raft, a NOPaxos version we optimized for the cloud, and 2 recent protocols, Domino and TOQ-EPaxos, that use synchronized clocks. In throughput, Nezha outperforms all baselines by a median of 5.4x (range: 1.9-20.9x). In latency, Nezha outperforms five baselines by a median of 2.3x (range: 1.3-4.0x), with one exception: it sacrifices 33% latency compared with our optimized NOPaxos in one test. We also prototype two applications, a key-value store and a fair-access stock exchange, on top of Nezha to show that Nezha only modestly reduces their performance relative to an unreplicated system. Nezha is available at https://github.com/Steamgjk/Nezha.
△ Less
Submitted 24 March, 2023; v1 submitted 3 June, 2022;
originally announced June 2022.
-
FETA: Fairness Enforced Verifying, Training, and Predicting Algorithms for Neural Networks
Authors:
Kiarash Mohammadi,
Aishwarya Sivaraman,
Golnoosh Farnadi
Abstract:
Algorithmic decision making driven by neural networks has become very prominent in applications that directly affect people's quality of life. In this paper, we study the problem of verifying, training, and guaranteeing individual fairness of neural network models. A popular approach for enforcing fairness is to translate a fairness notion into constraints over the parameters of the model. However…
▽ More
Algorithmic decision making driven by neural networks has become very prominent in applications that directly affect people's quality of life. In this paper, we study the problem of verifying, training, and guaranteeing individual fairness of neural network models. A popular approach for enforcing fairness is to translate a fairness notion into constraints over the parameters of the model. However, such a translation does not always guarantee fair predictions of the trained neural network model. To address this challenge, we develop a counterexample-guided post-processing technique to provably enforce fairness constraints at prediction time. Contrary to prior work that enforces fairness only on points around test or train data, we are able to enforce and guarantee fairness on all points in the input domain. Additionally, we propose an in-processing technique to use fairness as an inductive bias by iteratively incorporating fairness counterexamples in the learning process. We have implemented these techniques in a tool called FETA. Empirical evaluation on real-world datasets indicates that FETA is not only able to guarantee fairness on-the-fly at prediction time but also is able to train accurate models exhibiting a much higher degree of individual fairness.
△ Less
Submitted 30 January, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Collective Autoscaling for Cloud Microservices
Authors:
Vighnesh Sachidananda,
Anirudh Sivaraman
Abstract:
As cloud applications shift from monoliths to loosely coupled microservices, application developers must decide how many compute resources (e.g., number of replicated containers) to assign to each microservice within an application. This decision affects both (1) the dollar cost to the application developer and (2) the end-to-end latency perceived by the application user. Today, individual microse…
▽ More
As cloud applications shift from monoliths to loosely coupled microservices, application developers must decide how many compute resources (e.g., number of replicated containers) to assign to each microservice within an application. This decision affects both (1) the dollar cost to the application developer and (2) the end-to-end latency perceived by the application user. Today, individual microservices are autoscaled independently by adding VMs whenever per-microservice CPU or memory utilization crosses a configurable threshold. However, an application user's end-to-end latency consists of time spent on multiple microservices and each microservice might need a different number of VMs to achieve an overall end-to-end latency.
We present COLA, an autoscaler for microservice-based applications, which collectively allocates VMs to microservices with a global goal of minimizing dollar cost while kee** end-to-end application latency under a given target. Using 5 open-source applications, we compared COLA to several utilization and machine learning based autoscalers. We evaluate COLA across different compute settings on Google Kubernetes Engine (GKE) in which users manage compute resources, GKE standard, and a new mode of operation in which the cloud provider manages compute infrastructure, GKE Autopilot. COLA meets a desired median or tail latency target on 53 of 63 workloads where it provides a cost reduction of 19.3%, on average, over the next cheapest autoscaler. COLA is the most cost effective autoscaling policy for 48 of these 53 workloads. The cost savings from managing a cluster with COLA result in COLA paying for its training cost in a few days. On smaller applications, for which we can exhaustively search microservice configurations, we find that COLA is optimal for 90% of cases and near optimal otherwise.
△ Less
Submitted 7 August, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training
Authors:
Aswin Sivaraman,
Scott Wisdom,
Hakan Erdogan,
John R. Hershey
Abstract:
The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model on real far-field overlap** reverberant and noisy speech data from the AMI Corpus. The models are tested on real A…
▽ More
The recently-proposed mixture invariant training (MixIT) is an unsupervised method for training single-channel sound separation models in the sense that it does not require ground-truth isolated reference sources. In this paper, we investigate using MixIT to adapt a separation model on real far-field overlap** reverberant and noisy speech data from the AMI Corpus. The models are tested on real AMI recordings containing overlap** speech, and are evaluated subjectively by human listeners. To objectively evaluate our models, we also devise a synthetic AMI test set. For human evaluations on real recordings, we also propose a modification of the standard MUSHRA protocol to handle imperfect reference signals, which we call MUSHIRA. Holding network architectures constant, we find that a fine-tuned semi-supervised model yields the largest SI-SNR improvement, PESQ scores, and human listening ratings across synthetic and real datasets, outperforming unadapted generalist models trained on orders of magnitude more data. Our results show that unsupervised learning through MixIT enables model adaptation on real-world unlabeled spontaneous speech recordings.
△ Less
Submitted 20 October, 2021;
originally announced October 2021.
-
Mining Idioms in the Wild
Authors:
Aishwarya Sivaraman,
Rui Abreu,
Andrew Scott,
Tobi Akomolede,
Satish Chandra
Abstract:
Existing code repositories contain numerous instances of code patterns that are idiomatic ways of accomplishing a particular programming task. Sometimes, the programming language in use supports specific operators or APIs that can express the same idiomatic imperative code much more succinctly. However, those code patterns linger in repositories because the developers may be unaware of the new API…
▽ More
Existing code repositories contain numerous instances of code patterns that are idiomatic ways of accomplishing a particular programming task. Sometimes, the programming language in use supports specific operators or APIs that can express the same idiomatic imperative code much more succinctly. However, those code patterns linger in repositories because the developers may be unaware of the new APIs or have not gotten around to them. Detection of idiomatic code can also point to the need for new APIs.
We share our experiences in mine idiomatic patterns from the Hack repo at Facebook. We found that existing techniques either cannot identify meaningful patterns from syntax trees or require test-suite-based dynamic analysis to incorporate semantic properties to mine useful patterns. The key insight of the approach proposed in this paper -- \emph{Jezero} -- is that semantic idioms from a large codebase can be learned from \emph{canonicalized} dataflow trees. We propose a scalable, lightweight static analysis-based approach to construct such a tree that is well suited to mine semantic idioms using nonparametric Bayesian methods.
Our experiments with Jezero on Hack code shows a clear advantage of adding canonicalized dataflow information to ASTs: \emph{Jezero} was significantly more effective than a baseline that did not have the dataflow augmentation in being able to effectively find refactoring opportunities from unannotated legacy code.
△ Less
Submitted 13 July, 2021;
originally announced July 2021.
-
Revelio: ML-Generated Debugging Queries for Distributed Systems
Authors:
Pradeep Dogga,
Karthik Narasimhan,
Anirudh Sivaraman,
Shiv Kumar Saini,
George Varghese,
Ravi Netravali
Abstract:
A major difficulty in debugging distributed systems lies in manually determining which of the many available debugging tools to use and how to query its logs. Our own study of a production debugging workflow confirms the magnitude of this burden. This paper explores whether a machine-learning model can assist developers in distributed systems debugging. We present Revelio, a debugging assistant wh…
▽ More
A major difficulty in debugging distributed systems lies in manually determining which of the many available debugging tools to use and how to query its logs. Our own study of a production debugging workflow confirms the magnitude of this burden. This paper explores whether a machine-learning model can assist developers in distributed systems debugging. We present Revelio, a debugging assistant which takes user reports and system logs as input, and outputs debugging queries that developers can use to find a bug's root cause. The key challenges lie in (1) combining inputs of different types (e.g., natural language reports and quantitative logs) and (2) generalizing to unseen faults. Revelio addresses these by employing deep neural networks to uniformly embed diverse input sources and potential queries into a high-dimensional vector space. In addition, it exploits observations from production systems to factorize query generation into two computationally and statistically simpler learning tasks. To evaluate Revelio, we built a testbed with multiple distributed applications and debugging tools. By injecting faults and training on logs and reports from 800 Mechanical Turkers, we show that Revelio includes the most helpful query in its predicted list of top-3 relevant queries 96% of the time. Our developer study confirms the utility of Revelio.
△ Less
Submitted 27 June, 2021;
originally announced June 2021.
-
Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model Selection
Authors:
Aswin Sivaraman,
Minje Kim
Abstract:
This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To…
▽ More
This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers. The gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grou** the training set speakers into non-overlap** semantically similar groups is non-trivial and ill-defined. To do this, we first train a Siamese network using noisy speech pairs to maximize or minimize the similarity of its output vectors depending on whether the utterances derive from the same speaker or not. Next, we perform k-means clustering on the latent space formed by the averaged embedding vectors per training set speaker. In this way, we designate speaker groups and train specialist modules optimized around partitions of the complete training set. Our experiments show that ensemble models made up of low-capacity specialists can outperform high-capacity generalist models with greater efficiency and improved adaptation towards unseen test-time speakers.
△ Less
Submitted 7 May, 2021;
originally announced May 2021.
-
Personalized Speech Enhancement through Self-Supervised Data Augmentation and Purification
Authors:
Aswin Sivaraman,
Sunwoo Kim,
Minje Kim
Abstract:
Training personalized speech enhancement models is innately a no-shot learning problem due to privacy constraints and limited access to noise-free speech from the target user. If there is an abundance of unlabeled noisy speech from the test-time user, a personalized speech enhancement model can be trained using self-supervised learning. One straightforward approach to model personalization is to u…
▽ More
Training personalized speech enhancement models is innately a no-shot learning problem due to privacy constraints and limited access to noise-free speech from the target user. If there is an abundance of unlabeled noisy speech from the test-time user, a personalized speech enhancement model can be trained using self-supervised learning. One straightforward approach to model personalization is to use the target speaker's noisy recordings as pseudo-sources. Then, a pseudo denoising model learns to remove injected training noises and recover the pseudo-sources. However, this approach is volatile as it depends on the quality of the pseudo-sources, which may be too noisy. As a remedy, we propose an improvement to the self-supervised approach through data purification. We first train an SNR predictor model to estimate the frame-by-frame SNR of the pseudo-sources. Then, the predictor's estimates are converted into weights which adjust the frame-by-frame contribution of the pseudo-sources towards training the personalized model. We empirically show that the proposed data purification step improves the usability of the speaker-specific noisy data in the context of personalized speech enhancement. Without relying on any clean speech recordings or speaker embeddings, our approach may be seen as privacy-preserving.
△ Less
Submitted 5 April, 2021;
originally announced April 2021.
-
Efficient Personalized Speech Enhancement through Self-Supervised Learning
Authors:
Aswin Sivaraman,
Minje Kim
Abstract:
This work presents self-supervised learning methods for develo** monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement function towards a particular speaker's voice, expecting to solve a narrower problem. Hence, specialists are capable of achieving more optimal performanc…
▽ More
This work presents self-supervised learning methods for develo** monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement function towards a particular speaker's voice, expecting to solve a narrower problem. Hence, specialists are capable of achieving more optimal performance in addition to reducing computational complexity. However, naive personalization methods can require clean speech from the target user, which is inconvenient to acquire, e.g., due to subpar recording conditions. To this end, we pose personalization as either a zero-shot task, in which no additional clean speech of the target speaker is used for training, or a few-shot learning task, in which the goal is to minimize the duration of the clean speech used for transfer learning. With this paper, we propose self-supervised learning methods as a solution to both zero- and few-shot personalization tasks. The proposed methods are designed to learn the personalized speech features from unlabeled data (i.e., in-the-wild noisy recordings from the target user) without knowing the corresponding clean sources. Our experiments investigate three different self-supervised learning mechanisms. The results show that self-supervised models achieve zero-shot and few-shot personalization using fewer model parameters and less clean data from the target user, achieving the data efficiency and model compression goals.
△ Less
Submitted 27 July, 2022; v1 submitted 5 April, 2021;
originally announced April 2021.
-
Detecting Extraneous Content in Podcasts
Authors:
Sravana Reddy,
Yongze Yu,
Aasish Pappu,
Aswin Sivaraman,
Rezvaneh Rezapour,
Rosie Jones
Abstract:
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of pod…
▽ More
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions. We present classifiers that leverage both textual and listening patterns in order to detect such content in podcast descriptions and audio transcripts. We demonstrate that our models are effective by evaluating them on the downstream task of podcast summarization and show that we can substantively improve ROUGE scores and reduce the extraneous content generated in the summaries.
△ Less
Submitted 3 March, 2021;
originally announced March 2021.
-
Synthesizing Safe and Efficient Kernel Extensions for Packet Processing
Authors:
Qiongwen Xu,
Michael D. Wong,
Tanvi Wagle,
Srinivas Narayana,
Anirudh Sivaraman
Abstract:
Extended Berkeley Packet Filter (BPF) has emerged as a powerful method to extend packet-processing functionality in the Linux operating system. BPF allows users to write code in high-level languages (like C or Rust) and execute them at specific hooks in the kernel, such as the network device driver. To ensure safe execution of a user-developed BPF program in kernel context, Linux uses an in-kernel…
▽ More
Extended Berkeley Packet Filter (BPF) has emerged as a powerful method to extend packet-processing functionality in the Linux operating system. BPF allows users to write code in high-level languages (like C or Rust) and execute them at specific hooks in the kernel, such as the network device driver. To ensure safe execution of a user-developed BPF program in kernel context, Linux uses an in-kernel static checker. The checker allows a program to execute only if it can prove that the program is crash-free, always accesses memory within safe bounds, and avoids leaking kernel data.
BPF programming is not easy. One, even modest-sized BPF programs are deemed too large to analyze and rejected by the kernel checker. Two, the kernel checker may incorrectly determine that a BPF program exhibits unsafe behaviors. Three, even small performance optimizations to BPF code (e.g., 5% gains) must be meticulously hand-crafted by expert developers. Traditional optimizing compilers for BPF are often inadequate since the kernel checker's safety constraints are incompatible with rule-based optimizations.
We present K2, a program-synthesis-based compiler that automatically optimizes BPF bytecode with formal correctness and safety guarantees. K2 produces code with 6--26% reduced size, 1.36%--55.03% lower average packet-processing latency, and 0--4.75% higher throughput (packets per second per core) relative to the best clang-compiled program, across benchmarks drawn from Cilium, Facebook, and the Linux kernel. K2 incorporates several domain-specific techniques to make synthesis practical by accelerating equivalence-checking of BPF programs by 6 orders of magnitude.
△ Less
Submitted 14 July, 2021; v1 submitted 26 February, 2021;
originally announced March 2021.
-
The case for model-driven interpretability of delay-based congestion control protocols
Authors:
Muhammad Khan,
Yasir Zaki,
Shiva Iyer,
Talal Ahamd,
Thomas Pötsch,
Jay Chen,
Anirudh Sivaraman,
Lakshmi Subramanian
Abstract:
Analyzing and interpreting the exact behavior of new delay-based congestion control protocols with complex non-linear control loops is exceptionally difficult in highly variable networks such as cellular networks. This paper proposes a Model-Driven Interpretability (MDI) congestion control framework, which derives a model version of a delay-based protocol by simplifying a congestion control protoc…
▽ More
Analyzing and interpreting the exact behavior of new delay-based congestion control protocols with complex non-linear control loops is exceptionally difficult in highly variable networks such as cellular networks. This paper proposes a Model-Driven Interpretability (MDI) congestion control framework, which derives a model version of a delay-based protocol by simplifying a congestion control protocol's response into a guided random walk over a two-dimensional Markov model. We demonstrate the case for the MDI framework by using MDI to analyze and interpret the behavior of two delay-based protocols over cellular channels: Verus and Copa. Our results show a successful approximation of throughput and delay characteristics of the protocols' model versions across variable network conditions. The learned model of a protocol provides key insights into an algorithm's convergence properties.
△ Less
Submitted 9 February, 2021;
originally announced February 2021.
-
Isolation mechanisms for high-speed packet-processing pipelines
Authors:
Tao Wang,
Xiangrui Yang,
Gianni Antichi,
Anirudh Sivaraman,
Aurojit Panda
Abstract:
Data-plane programmability is now mainstream. As we find more use cases, deployments need to be able to run multiple packet-processing modules in a single device. These are likely to be developed by independent teams, either within the same organization or from multiple organizations. Therefore, we need isolation mechanisms to ensure that modules on the same device do not interfere with each other…
▽ More
Data-plane programmability is now mainstream. As we find more use cases, deployments need to be able to run multiple packet-processing modules in a single device. These are likely to be developed by independent teams, either within the same organization or from multiple organizations. Therefore, we need isolation mechanisms to ensure that modules on the same device do not interfere with each other. This paper presents Menshen, an extension of the Reconfigurable Match Tables (RMT) pipeline that enforces isolation between different packet-processing modules. Menshen is comprised of a set of lightweight hardware primitives and an extension to the open source P4-16 reference compiler that act in conjunction to meet this goal. We have prototyped Menshen on two FPGA platforms (NetFPGA and Corundum). We show that our design provides isolation, and allows new modules to be loaded without impacting the ones already running. Finally, we demonstrate that feasibility of implementing Menshen on ASICs by using the FreePDK45nm technology library and the Synopsys DC synthesis software, showing that our design meets timing at a 1GHz clock frequency and needs approximately 6% additional chip area. We have open sourced the code for Menshen's hardware and software at https://isolation.quest/.
△ Less
Submitted 2 March, 2022; v1 submitted 29 January, 2021;
originally announced January 2021.
-
Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement
Authors:
Aswin Sivaraman,
Minje Kim
Abstract:
This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to cleaning recordings of a test-time speaker is limited to a few seconds, but noisy recordings of the speaker are abundant. We develop a simple contrastive learning…
▽ More
This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models. We specifically address the few-shot learning scenario where access to cleaning recordings of a test-time speaker is limited to a few seconds, but noisy recordings of the speaker are abundant. We develop a simple contrastive learning procedure which treats the abundant noisy data as makeshift training targets through pairwise noise injection: the model is pretrained to maximize agreement between pairs of differently deformed identical utterances and to minimize agreement between pairs of similarly deformed nonidentical utterances. Our experiments compare the proposed pretraining approach with two baseline alternatives: speaker-agnostic fully-supervised pretraining, and speaker-specific self-supervised pretraining without contrastive loss terms. Of all three approaches, the proposed method using contrastive mixtures is found to be most robust to model compression (using 85% fewer parameters) and reduced clean speech (requiring only 3 seconds).
△ Less
Submitted 9 August, 2022; v1 submitted 6 November, 2020;
originally announced November 2020.
-
Counterexample-Guided Learning of Monotonic Neural Networks
Authors:
Aishwarya Sivaraman,
Golnoosh Farnadi,
Todd Millstein,
Guy Van den Broeck
Abstract:
The widespread adoption of deep learning is often attributed to its automatic feature construction with minimal inductive bias. However, in many real-world tasks, the learned function is intended to satisfy domain-specific constraints. We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features. We develo…
▽ More
The widespread adoption of deep learning is often attributed to its automatic feature construction with minimal inductive bias. However, in many real-world tasks, the learned function is intended to satisfy domain-specific constraints. We focus on monotonicity constraints, which are common and require that the function's output increases with increasing values of specific input features. We develop a counterexample-guided technique to provably enforce monotonicity constraints at prediction time. Additionally, we propose a technique to use monotonicity as an inductive bias for deep learning. It works by iteratively incorporating monotonicity counterexamples in the learning process. Contrary to prior work in monotonic learning, we target general ReLU neural networks and do not further restrict the hypothesis space. We have implemented these techniques in a tool called COMET. Experiments on real-world datasets demonstrate that our approach achieves state-of-the-art results compared to existing monotonic learners, and can improve the model quality compared to those that were trained without taking monotonicity constraints into account.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
Gauntlet: Finding Bugs in Compilers for Programmable Packet Processing
Authors:
Fabian Ruffy,
Tao Wang,
Anirudh Sivaraman
Abstract:
Programmable packet-processing devices such as programmable switches and network interface cards are becoming mainstream. These devices are configured in a domain-specific language such as P4, using a compiler to translate packet-processing programs into instructions for different targets. As networks with programmable devices become widespread, it is critical that these compilers be dependable.…
▽ More
Programmable packet-processing devices such as programmable switches and network interface cards are becoming mainstream. These devices are configured in a domain-specific language such as P4, using a compiler to translate packet-processing programs into instructions for different targets. As networks with programmable devices become widespread, it is critical that these compilers be dependable.
This paper considers the problem of finding bugs in compilers for packet processing in the context of P4-16. We introduce domain-specific techniques to induce both abnormal termination of the compiler (crash bugs) and miscompilation (semantic bugs). We apply these techniques to (1) the open-source P4 compiler (P4C) infrastructure, which serves as a common base for different P4 back ends; (2) the P4 back end for the P4 reference software switch; and (3) the P4 back end for the Barefoot Tofino switch.
Across the 3 platforms, over 8 months of bug finding, our tool Gauntlet detected 96 new and distinct bugs (62 crash and 34 semantic), which we confirmed with the respective compiler developers. 54 have been fixed (31 crash and 23 semantic); the remaining have been assigned to a developer. Our bug-finding efforts also led to 6 P4 specification changes. We have open sourced Gauntlet at p4gauntlet.github.io and it now runs within P4C's continuous integration pipeline.
△ Less
Submitted 25 October, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Sparse Mixture of Local Experts for Efficient Speech Enhancement
Authors:
Aswin Sivaraman,
Minje Kim
Abstract:
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlap** subproblems and introducing a classifier, we are able to improve denoising performance while also reducing computational complexity. More specifically, the proposed model incorporates a gating network…
▽ More
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks. By splitting up the speech denoising task into non-overlap** subproblems and introducing a classifier, we are able to improve denoising performance while also reducing computational complexity. More specifically, the proposed model incorporates a gating network which assigns noisy speech signals to an appropriate specialist network based on either speech degradation level or speaker gender. In our experiments, a baseline recurrent network is compared against an ensemble of similarly-designed smaller recurrent networks regulated by the auxiliary gating network. Using stochastically generated batches from a large noisy speech corpus, the proposed model learns to estimate a time-frequency masking matrix based on the magnitude spectrogram of an input mixture signal. Both baseline and specialist networks are trained to estimate the ideal ratio mask, while the gating network is trained to perform subproblem classification. Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network, doing so with fewer model parameters.
△ Less
Submitted 16 May, 2020;
originally announced May 2020.
-
Testing Compilers for Programmable Switches Through Switch Hardware Simulation
Authors:
Michael D. Wong,
Aatish Kishan Varma,
Anirudh Sivaraman
Abstract:
Programmable switches have emerged as powerful and flexible alternatives to fixed-function forwarding devices. But because of the unique hardware constraints of network switches, the design and implementation of compilers targeting these devices is tedious and error prone. Despite the important role that compilers play in software development, there is a dearth of tools for testing compilers for p…
▽ More
Programmable switches have emerged as powerful and flexible alternatives to fixed-function forwarding devices. But because of the unique hardware constraints of network switches, the design and implementation of compilers targeting these devices is tedious and error prone. Despite the important role that compilers play in software development, there is a dearth of tools for testing compilers for programmable network devices. We present Druzhba, a programmable switch simulator used for testing compilers targeting programmable packet-processing substrates. We show that we can model the low-level behavior of a switch's programmable hardware. We further show how our machine model can be used by compiler developers to target Druzhba as a compiler backend. Generated machine code programs are fed into Druzhba and tested using a fuzzing-based approach that allows compiler developers to test the correctness of their compilers. Using a program-synthesis-based compiler as a case study, we demonstrate how Druzhba has been successful in testing compiler-generated machine code for our simulated switch pipeline instruction set.
△ Less
Submitted 27 October, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances
Authors:
Sanna Wager,
George Tzanetakis,
Cheng-i Wang,
Lijiang Guo,
Aswin Sivaraman,
Minje Kim
Abstract:
We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score of the vocals nor the accompaniment exists: It predicts the amount of correction from the relationship between the spectral contents of the vocal and accompani…
▽ More
We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score of the vocals nor the accompaniment exists: It predicts the amount of correction from the relationship between the spectral contents of the vocal and accompaniment tracks. Hence, the pitch shift in cents suggested by the model can be used to make the voice sound in tune with the accompaniment. This approach differs from commercially used automatic pitch correction systems, where notes in the vocal tracks are shifted to be centered around notes in a user-defined score or mapped to the closest pitch among the twelve equal-tempered scale degrees. We train the model using a dataset of 4,702 amateur karaoke performances selected for good intonation. We present a Convolutional Gated Recurrent Unit (CGRU) model to accomplish this task. This method can be extended into unsupervised pitch correction of a vocal performance, popularly referred to as autotuning.
△ Less
Submitted 3 February, 2019;
originally announced February 2019.
-
Active Inductive Logic Programming for Code Search
Authors:
Aishwarya Sivaraman,
Tianyi Zhang,
Guy Van den Broeck,
Miryung Kim
Abstract:
Modern search techniques either cannot efficiently incorporate human feedback to refine search results or to express structural or semantic properties of desired code. The key insight of our interactive code search technique ALICE is that user feedback could be actively incorporated to allow users to easily express and refine search queries. We design a query language to model the structure and se…
▽ More
Modern search techniques either cannot efficiently incorporate human feedback to refine search results or to express structural or semantic properties of desired code. The key insight of our interactive code search technique ALICE is that user feedback could be actively incorporated to allow users to easily express and refine search queries. We design a query language to model the structure and semantics of code as logic facts. Given a code example with user annotations, ALICE automatically extracts a logic query from features that are tagged as important. Users can refine the search query by labeling one or more examples as desired (positive) or irrelevant (negative). ALICE then infers a new logic query that separates the positives from negative examples via active inductive logic programming. Our comprehensive and systematic simulation experiment shows that ALICE removes a large number of false positives quickly by actively incorporating user feedback. Its search algorithm is also robust to noise and user labeling mistakes. Our choice of leveraging both positive and negative examples and the nested containment structure of selected code is effective in refining search queries. Compared with an existing technique, Critics, ALICE does not require a user to manually construct a search pattern and yet achieves comparable precision and recall with fewer search iterations on average. A case study with users shows that ALICE is easy to use and helps express complex code patterns.
△ Less
Submitted 23 February, 2019; v1 submitted 13 December, 2018;
originally announced December 2018.
-
A Data-Driven Approach to Smooth Pitch Correction for Singing Voice in Pop Music
Authors:
Sanna Wager,
Lijiang Guo,
Aswin Sivaraman,
Minje Kim
Abstract:
In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate tracks and time-aligned. The network takes as input the time-frequency representation of the two tracks and predicts the amount of pitch-shifting in cents required to make the voice sound in-tune with the accompaniment. It is trained on exampl…
▽ More
In this paper, we present a machine-learning approach to pitch correction for voice in a karaoke setting, where the vocals and accompaniment are on separate tracks and time-aligned. The network takes as input the time-frequency representation of the two tracks and predicts the amount of pitch-shifting in cents required to make the voice sound in-tune with the accompaniment. It is trained on examples of semi-professional singing. The proposed approach differs from existing real-time pitch correction methods by replacing pitch tracking and map** to a discrete set of notes---for example, the twelve classes of the equal-tempered scale---with learning a correction that is continuous both in frequency and in time directly from the harmonics of the vocal and accompaniment tracks. A Recurrent Neural Network (RNN) model provides a correction that takes context into account, preserving expressive pitch bending and vibrato. This method can be extended into unsupervised pitch correction of a vocal performance---popularly referred to as autotuning.
△ Less
Submitted 7 May, 2018;
originally announced May 2018.
-
On Psychoacoustically Weighted Cost Functions Towards Resource-Efficient Deep Neural Networks for Speech Denoising
Authors:
Kai Zhen,
Aswin Sivaraman,
Jongmo Sung,
Minje Kim
Abstract:
We present a psychoacoustically enhanced cost function to balance network complexity and perceptual performance of deep neural networks for speech denoising. While training the network, we utilize perceptual weights added to the ordinary mean-squared error to emphasize contribution from frequency bins which are most audible while ignoring error from inaudible bins. To generate the weights, we empl…
▽ More
We present a psychoacoustically enhanced cost function to balance network complexity and perceptual performance of deep neural networks for speech denoising. While training the network, we utilize perceptual weights added to the ordinary mean-squared error to emphasize contribution from frequency bins which are most audible while ignoring error from inaudible bins. To generate the weights, we employ psychoacoustic models to compute the global masking threshold from the clean speech spectra. We then evaluate the speech denoising performance of our perceptually guided neural network by using both objective and perceptual sound quality metrics, testing on various network structures ranging from shallow and narrow ones to deep and wide ones. The experimental results showcase our method as a valid approach for infusing perceptual significance to deep neural network operations. In particular, the more perceptually sensible enhancement in performance seen by simple neural network topologies proves that the proposed method can lead to resource-efficient speech denoising implementations in small devices without degrading the perceived signal fidelity.
△ Less
Submitted 29 January, 2018;
originally announced January 2018.
-
Programmable Packet Scheduling
Authors:
Anirudh Sivaraman,
Suvinay Subramanian,
Anurag Agrawal,
Sharad Chole,
Shang-Tse Chuang,
Tom Edsall,
Mohammad Alizadeh,
Sachin Katti,
Nick McKeown,
Hari Balakrishnan
Abstract:
Switches today provide a small set of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a programmable packet scheduler, which allows scheduling algorithms---potentially algorithms that are unknown today---to be programmed into a switch without…
▽ More
Switches today provide a small set of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a programmable packet scheduler, which allows scheduling algorithms---potentially algorithms that are unknown today---to be programmed into a switch without requiring hardware redesign.
Our design builds on the observation that scheduling algorithms make two decisions: in what order to schedule packets and when to schedule them. Further, in many scheduling algorithms these decisions can be made when packets are enqueued. We leverage this observation to build a programmable scheduler using a single abstraction: the push-in first-out queue (PIFO), a priority queue that maintains the scheduling order and time for such algorithms.
We show that a programmable scheduler using PIFOs lets us program a wide variety of scheduling algorithms. We present a detailed hardware design for this scheduler for a 64-port 10 Gbit/s shared-memory switch with <4% chip area overhead on a 16-nm standard-cell library. Our design lets us program many sophisticated algorithms, such as a 5-level hierarchical scheduler with programmable scheduling algorithms at each level.
△ Less
Submitted 18 February, 2016;
originally announced February 2016.
-
Packet Transactions: High-level Programming for Line-Rate Switches
Authors:
Anirudh Sivaraman,
Mihai Budiu,
Alvin Cheung,
Changhoon Kim,
Steve Licking,
George Varghese,
Hari Balakrishnan,
Mohammad Alizadeh,
Nick McKeown
Abstract:
Many algorithms for congestion control, scheduling, network measurement, active queue management, security, and load balancing require custom processing of packets as they traverse the data plane of a network switch. To run at line rate, these data-plane algorithms must be in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been…
▽ More
Many algorithms for congestion control, scheduling, network measurement, active queue management, security, and load balancing require custom processing of packets as they traverse the data plane of a network switch. To run at line rate, these data-plane algorithms must be in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built.
This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chipsets. The key challenge is that these algorithms create and modify algorithmic state. The key idea to achieve line-rate programmability for stateful algorithms is the notion of a packet transaction : a sequential code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language to express data-plane algorithms. We show with many examples that Domino provides a convenient and natural way to express sophisticated data-plane algorithms, and show that these algorithms can be run at line rate with modest estimated die-area overhead.
△ Less
Submitted 29 January, 2016; v1 submitted 15 December, 2015;
originally announced December 2015.