-
A Design Space for Intelligent and Interactive Writing Assistants
Authors:
Mina Lee,
Katy Ilonka Gero,
John Joon Young Chung,
Simon Buckingham Shum,
Vipul Raheja,
Hua Shen,
Subhashini Venugopalan,
Thiemo Wambsganss,
David Zhou,
Emad A. Alghamdi,
Tal August,
Avinash Bhat,
Madiha Zahrah Choksi,
Senjuti Dutta,
** L. C. Guo,
Md Naimul Hoque,
Yewon Kim,
Simon Knight,
Seyed Parsa Neshaei,
Agnia Sergeyuk,
Antonette Shibani,
Disha Shrivastava,
Lila Shroff,
Jessi Stark,
Sarah Sterman, et al. (11 additional authors not shown)
Abstract:
In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore five aspects of writing assistants: task, user, technology, interaction, and ecosystem. Within each aspect, we define dimensions (i.e., fundamental components of an aspect) and codes (i.e., potential options for each dimension) by systematically reviewing 115 papers. Our design space aims to offer researchers and designers a practical tool to navigate, comprehend, and compare the various possibilities of writing assistants, and aid in the envisioning and design of new writing assistants.
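To make the aspect, dimension and code hierarchy concrete, here is a minimal sketch of how one writing-assistant design could be encoded and validated. The five aspect names come from the abstract. Every dimension and code name (`writing_stage`, `initiative`, and so on) and the helper `describe` are hypothetical placeholders, not the paper's actual taxonomy.

```python
from typing import Dict, List

# Aspect -> dimension -> allowed codes.
# The five aspect names come from the abstract; every dimension and code
# name below is a HYPOTHETICAL placeholder, not the paper's taxonomy.
DesignSpace = Dict[str, Dict[str, List[str]]]

DESIGN_SPACE: DesignSpace = {
    "task": {"writing_stage": ["planning", "drafting", "revising"]},
    "user": {"expertise": ["novice", "expert"]},
    "technology": {"model_type": ["rule_based", "neural_lm"]},
    "interaction": {"initiative": ["user_initiated", "system_initiated"]},
    "ecosystem": {"deployment": ["standalone", "embedded_in_editor"]},
}


def describe(space: DesignSpace, choices: Dict[str, str]) -> List[str]:
    """Check that `choices` picks one valid code per dimension and return
    'aspect.dimension=code' tags. Assumes dimension names are unique across aspects."""
    tags: List[str] = []
    for aspect, dimensions in space.items():
        for dimension, codes in dimensions.items():
            code = choices.get(dimension)
            if code not in codes:
                raise ValueError(f"{aspect}.{dimension}: {code!r} is not one of {codes}")
            tags.append(f"{aspect}.{dimension}={code}")
    return tags
```

With designs encoded this way, comparing two assistants reduces to comparing their tag lists. That is one way a design space can support side-by-side comparison.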
Submitted 26 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Quantum Many-Body Physics Calculations with Large Language Models
Authors:
Haining Pan,
Nayantara Mudur,
Will Taranto,
Maria Tikhanovskaya,
Subhashini Venugopalan,
Yasaman Bahri,
Michael P. Brenner,
Eun-Ah Kim
Abstract:
Large language models (LLMs) have demonstrated an unprecedented ability to perform complex tasks in multiple domains, including mathematical and scientific reasoning. We demonstrate that with carefully designed prompts, LLMs can accurately carry out key calculations in research papers in theoretical physics. We focus on a broadly used approximation method in quantum physics: the Hartree-Fock method, which requires an analytic multi-step calculation deriving an approximate Hamiltonian and the corresponding self-consistency equations. To carry out the calculations using LLMs, we design multi-step prompt templates that break down the analytic calculation into standardized steps with placeholders for problem-specific information. We evaluate GPT-4's performance in executing the calculation for 15 research papers from the past decade, demonstrating that, with correction of intermediate steps, it correctly derives the final Hartree-Fock Hamiltonian in 13 cases and makes minor errors in 2 cases. Aggregating across all research papers, we find an average score of 87.5 (out of 100) on the execution of individual calculation steps. Overall, the requisite skill for doing these calculations is at the graduate level in quantum condensed matter theory. We further use LLMs to mitigate the two primary bottlenecks in this evaluation process: (i) extracting information from papers to fill in templates and (ii) automatic scoring of the calculation steps, demonstrating good results in both cases. The strong performance is the first step toward developing algorithms that automatically explore theoretical hypotheses at an unprecedented scale.
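The multi-step template idea can be sketched with Python's `string.Template`. Each standardized step carries `$placeholders`, which are filled with problem-specific information before the step is sent to the model. The step wording, the placeholder names (`system`, `basis`, `interaction`, `order_parameters`) and `fill_steps` are hypothetical. The paper's actual templates are not reproduced, and the LLM call itself is omitted.

```python
from string import Template
from typing import Dict, List

# HYPOTHETICAL step templates; the paper's actual prompt wording is not reproduced.
STEPS = [
    Template("Write the non-interacting Hamiltonian of $system in $basis."),
    Template("Write the interaction term ($interaction) in $basis."),
    Template("Apply the Hartree-Fock decoupling to the interaction term, keeping $order_parameters."),
    Template("Derive the self-consistency equations for the mean fields of $system."),
]


def fill_steps(paper_info: Dict[str, str]) -> List[str]:
    """Fill each standardized step with problem-specific information.

    Template.substitute raises KeyError if a placeholder has no value, which
    flags templates that the extraction step left incomplete.
    """
    return [step.substitute(paper_info) for step in STEPS]
```

`substitute` is used rather than `safe_substitute` so that a missing field fails loudly. A silently unfilled `$placeholder` in a prompt would be harder to spot.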
Submitted 5 March, 2024;
originally announced March 2024.
-
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion
Authors:
Katrin Tomanek,
Shanqing Cai,
Subhashini Venugopalan
Abstract:
Abbreviation expansion is a strategy used to speed up communication by limiting the amount of typing and using a language model to suggest expansions. Here we look at personalizing a Large Language Model's (LLM) suggestions based on prior conversations to enhance the relevance of predictions, particularly when the user data is small (~1000 samples). Specifically, we compare fine-tuning, prompt-tuning, and retrieval-augmented generation of expanded text suggestions for abbreviated inputs. Our case study with a deployed 8B-parameter LLM on a real user living with ALS, and experiments on movie character personalization, indicate that (1) customization may be necessary in some scenarios and prompt-tuning generalizes well to those; (2) fine-tuning on in-domain data (with as few as 600 samples) still shows some gains; however, (3) retrieval-augmented few-shot selection also outperforms fine-tuning; and (4) parameter-efficient tuning allows for efficient and scalable personalization. For prompt-tuning, we also find that initializing the learned "soft prompts" to user-relevant concept tokens leads to higher accuracy than random initialization.
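Retrieval-augmented few-shot selection works roughly as follows: rank a user's past (context, abbreviation, expansion) examples by similarity to the current context, then place the top-k examples in the prompt. The sketch below assumes plain word-overlap (Jaccard) similarity and a hypothetical prompt layout. The paper's retriever and prompt format may differ.

```python
from typing import List, Tuple

Example = Tuple[str, str, str]  # (conversation context, abbreviation, full expansion)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (an assumed stand-in retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_prompt(context: str, abbreviation: str, history: List[Example], k: int = 2) -> str:
    """Put the k past examples most similar to `context` ahead of the new query."""
    # sorted() is stable, so ties keep their original history order.
    ranked = sorted(history, key=lambda ex: jaccard(context, ex[0]), reverse=True)
    blocks = [f"Context: {c}\nAbbreviation: {a}\nExpansion: {e}" for c, a, e in ranked[:k]]
    blocks.append(f"Context: {context}\nAbbreviation: {abbreviation}\nExpansion:")
    return "\n\n".join(blocks)
```

Because the examples come from the user's own history, personalization happens without updating any model weights. This is the property that makes the approach cheap to scale across users.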
Submitted 21 December, 2023;
originally announced December 2023.
-
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments
Authors:
Shanqing Cai,
Subhashini Venugopalan,
Katie Seaver,
Xiang Xiao,
Katrin Tomanek,
Sri Jalasutram,
Meredith Ringel Morris,
Shaun Kane,
Ajit Narayanan,
Robert L. MacDonald,
Emily Kornman,
Daniel Vance,
Blair Casey,
Steve M. Gleason,
Philip Q. Nelson,
Michael P. Brenner
Abstract:
Finding ways to accelerate text input for individuals with profound motor impairments has been a long-standing area of research. Closing the speed gap for augmentative and alternative communication (AAC) devices such as eye-tracking keyboards is important for improving the quality of life for such individuals. Recent advances in neural networks for natural language pose new opportunities for rethinking strategies and user interfaces for enhanced text entry for AAC users. In this paper, we present SpeakFaster, consisting of large language models (LLMs) and a co-designed user interface for text entry in a highly abbreviated form, which saves 57% more motor actions than traditional predictive keyboards in offline simulation. A pilot study with 19 non-AAC participants typing on a mobile device by hand demonstrated gains in motor savings in line with the offline simulation, while introducing relatively small effects on overall typing speed. Lab and field testing on two eye-gaze typing users with amyotrophic lateral sclerosis (ALS) demonstrated text-entry rates 29-60% faster than traditional baselines, due to significant savings of expensive keystrokes achieved through phrase and word predictions from context-aware LLMs. These findings provide a strong foundation for further exploration of substantially accelerated text communication for motor-impaired users and demonstrate a direction for applying LLMs to text-based user interfaces.
Submitted 3 December, 2023;
originally announced December 2023.
-
Accurate Prediction of Experimental Band Gaps from Large Language Model-Based Data Extraction
Authors:
Samuel J. Yang,
Shutong Li,
Subhashini Venugopalan,
Vahe Tshitoyan,
Muratahan Aykol,
Amil Merchant,
Ekin Dogus Cubuk,
Gowoon Cheon
Abstract:
Machine learning is transforming materials discovery by providing rapid predictions of material properties, which enables large-scale screening for target materials. However, such models require training data. While automated data extraction from scientific literature has potential, current auto-generated datasets often lack sufficient accuracy and critical structural and processing details of materials that influence the properties. Using band gap as an example, we demonstrate that large language model (LLM) prompt-based extraction yields an order-of-magnitude lower error rate. Combined with additional prompts to select a subset of experimentally measured properties from pure, single-crystalline bulk materials, this results in an automatically extracted dataset that is larger and more diverse than the largest existing human-curated database of experimental band gaps. Compared to the existing human-curated database, we show that the model trained on our extracted database achieves a 19% reduction in the mean absolute error of predicted band gaps. Finally, we demonstrate that LLMs are able to train models predicting band gap on the extracted data, achieving an automated pipeline from data extraction to materials property prediction.
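Below is a minimal sketch of the post-extraction filtering and evaluation steps. It assumes the LLM has already returned structured records. The `BandGapRecord` fields are hypothetical stand-ins for the selection criteria the abstract names: experimentally measured, pure, single-crystalline, bulk. The error metric is the standard mean absolute error. The `relative_reduction` helper shows how a figure such as the reported 19% reduction would be computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BandGapRecord:
    # HYPOTHETICAL schema for one LLM-extracted record.
    material: str
    band_gap_ev: float
    experimental: bool    # measured, not computed
    pure: bool
    single_crystal: bool
    bulk: bool


def select_experimental(records: List[BandGapRecord]) -> List[BandGapRecord]:
    """Keep only experimentally measured values from pure, single-crystalline bulk samples."""
    return [r for r in records if r.experimental and r.pure and r.single_crystal and r.bulk]


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Fractional MAE reduction, e.g. 0.19 for a 19% reduction."""
    return (baseline_mae - new_mae) / baseline_mae
```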
Submitted 22 November, 2023;
originally announced November 2023.
-
Measuring Stakeholder Agreement and Stability in a Decentralised Organisation
Authors:
Sarad Venugopalan,
Heiko Aydt
Abstract:
A decentralised organisation (DO) is a multi-stakeholder institution where decision making is assigned to various levels of the organisation. Decentralised stakeholders play an important role in the governance of a decentralised organisation. The ability to measure DO stability will help monitor the health of the organisation and act as an early warning system for disagreement and group exit, which can lead to its destabilisation or collapse (for example, blockchain hard forks). We propose the organisational tension quadrilateral to study agreement between stakeholders and build a tool based on voting data (information as vote choices) to measure its stability. The stakeholders are permitted to cast their vote choice into an electronic ballot box. Here, each vote choice represents a measure of agreement. When voting ends, this information is aggregated and used to build a metric for DO stability. To the best of our knowledge, there are no similar tools available to measure DO stability.
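The abstract does not spell out its stability metric, so the following is only an illustrative stand-in, not the organisational tension quadrilateral. It aggregates a list of vote choices into two simple agreement scores: the plurality share, and one minus the normalized Shannon entropy of the vote distribution. Both scores equal 1.0 under unanimity. The function name and both formulas are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, List, Tuple


def agreement_scores(votes: List[Hashable]) -> Tuple[float, float]:
    """Return (plurality share, 1 - normalized entropy) for a list of vote choices.

    Both scores are ILLUSTRATIVE stand-ins, not the paper's stability metric.
    1.0 means unanimity; entropy is normalized by the number of distinct
    choices actually cast.
    """
    if not votes:
        raise ValueError("no votes cast")
    counts = Counter(votes)
    n = len(votes)
    plurality = max(counts.values()) / n
    if len(counts) == 1:
        return plurality, 1.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return plurality, 1.0 - entropy / math.log(len(counts))
```

An even split between two choices drives the entropy-based score to zero. A drop toward that value between voting rounds is the kind of signal an early-warning tool could track.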
Submitted 14 April, 2023;
originally announced April 2023.
-
Incentivising Building Data Availability and Accessibility Using Tokenized Data Assets
Authors:
Sarad Venugopalan,
Heiko Aydt
Abstract:
Smart cities are data-driven and collect data from a variety of sources. Certain types of data, such as building data, are under-represented and remain harder to find despite their value. Our goal is to incentivise stakeholders to make building data easier to access by turning it into an asset. We use tokenized building data assets on a blockchain to improve data accessibility. This is achieved by connecting building data owners with the consumers of building information via tokens (fungible and non-fungible), which serve the purpose of coordinating the activities of the built ecosystem. Further, we present our system architecture, designed to sustain the economic incentives for interested parties and individuals.
Submitted 14 April, 2023;
originally announced April 2023.
-
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
Authors:
Subhashini Venugopalan,
Jimmy Tobin,
Samuel J. Yang,
Katie Seaver,
Richard J. N. Cave,
Pan-Pan Jiang,
Neil Zeghidour,
Rus Heywood,
Jordan Green,
Michael P. Brenner
Abstract:
We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ~94K utterances from 100 speakers. We further found the models to generalize well (without further training) on the TORGO (100% accuracy), UASpeech (0.93 correlation), and ALS-TDI PMP (0.81 AUC) datasets, as well as on a dataset of realistic unprompted speech we gathered (106 dysarthric and 76 control speakers, ~2300 samples).
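The AUC reported above is the area under the ROC curve. This sketch computes it as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. Which class counts as positive, and how model outputs become scores, are assumptions here. The paper's evaluation protocol may differ.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise loop is O(n·m), which is fine for illustration. Rank-based formulas give the same value in O((n+m) log(n+m)).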
Submitted 15 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
Authors:
Joel Shor,
Ruyue Agnes Bi,
Subhashini Venugopalan,
Steven Ibara,
Roman Goldenberg,
Ehud Rivlin
Abstract:
Automatic Speech Recognition (ASR) in medical contexts has the potential to save time, cut costs, increase report accuracy, and reduce physician burnout. However, the healthcare industry has been slower to adopt this technology, in part due to the importance of avoiding medically-relevant transcription mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR metric that penalizes clinically-relevant mistakes more than others. We demonstrate that this metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins. We collect a benchmark of 18 clinician preferences on 149 realistic medical sentences called the Clinician Transcript Preference benchmark (CTP) and make it publicly available for the community to further develop clinically-aware ASR metrics. To our knowledge, this is the first public dataset of its kind. We demonstrate that CBERTScore more closely matches what clinicians prefer.
Submitted 28 April, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Dance of the DAOs: Building Data Assets as a Use Case
Authors:
Sarad Venugopalan,
Heiko Aydt
Abstract:
Decentralised Autonomous Organisations (DAOs) have recently piqued the interest of participants from diverse backgrounds, including business owners, engineers, and individual and institutional investors. In part, the promised autonomy (less rigid structure and more voice) in decision making, along with ease of market access, has resulted in its participants pouring in their time and economic resources. In a DAO, governance is typically enacted by posting proposals and collectively voting on them. The winning proposals are then implemented. However, governance alone may be insufficient when its participants' economic incentives are misaligned. Governance and tokenomics need to work in tandem to ensure business stability. We present a case study on an example building data asset from the construction industry and present its tokenomics. We show its working, both as a caretaker and strategic DAO, to illustrate its effects on governance and DAO stability. The case study serves as an example for participants to decide whether their DAO tokenomics are aligned with participation incentives. Finally, we propose the DAO tension quadrilateral to study DAO stability and build a tool to measure agreement among its participants.
Submitted 14 January, 2023;
originally announced January 2023.
-
Improving Confidentiality for NFT Referenced Data Stores
Authors:
Sarad Venugopalan,
Heiko Aydt
Abstract:
A non-fungible token (NFT) references a data store location, typically using a URL or another unique identifier. At a minimum, an NFT is expected to guarantee ownership and control over the tokenised asset. However, information stored on a third-party data store may be copied and stolen. We propose a solution to give control back to the information owner by storing encrypted content on the data store and providing additional security against hacks and zero-day exploits. The content on our data store is never decrypted or returned to its owner for decryption during rekeying. Also, the key size in our protocol does not increase with each rekeying. With this, we reduce the synchronisation steps and maintain a bounded key size.
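The two properties claimed above can be illustrated with a deliberately insecure toy. The first is rekeying without decrypting the stored content; the second is a key size that does not grow. The toy uses XOR "encryption" with a fixed-length key, and the owner hands the store only the XOR difference between the old and new keys. This is not the paper's protocol, since the abstract does not describe its construction. A real system would use a vetted updatable or proxy re-encryption scheme. All function names are hypothetical, and messages must be exactly `KEY_LEN` bytes.

```python
import secrets

KEY_LEN = 32  # bytes; the key size stays fixed no matter how many times we rekey


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, data: bytes) -> bytes:
    """TOY one-time-pad-style cipher (also used to decrypt); NOT secure if a key is reused."""
    return xor_bytes(key, data)


def rekey_token(old_key: bytes, new_key: bytes) -> bytes:
    """Computed by the owner: moves a ciphertext from old_key to new_key."""
    return xor_bytes(old_key, new_key)


def apply_rekey(ciphertext: bytes, token: bytes) -> bytes:
    """Run by the data store: (m ^ k_old) ^ (k_old ^ k_new) == m ^ k_new,
    so the plaintext is never reconstructed during rekeying."""
    return xor_bytes(ciphertext, token)
```

Rekeying is a single round trip: the owner sends one fixed-size token, and nothing is returned for decryption. That mirrors the reduced-synchronisation goal, though none of the real protocol's security.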
Submitted 14 January, 2023;
originally announced January 2023.
-
Assessing ASR Model Quality on Disordered Speech using BERTScore
Authors:
Jimmy Tobin,
Qisheng Li,
Subhashini Venugopalan,
Katie Seaver,
Richard Cave,
Katrin Tomanek
Abstract:
Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than on typical English speakers. It is hard to determine whether models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR model quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by speech-language pathologists for error type and assessment. BERTScore was found to be more strongly correlated with human judgments of both error type and error assessment. BERTScore was specifically more robust to orthographic changes (contraction and normalization errors) where meaning was preserved. Furthermore, BERTScore was a better fit for error assessment than WER, as measured using an ordinal logistic regression and the Akaike Information Criterion (AIC). Overall, our findings suggest that BERTScore can complement WER when assessing ASR model performance from a practical perspective, especially for accessibility applications where models are useful even at lower accuracy than for typical speech.
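WER, the baseline metric above, is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Here is a minimal sketch. It assumes plain whitespace tokenization; real scorers usually normalize text first. Transcribing "i can not go" as "i cannot go" preserves the meaning, yet it costs one substitution and one deletion. That is 2 errors over 4 reference words, a WER of 0.5. This is the orthographic case where the abstract reports BERTScore to be more robust.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1] / len(ref)
```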
Submitted 21 September, 2022;
originally announced September 2022.
-
Is Attention All That NeRF Needs?
Authors:
Mukund Varma T,
Peihao Wang,
Xuxi Chen,
Tianlong Chen,
Subhashini Venugopalan,
Zhangyang Wang
Abstract:
We present Generalizable NeRF Transformer (GNT), a transformer-based architecture that reconstructs Neural Radiance Fields (NeRFs) and learns to render novel views on the fly from source views. While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages. (1) The view transformer leverages multi-view geometry as an inductive bias for attention-based scene representation, and predicts coordinate-aligned features by aggregating information from epipolar lines on the neighboring views. (2) The ray transformer renders novel views using attention to decode the features from the view transformer along the sampled points during ray marching. Our experiments demonstrate that when optimized on a single scene, GNT can successfully reconstruct NeRF without an explicit rendering formula due to the learned ray renderer. When trained on multiple scenes, GNT consistently achieves state-of-the-art performance when transferring to unseen scenes and outperforms all other methods by ~10% on average. Our analysis of the learned attention maps to infer depth and occlusion indicates that attention enables learning a physically-grounded rendering. Our results show the promise of transformers as a universal modeling tool for graphics. Please refer to our project page for video results: https://vita-group.github.io/GNT/.
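GNT's ray transformer decodes the features of points sampled along a ray using attention. The sketch below is a toy illustration only, not GNT's architecture, which stacks full transformer layers. It performs single-head scaled dot-product attention pooling over per-sample features. The returned weights loosely play the role of the attention maps the abstract analyses for depth and occlusion. Vector sizes and the query are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(query: Sequence[float], keys: Sequence[Sequence[float]],
                   values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Single-head scaled dot-product attention over the samples on one ray.

    Returns the pooled value vector and the per-sample attention weights.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("need one key and one value per ray sample")
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return pooled, weights
```

Unlike classic volume rendering, these weights are not tied to a fixed density-based formula. That flexibility is what lets a learned renderer replace the explicit rendering equation.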
Submitted 1 March, 2023; v1 submitted 27 July, 2022;
originally announced July 2022.
-
Context-Aware Abbreviation Expansion Using Large Language Models
Authors:
Shanqing Cai,
Subhashini Venugopalan,
Katrin Tomanek,
Ajit Narayanan,
Meredith Ringel Morris,
Michael P. Brenner
Abstract:
Motivated by the need for accelerating text entry in augmentative and alternative communication (AAC) for people with severe motor impairments, we propose a paradigm in which phrases are abbreviated aggressively as primarily word-initial letters. Our approach is to expand the abbreviations into full-phrase options by leveraging conversation context with the power of pretrained large language models (LLMs). Through zero-shot, few-shot, and fine-tuning experiments on four public conversation datasets, we show that for replies to the initial turn of a dialog, an LLM with 64B parameters is able to exactly expand over 70% of phrases with abbreviation length up to 10, leading to an effective keystroke saving rate of up to about 77% on these exact expansions. Including a small amount of context in the form of a single conversation turn more than doubles abbreviation expansion accuracies compared to having no context, an effect that is more pronounced for longer phrases. Additionally, the robustness of models against typo noise can be enhanced through fine-tuning on noisy data.
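The abbreviation scheme and keystroke-saving arithmetic can be sketched as follows. The sketch assumes, hypothetically, that a user types one lowercase initial per word plus one keystroke to select a correct expansion. It also assumes the keystroke saving rate is 1 minus keystrokes typed divided by characters in the full phrase. The paper's exact definition and its handling of punctuation may differ.

```python
def abbreviate(phrase: str) -> str:
    """Word-initial-letter abbreviation (punctuation handling omitted)."""
    return "".join(word[0].lower() for word in phrase.split())


def keystroke_saving_rate(phrase: str, keystrokes: int) -> float:
    """ASSUMED definition: 1 - keystrokes typed / characters in the full phrase."""
    if not phrase:
        raise ValueError("phrase must be non-empty")
    return 1.0 - keystrokes / len(phrase)


phrase = "i am doing well thanks"
abbr = abbreviate(phrase)                                  # "iadwt"
ksr = keystroke_saving_rate(phrase, len(abbr) + 1)         # +1 to select the expansion (assumed)
```

Under these assumptions, the 22-character phrase costs 6 keystrokes, a saving of roughly 73%. Savings only materialize when the model's expansion is exactly right, which is why exact-match accuracy is the headline metric.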
Submitted 10 May, 2022; v1 submitted 7 May, 2022;
originally announced May 2022.
-
Protecting the Integrity of IoT Sensor Data and Firmware With A Feather-Light Blockchain Infrastructure
Authors:
Daniel Reijsbergen,
Aung Maw,
Sarad Venugopalan,
Dianshi Yang,
Tien Tuan Anh Dinh,
Jianying Zhou
Abstract:
Smart cities deploy large numbers of sensors and collect a tremendous amount of data from them. For example, Advanced Metering Infrastructures (AMIs), which consist of physical meters that collect usage data about public utilities such as power and water, are an important building block in a smart city. In a typical sensor network, the measurement devices are connected through a computer network, which exposes them to cyber attacks. Furthermore, the data is centrally managed at the operator's servers, making it vulnerable to insider threats.
Our goal is to protect the integrity of data collected by large-scale sensor networks and the firmware in measurement devices from cyber attacks and insider threats. To this end, we first develop a comprehensive threat model for attacks against data and firmware integrity, which can target any of the stakeholders in the operation of the sensor network. Next, we use our threat model to analyze existing defense mechanisms, including signature checks, remote firmware attestation, anomaly detection, and blockchain-based secure logs. However, the large size of the Trusted Computing Base and a lack of scalability limit the applicability of these existing mechanisms. We propose the Feather-Light Blockchain Infrastructure (FLBI) framework to address these limitations. Our framework leverages a two-layer architecture and cryptographic threshold signature chains to support large networks of low-capacity devices such as meters and data aggregators. We have fully implemented the FLBI's end-to-end functionality on the Hyperledger Fabric and private Ethereum blockchain platforms. Our experiments show that the FLBI is able to support millions of end devices.
Submitted 30 April, 2022;
originally announced May 2022.
-
TRILLsson: Distilled Universal Paralinguistic Speech Representations
Authors:
Joel Shor,
Subhashini Venugopalan
Abstract:
Recent advances in self-supervision have dramatically improved the quality of speech representations. However, deployment of state-of-the-art embedding models on devices has been restricted due to their limited public availability and large resource footprint. Our work addresses these issues by publicly releasing a collection of paralinguistic speech models that are small and near state-of-the-art performance. Our approach is based on knowledge distillation, and our models are distilled on public data only. We explore different architectures and thoroughly evaluate our models on the Non-Semantic Speech (NOSS) benchmark. Our largest distilled model is less than 15% of the size of the original model (314MB vs 2.2GB), achieves over 96% of the accuracy on 6 of 7 tasks, and is trained on 6.5% of the data. The smallest model is 1% of the size (22MB) and achieves over 90% of the accuracy on 6 of 7 tasks. Our models outperform the open-source Wav2Vec 2.0 model on 6 of 7 tasks, and our smallest model outperforms the open-source Wav2Vec 2.0 on both emotion recognition tasks despite being 7% of the size.
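In the embedding-matching form sketched here, knowledge distillation trains a small student to reproduce a frozen teacher's embeddings. The toy assumes a linear student on a fixed feature vector and a mean-squared-error objective, with one hand-derived gradient step. TRILLsson's actual students are neural networks on audio, and the abstract does not state its exact loss. All names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def distill_step(W: Matrix, x: Sequence[float], teacher_emb: Sequence[float],
                 lr: float = 0.1) -> Matrix:
    """One gradient step pulling the student's embedding W @ x toward the
    frozen teacher's embedding (ASSUMED MSE objective).

    d/dW_ij of mean_i (W x - t)_i^2 is 2/d * (W x - t)_i * x_j.
    """
    pred = matvec(W, x)
    d = len(pred)
    return [[w - lr * 2.0 / d * (p - t) * xj for w, xj in zip(row, x)]
            for row, p, t in zip(W, pred, teacher_emb)]
```

Only the teacher's outputs are needed, not its labels. That is why distillation can run on public unlabeled audio, as the abstract describes.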
Submitted 20 March, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Using a Cross-Task Grid of Linear Probes to Interpret CNN Model Predictions On Retinal Images
Authors:
Katy Blumer,
Subhashini Venugopalan,
Michael P. Brenner,
Jon Kleinberg
Abstract:
We analyze a dataset of retinal images using linear probes: linear regression models trained on some "target" task, using embeddings from a deep convolutional (CNN) model trained on some "source" task as input. We use this method across all possible pairings of 93 tasks in the UK Biobank dataset of retinal images, leading to ~164k different models. We analyze the performance of these linear probes by source and target task and by layer depth. We observe that representations from the middle layers of the network are more generalizable. We find that some target tasks are easily predicted irrespective of the source task, and that some other target tasks are more accurately predicted from correlated source tasks than from embeddings trained on the same task.
Submitted 23 July, 2021;
originally announced July 2021.
-
Always on Voting: A Framework for Repetitive Voting on the Blockchain
Authors:
Sarad Venugopalan,
Ivana Stančíková,
Ivan Homoliak
Abstract:
Elections commonly recur at a fixed time interval, ranging from months to years. This limits governance, since elected candidates or policies are difficult to remove before the next elections, even when removal is needed and allowed by the corresponding law. Participants may decide (through public deliberation) to change their choices but have no opportunity to vote for these choices before the next elections. Another issue is the peak-end effect, where the judgment of voters is based on how they felt a short time before the elections. To address these issues, we propose Always on Voting (AoV), a repetitive voting framework that allows participants to vote and change elected candidates or policies without waiting for the next elections. Participants are permitted to privately change their vote at any point in time, while the effect of their change is manifested at the end of each epoch, whose duration is shorter than the time between two main elections. To thwart the peak-end effect in epochs, the ends of epochs are randomized and made unpredictable, while preserved within soft bounds. These goals are achieved using the synergy between a Bitcoin puzzle oracle, a verifiable delay function, and smart contracts.
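Randomized epoch ends within soft bounds can be pictured by deriving the end point from a public random value. Until that value is revealed, the end is unpredictable, yet it always falls in `[start + min_len, start + max_len]`. This sketch treats the beacon as opaque bytes and reduces a SHA-256 digest into the allowed range. It stands in for, and is much weaker than, AoV's combination of a Bitcoin puzzle oracle and a verifiable delay function. The parameter names and the use of block heights are hypothetical.

```python
import hashlib


def epoch_end(epoch_start: int, beacon: bytes, min_len: int, max_len: int) -> int:
    """Map a public random beacon to an epoch end in [start + min_len, start + max_len].

    `epoch_start` and the lengths are ASSUMED to be block heights. The modulo
    reduction has a negligible bias for ranges far smaller than 2**256.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("require 0 < min_len <= max_len")
    digest = int.from_bytes(hashlib.sha256(beacon).digest(), "big")
    return epoch_start + min_len + digest % (max_len - min_len + 1)
```

Anyone holding the beacon can recompute and verify the same end height. The missing piece that a VDF supplies is a guarantee that no one, including the party who produced the beacon, could have learned it early.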
Submitted 24 September, 2023; v1 submitted 22 July, 2021;
originally announced July 2021.
-
Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases
Authors:
Subhashini Venugopalan,
Joel Shor,
Manoj Plakal,
Jimmy Tobin,
Katrin Tomanek,
Jordan R. Green,
Michael P. Brenner
Abstract:
Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases. We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases, which were rated by speech-language pathologists for their overall intelligibility using a five-point Likert scale. We then evaluated classifiers developed using 3 approaches: (1) a convolutional neural network (CNN) trained for the task, (2) classifiers trained on non-semantic speech representations from CNNs that used an unsupervised objective [1], and (3) classifiers trained on the acoustic (encoder) embeddings from an ASR system trained on typical speech [2]. We found that the ASR encoder's embeddings considerably outperform the other two on detecting and classifying disordered speech. Further analysis shows that the ASR embeddings cluster speech by the spoken phrase, while the non-semantic embeddings cluster speech by speaker. Also, longer phrases are more indicative of intelligibility deficits than single words.
Submitted 8 July, 2021;
originally announced July 2021.
-
Guided Integrated Gradients: An Adaptive Path Method for Removing Noise
Authors:
Andrei Kapishnikov,
Subhashini Venugopalan,
Besim Avci,
Ben Wedin,
Michael Terry,
Tolga Bolukbasi
Abstract:
Integrated Gradients (IG) is a commonly used feature attribution method for deep neural networks. While IG has many desirable properties, the method often produces spurious/noisy pixel attributions in regions that are not related to the predicted class when applied to visual models. While this has been previously noted, most existing solutions are aimed at addressing the symptoms by explicitly reducing the noise in the resulting attributions. In this work, we show that one of the causes of the problem is the accumulation of noise along the IG path. To minimize the effect of this source of noise, we propose adapting the attribution path itself -- conditioning the path not just on the image but also on the model being explained. We introduce Adaptive Path Methods (APMs) as a generalization of path methods, and Guided IG as a specific instance of an APM. Empirically, Guided IG creates saliency maps better aligned with the model's prediction and the input image that is being explained. We show through qualitative and quantitative experiments that Guided IG outperforms other, related methods in nearly every experiment.
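For reference, standard Integrated Gradients, the straight-line path method that Guided IG adapts, can be sketched with a midpoint Riemann sum. Guided IG replaces this fixed path with one conditioned on the model; that adaptive path is not implemented here. The quadratic toy model and its analytic gradient are assumptions chosen so the completeness property (attributions sum to F(x) - F(baseline)) is easy to check.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Approximate IG along the straight line from `baseline` to `x`.

    Midpoint rule: attr_i = (x_i - b_i) * mean_k dF/dx_i(b + a_k * (x - b)).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints of `steps` equal intervals
    grad_sum = np.zeros_like(x)
    for a in alphas:
        grad_sum += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * grad_sum / steps


# Toy differentiable "model" with an analytic gradient (an assumption for the demo).
w = np.array([0.5, -1.0, 2.0])
v = np.array([1.0, 0.0, -0.5])


def model(z):
    return float(np.sum(w * z ** 2 + v * z))


def model_grad(z):
    return 2 * w * z + v


x = np.array([1.0, 2.0, -2.0])
baseline = np.zeros(3)
attr = integrated_gradients(model_grad, x, baseline)
```

Every path point here lies on a straight line regardless of the model, which is the property the Guided IG abstract identifies as a source of accumulated noise.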
Submitted 17 June, 2021;
originally announced June 2021.
-
EPICTWIN: An Electric Power Digital Twin for Cyber Security Testing, Research and Education
Authors:
Nandha Kumar Kandasamy,
Sarad Venugopalan,
Tin Kit Wong,
Leu Junming Nicholas
Abstract:
Cyber-Physical Systems (CPS) rely on advanced communication and control technologies to efficiently manage devices and the flow of information in the system. However, a wide variety of potential security challenges has emerged as critical infrastructures (CI) have evolved from siloed sub-systems into connected and integrated networks. This is also the case for CI such as the smart grid. Smart grid security studies are carried out on physical test-beds that give users a safe and controlled environment in which to train for and test cyber attacks. However, physical test-beds are limited: their configuration is hard to modify and they are difficult to scale.
To overcome these shortcomings, we built a digital power twin of a physical test-bed used for cyber security studies on smart grids. On the twin, users can deploy real-world attacks and countermeasures to test and study their effectiveness. Unlike the physical test-bed, the twin lets users easily modify their power system components and configurations. Further, reproducing the twin to use and advance the research is significantly cheaper. The developed twin has more advanced features than any equivalent system in the literature. To illustrate a typical use case, we present a case study in which a cyber attack is launched and discuss its implications.
Submitted 10 May, 2021;
originally announced May 2021.
-
BBB-Voting: 1-out-of-k Blockchain-Based Boardroom Voting
Authors:
Sarad Venugopalan,
Ivan Homoliak,
Zengpeng Li,
Pawel Szalachowski
Abstract:
Voting is a means to agree on a collective decision based on available choices (e.g., candidates), where participants agree to abide by its outcome. To improve some features of e-voting, decentralized blockchain-based solutions can be employed, where the blockchain represents a public bulletin board that, in contrast to a centralized bulletin board, provides extremely high availability, censorship resistance, and correct code execution. A blockchain ensures that all entities in the voting system have the same view of the actions made by others due to its immutability and append-only features. The existing remote blockchain-based boardroom voting solution called Open Voting Network (OVN) provides privacy of votes, universal & End-to-End verifiability, and perfect ballot secrecy; however, it supports only two choices and lacks the robustness needed to recover from stalling participants.
We present BBB-Voting, a blockchain-based approach to decentralized voting equivalent to OVN; in contrast to OVN, however, BBB-Voting supports 1-out-of-k choices and provides robustness that enables recovery from stalling participants. We make a cost-optimized implementation using an Ethereum-based environment respecting Ethereum Enterprise Alliance standards, compare it with OVN, and show that our work decreases the costs for voters by 13.5% in normalized gas consumption. Finally, we show how BBB-Voting can be extended to support a number of participants limited only by the expenses paid by the authority and the computing power needed to obtain the tally.
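The self-tallying idea behind OVN, which BBB-Voting builds on, can be sketched in a toy group. Each ballot hides its vote behind a blinding factor, and the blinding factors cancel in the product of all ballots, so anyone can recover the tally. The 1-out-of-k encoding below (choice c as the exponent (n + 1)^c), the tiny modulus, and the omission of zero-knowledge proofs, smart contracts, and stall recovery are simplifying assumptions; this is not BBB-Voting's actual construction.

```python
import random

P = 2**31 - 1  # toy prime modulus; far too small for real security
G = 7          # a primitive root modulo 2**31 - 1


def self_tally(choices, k, seed=0):
    """Toy OVN-style boardroom vote over k options (hypothetical encoding).

    Voter i's choice c is encoded as the exponent (n + 1) ** c, so each option's
    count occupies its own base-(n + 1) digit of the tallied exponent.
    """
    rng = random.Random(seed)
    n = len(choices)
    base = n + 1
    # Registration: voter i keeps secret x_i and publishes X_i = G^x_i.
    secret_keys = [rng.randrange(1, P - 1) for _ in range(n)]
    public_keys = [pow(G, x, P) for x in secret_keys]

    tally_product = 1
    for i, (x, c) in enumerate(zip(secret_keys, choices)):
        if not 0 <= c < k:
            raise ValueError("choice out of range")
        # Y_i = prod_{j<i} X_j / prod_{j>i} X_j, so the exponents x_i * y_i sum to zero.
        num, den = 1, 1
        for pub in public_keys[:i]:
            num = num * pub % P
        for pub in public_keys[i + 1:]:
            den = den * pub % P
        y = num * pow(den, -1, P) % P  # modular inverse needs Python 3.8+
        ballot = pow(y, x, P) * pow(G, base ** c, P) % P
        tally_product = tally_product * ballot % P

    # Blinding factors cancel, leaving G^(sum of encoded votes): brute-force the exponent.
    acc = 1
    for total in range(n * base ** (k - 1) + 1):
        if acc == tally_product:
            break
        acc = acc * G % P
    else:
        raise RuntimeError("tally not found")
    counts = []
    for _ in range(k):
        counts.append(total % base)
        total //= base
    return counts
```

The final brute-force search grows with the number of voters and options, which illustrates why the abstract ties scalability to the computing power available for obtaining the tally.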
Submitted 10 May, 2023; v1 submitted 18 October, 2020;
originally announced October 2020.
-
Predicting Risk of Developing Diabetic Retinopathy using Deep Learning
Authors:
Ashish Bora,
Siva Balasubramanian,
Boris Babenko,
Sunny Virmani,
Subhashini Venugopalan,
Akinori Mitani,
Guilherme de Oliveira Marinho,
Jorge Cuadros,
Paisan Ruamviboonsuk,
Greg S Corrado,
Lily Peng,
Dale R Webster,
Avinash V Varadarajan,
Naama Hammel,
Yun Liu,
Pinal Bavishi
Abstract:
Diabetic retinopathy (DR) screening is instrumental in preventing blindness, but faces a scaling challenge as the number of diabetic patients rises. Risk stratification for the development of DR may help optimize screening intervals to reduce costs while improving vision-related outcomes. We created and validated two versions of a deep learning system (DLS) to predict the development of mild-or-worse ("Mild+") DR in diabetic patients undergoing DR screening. The two versions used either three fields or a single field of color fundus photographs (CFPs) as input. The training set was derived from 575,431 eyes, of which 28,899 had a known 2-year outcome; the remainder were used to augment training via multi-task learning. Validation was performed on both an internal validation set (set A; 7,976 eyes; 3,678 with known outcome) and an external validation set (set B; 4,762 eyes; 2,345 with known outcome). For predicting 2-year development of DR, the 3-field DLS had an area under the receiver operating characteristic curve (AUC) of 0.79 (95%CI, 0.78-0.81) on validation set A. On validation set B (which contained only a single field), the 1-field DLS's AUC was 0.70 (95%CI, 0.67-0.74). The DLS was prognostic even after adjusting for available risk factors (p<0.001). When added to the risk factors, the 3-field DLS improved the AUC from 0.72 (95%CI, 0.68-0.76) to 0.81 (95%CI, 0.77-0.84) in validation set A, and the 1-field DLS improved the AUC from 0.62 (95%CI, 0.58-0.66) to 0.71 (95%CI, 0.68-0.75) in validation set B. The DLSs in this study identified prognostic information for DR development from CFPs. This information is independent of and more informative than the available risk factors.
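The AUC values above can be read as a ranking probability: the chance that a randomly chosen eye that went on to develop DR receives a higher risk score than a randomly chosen eye that did not. A minimal O(n·m) sketch of that equivalence (the Mann-Whitney formulation) follows; it is illustrative only and not the study's evaluation code, and confidence intervals such as the 95% CIs reported above need additional machinery (e.g. bootstrapping) not shown here.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC-AUC as P(random positive scores higher than random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Under this reading, an AUC of 0.5 means the scores carry no ranking information, so the improvements reported when the DLS is added to risk factors correspond to more case/control pairs being ordered correctly.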
Submitted 10 August, 2020;
originally announced August 2020.
-
Scientific Discovery by Generating Counterfactuals using Image Translation
Authors:
Arunachalam Narayanaswamy,
Subhashini Venugopalan,
Dale R. Webster,
Lily Peng,
Greg Corrado,
Paisan Ruamviboonsuk,
Pinal Bavishi,
Rory Sayres,
Abigail Huang,
Siva Balasubramanian,
Michael Brenner,
Philip Nelson,
Avinash V. Varadarajan
Abstract:
Model explanation techniques play a critical role in understanding the source of a model's performance and making its decisions transparent. Here we investigate whether explanation techniques can also be used as a mechanism for scientific discovery. We make three contributions: first, we propose a framework to convert predictions from explanation techniques into a mechanism of discovery. Second, we show how generative models in combination with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. Third, with these techniques we study classification models for retinal images predicting Diabetic Macular Edema (DME), where recent work showed that a CNN trained on these images is likely learning novel features in the image. We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model's performance and human understanding.
Submitted 19 July, 2020; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Decentralized Lightweight Detection of Eclipse Attacks on Bitcoin Clients
Authors:
Bithin Alangot,
Daniel Reijsbergen,
Sarad Venugopalan,
Pawel Szalachowski
Abstract:
Clients of permissionless blockchain systems, like Bitcoin, rely on an underlying peer-to-peer network to send and receive transactions. It is critical that a client is connected to at least one honest peer, as otherwise the client can be convinced to accept a maliciously forked view of the blockchain. In such an eclipse attack, the client is unable to reliably distinguish the canonical view of the blockchain from the view provided by the attacker. The consequences of this can be catastrophic if the client makes business decisions based on a distorted view of the blockchain transactions. In this paper, we investigate the design space and propose two approaches for Bitcoin clients to detect whether an eclipse attack against them is ongoing. Each approach chooses a different trade-off between average attack detection time and network load. The first scheme is based on the detection of suspicious block timestamps. The second scheme allows blockchain clients to utilize their natural connections to the Internet (i.e., standard web activity) to gossip about their blockchain views with contacted servers and their other clients. Our proposals improve upon previously proposed eclipse attack countermeasures without introducing any dedicated infrastructure or changes to the Bitcoin protocol and network, and we discuss an implementation. We demonstrate the effectiveness of the gossip-based schemes through rigorous analysis using original Internet traffic traces and real-world deployment. The results indicate that our protocol incurs a negligible overhead and detects eclipse attacks rapidly with high probability, and is well-suited for practical deployment.
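The first scheme flags suspicious block timestamps. One simple way to turn that intuition into a test, assumed here purely for illustration, is to model honest block arrivals as a Poisson process with Bitcoin's 600-second target interval and ask whether the number of blocks timestamped in a recent window is implausibly low. The window length, the `alpha` threshold, and the function names are hypothetical; the paper's actual detector may differ.

```python
import math

MEAN_BLOCK_INTERVAL_S = 600.0  # Bitcoin's target average block interval


def poisson_cdf(n: int, lam: float) -> float:
    """P(N <= n) for N ~ Poisson(lam); adequate for moderate lam (exp(-lam) underflows near 745)."""
    term = math.exp(-lam)
    total = term
    for k in range(1, n + 1):
        term *= lam / k
        total += term
    return total


def suspicious_block_rate(blocks_seen: int, window_s: float,
                          mean_interval_s: float = MEAN_BLOCK_INTERVAL_S,
                          alpha: float = 1e-3) -> bool:
    """Flag a window whose block count is implausibly low for the honest network.

    An eclipsing attacker holding a fraction q < 1 of the hash rate can feed the
    victim valid-difficulty blocks only at roughly q times the honest rate.
    """
    expected = window_s / mean_interval_s
    return poisson_cdf(blocks_seen, expected) < alpha
```

In a six-hour window about 36 blocks are expected, so seeing 30 is unremarkable, while seeing only 12 (consistent with an attacker holding about a third of the hash rate) is flagged.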
Submitted 5 July, 2020;
originally announced July 2020.
-
Scaling Symbolic Methods using Gradients for Neural Model Explanation
Authors:
Subham Sekhar Sahoo,
Subhashini Venugopalan,
Li Li,
Rishabh Singh,
Patrick Riley
Abstract:
Symbolic techniques based on Satisfiability Modulo Theory (SMT) solvers have been proposed for analyzing and verifying neural network properties, but their usage has been fairly limited owing to their poor scalability with larger networks. In this work, we propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses and demonstrate its application for model explanation. In particular, we apply this technique to identify minimal regions in an input that are most relevant for a neural network's prediction. Our approach uses gradient information (based on Integrated Gradients) to focus on a subset of neurons in the first layer, which allows our technique to scale to large networks. The corresponding SMT constraints encode the minimal input mask discovery problem such that after masking the input, the activations of the selected neurons are still above a threshold. After solving for the minimal masks, our approach scores the mask regions to generate a relative ordering of the features within the mask. This produces a saliency map which explains "where a model is looking" when making a prediction. We evaluate our technique on three datasets (MNIST, ImageNet, and Beer Reviews) and demonstrate both quantitatively and qualitatively that the regions generated by our approach are sparser and achieve higher saliency scores compared to the gradient-based methods alone. Code and examples are available at https://github.com/google-research/google-research/tree/master/smug_saliency
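The minimal-mask constraint problem can be illustrated on a toy first layer by exhaustive search instead of an SMT solver: find the smallest set of input features to keep such that every selected neuron's activation stays at or above its threshold. Brute-force enumeration, masking by zeroing features, fixed thresholds, and a hand-picked neuron set are assumptions of this sketch; in the paper the neurons are chosen with Integrated Gradients and the search is encoded as SMT constraints. Enumeration is exponential in the number of features and only works for tiny inputs.

```python
from itertools import combinations


def first_layer_activations(weights, bias, x):
    """ReLU activations of a dense first layer (rows of `weights` are neurons)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def minimal_mask(weights, bias, x, selected, thresholds):
    """Return the smallest tuple of kept feature indices satisfying all thresholds.

    Masked-out features are set to 0.0 (an assumption). Returns None if even the
    unmasked input fails the constraints.
    """
    n = len(x)
    for size in range(n + 1):
        for keep in combinations(range(n), size):
            masked = [xi if i in keep else 0.0 for i, xi in enumerate(x)]
            acts = first_layer_activations(weights, bias, masked)
            if all(acts[j] >= t for j, t in zip(selected, thresholds)):
                return keep
    return None
```

Searching subsets in order of size guarantees the first mask found is minimal, which is the property the SMT encoding enforces with a solver rather than by enumeration.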
Submitted 5 May, 2021; v1 submitted 29 June, 2020;
originally announced June 2020.
-
Fukaya categories of blowups
Authors:
Sushmita Venugopalan,
Chris T. Woodward,
Guangbo Xu
Abstract:
We compute the Fukaya category of the symplectic blowup of a compact rational symplectic manifold at a point in the following sense: Suppose a collection of Lagrangian branes satisfy Abouzaid's criterion for split-generation of a bulk-deformed Fukaya category of cleanly-intersecting Lagrangian branes. We show that for a small blow-up parameter, their inverse images in the blowup together with a collection of branes near the exceptional locus split-generate the Fukaya category of the blowup. This categorifies a result on quantum cohomology by Bayer and is an example of a more general conjectural description of the behavior of the Fukaya category under transitions occurring in the minimal model program, namely that MMP transitions generate additional summands.
Submitted 6 June, 2023; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Tropical Fukaya Algebras
Authors:
Sushmita Venugopalan,
Chris Woodward
Abstract:
We introduce a tropical version of the Fukaya algebra of a Lagrangian submanifold and use it to show that tropical Lagrangian tori are weakly unobstructed. Tropical graphs arise as large-scale behavior of pseudoholomorphic disks under a multiple cut operation on a symplectic manifold that produces a collection of cut spaces each containing relative normal crossing divisors, following works of Ionel and Brett Parker. Given a Lagrangian submanifold in the complement of the relative divisors in one of the cut spaces, the structure maps of the broken Fukaya algebra count broken disks associated to rigid tropical graphs. We introduce a further degeneration of the matching conditions (similar in spirit to Bourgeois' version of symplectic field theory) which results in a tropical Fukaya algebra whose structure maps are, in good cases, sums of products over vertices of tropical graphs. We show the tropical Fukaya algebra is homotopy equivalent to the original Fukaya algebra. In the case of toric Lagrangians contained in a toric component of the degeneration, an invariance argument implies the existence of projective Maurer-Cartan solutions. We also give various computations of potentials, such as those of Lagrangians in cubic surfaces and flag varieties.
Submitted 25 October, 2022; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Attribution in Scale and Space
Authors:
Shawn Xu,
Subhashini Venugopalan,
Mukund Sundararajan
Abstract:
We study the attribution problem [28] for deep networks applied to perception tasks. For vision tasks, attribution techniques attribute the prediction of a network to the pixels of the input image. We propose a new technique called Blur Integrated Gradients. This technique has several advantages over other methods. First, it can tell at what scale a network recognizes an object. It produces scores in the scale/frequency dimension, which we find capture interesting phenomena. Second, it satisfies the scale-space axioms [14], which imply that it employs perturbations that are free of artifacts. We therefore produce explanations that are cleaner and consistent with the operation of deep networks. Third, it eliminates the need for a 'baseline' parameter for Integrated Gradients [31] for perception tasks. This is desirable because the choice of baseline has a significant effect on the explanations. We compare the proposed technique against previous techniques and demonstrate application on three tasks: ImageNet object recognition, Diabetic Retinopathy prediction, and AudioSet audio event identification.
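The blur path can be sketched in one dimension: instead of interpolating from a baseline, the path runs from a heavily blurred copy of the input back to the input itself, so the most-blurred signal plays the role IG's baseline would. The paper applies the method to images and audio; the 1-D signal, edge padding, linear sigma schedule, and chord-midpoint discretization here are our assumptions.

```python
import numpy as np


def gaussian_blur_1d(x, sigma):
    """Blur a 1-D signal with a normalized Gaussian kernel (edge padding keeps the length)."""
    x = np.asarray(x, dtype=float)
    if sigma <= 0:
        return x.copy()
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(x, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def blur_integrated_gradients(grad_fn, x, max_sigma=8.0, steps=64):
    """Accumulate gradients along the path from blur(x, max_sigma) back to x."""
    sigmas = np.linspace(max_sigma, 0.0, steps + 1)
    path = [gaussian_blur_1d(x, s) for s in sigmas]
    attr = np.zeros(len(path[0]))
    for a, b in zip(path[:-1], path[1:]):
        # Gradient at the chord midpoint times the change in each input coordinate.
        attr += grad_fn(0.5 * (a + b)) * (b - a)
    return attr
```

Summing the per-step contributions in `sigmas` order, rather than only their total, is what yields scores along the scale dimension that the abstract describes.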
Submitted 8 April, 2020; v1 submitted 3 April, 2020;
originally announced April 2020.
-
It's easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets
Authors:
Subhashini Venugopalan,
Arunachalam Narayanaswamy,
Samuel Yang,
Anton Geraschenko,
Scott Lipnick,
Nina Makhortova,
James Hawrot,
Christine Marques,
Joao Pereira,
Michael Brenner,
Lee Rubin,
Brian Wainger,
Marc Berndl
Abstract:
Confounding variables are a well-known source of nuisance in biomedical studies. They present an even greater challenge when combined with black-box machine learning techniques that operate on raw data. This work presents two case studies. In one, we discovered biases arising from systematic errors in the data generation process. In the other, we found a spurious source of signal unrelated to the prediction task at hand. In both cases, our prediction models performed well, but careful examination revealed hidden confounders and biases. These are cautionary tales on the limits of using machine learning techniques on raw data from scientific experiments.
Submitted 6 April, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
The Security Reference Architecture for Blockchains: Towards a Standardized Model for Studying Vulnerabilities, Threats, and Defenses
Authors:
Ivan Homoliak,
Sarad Venugopalan,
Qingze Hum,
Daniel Reijsbergen,
Richard Schumi,
Pawel Szalachowski
Abstract:
Blockchains are distributed systems, in which security is a critical factor for their success. However, despite their increasing popularity and adoption, there is a lack of standardized models that study blockchain-related security threats. To fill this gap, the main focus of our work is to systematize and extend the knowledge about the security and privacy aspects of blockchains and contribute to the standardization of this domain.
We propose the security reference architecture (SRA) for blockchains, which adopts a stacked model (similar to the ISO/OSI model) describing the nature and hierarchy of various security and privacy aspects. The SRA contains four layers: (1) the network layer, (2) the consensus layer, (3) the replicated state machine layer, and (4) the application layer. At each of these layers, we identify known security threats, their origin, and countermeasures, while we also analyze several cross-layer dependencies. Next, to help practitioners reason better about the security aspects of blockchains, we propose a blockchain-specific version of the threat-risk assessment standard ISO/IEC 15408 by embedding the stacked model into this standard. Finally, we provide designers of blockchain platforms and applications with a design methodology following the model of SRA and its hierarchy.
Submitted 28 October, 2020; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Novikov's theorem in higher dimensions?
Authors:
Sushmita Venugopalan
Abstract:
Novikov's theorem is a rigidity result on the class of taut foliations on three-manifolds. For higher dimensional manifolds, the existence of a strong symplectic form has been proposed as an analogue for tautness in order to achieve similar rigidity. This leads to the natural question of whether strong symplectic foliations satisfy an analogue of Novikov's theorem. In this paper, we construct a five-dimensional manifold with a strong symplectic foliation that does not satisfy the expected analogue of Novikov's theorem.
Submitted 12 July, 2019;
originally announced July 2019.
-
A Security Reference Architecture for Blockchains
Authors:
Ivan Homoliak,
Sarad Venugopalan,
Qingze Hum,
Pawel Szalachowski
Abstract:
Due to their interesting features, blockchains have become popular in recent years. They are full-stack systems where security is a critical factor for their success. The main focus of this work is to systematize knowledge about security and privacy issues of blockchains. To this end, we propose a security reference architecture based on models that demonstrate the stacked hierarchy of various threats (similar to the ISO/OSI hierarchy) as well as threat-risk assessment using ISO/IEC 15408. In contrast to previous surveys, we focus on categorizing security incidents by their origins and, using the proposed architecture, present existing prevention and mitigation techniques. The scope of our work mainly covers aspects related to the decentralized nature of blockchains, while we mention common operational security issues and countermeasures only tangentially.
Submitted 15 April, 2019;
originally announced April 2019.
-
Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning
Authors:
Avinash Varadarajan,
Pinal Bavishi,
Paisan Raumviboonsuk,
Peranut Chotcomwongse,
Subhashini Venugopalan,
Arunachalam Narayanaswamy,
Jorge Cuadros,
Kuniyoshi Kanai,
George Bresnick,
Mongkol Tadarati,
Sukhum Silpa-archa,
Jirawut Limwattanayingyong,
Variya Nganthavee,
Joe Ledsam,
Pearse A Keane,
Greg S Corrado,
Lily Peng,
Dale R Webster
Abstract:
Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF (vascular endothelial growth factor) therapies, it has become increasingly important to detect center-involved diabetic macular edema (ci-DME). However, ci-DME is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites because of cost and workflow constraints. Instead, screening programs rely on the detection of hard exudates in color fundus photographs as a proxy for DME, often resulting in many false positive or false negative calls. To improve the accuracy of DME screening, we trained a deep learning model to use color fundus photographs to predict ci-DME. Our model had an ROC-AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, three retinal specialists had similar sensitivities (82-85%), but only half the specificity (45-50%, p<0.001 for each comparison with model). The positive predictive value (PPV) of the model was 61% (95% CI: 56-66%), approximately double the 36-38% of the retinal specialists. In addition to predicting ci-DME, our model was able to detect the presence of intraretinal fluid with an AUC of 0.81 (95% CI: 0.81-0.86) and subretinal fluid with an AUC of 0.88 (95% CI: 0.85-0.91). The ability of deep learning algorithms to make clinically relevant predictions that generally require sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging.
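Why similar sensitivity but double the specificity roughly doubles PPV can be checked with Bayes' rule, which gives PPV from sensitivity, specificity, and prevalence. The abstract does not state the ci-DME prevalence; the 27% used in the comments below is our back-calculation from the model's reported operating point (85% sensitivity, 80% specificity, 61% PPV), so treat it as an assumption.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed prevalence of about 27% (back-calculated, not reported):
# ppv(0.85, 0.80, 0.27) ≈ 0.61  (the model's operating point)
# ppv(0.82, 0.45, 0.27) ≈ 0.36  (low ends of the specialists' ranges)
# ppv(0.85, 0.50, 0.27) ≈ 0.39  (high ends of the specialists' ranges)
```

With sensitivity nearly fixed, the false-positive term (1 - specificity) dominates: specialists at 45-50% specificity produce roughly 2.5 times as many false positives per negative case as the model at 80%, which is close to the 36-38% specialist PPV the abstract reports.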
Submitted 31 July, 2019; v1 submitted 18 October, 2018;
originally announced October 2018.
-
Symplectic foliated fillings of sphere cotangent bundles
Authors:
Francisco Presas,
Sushmita Venugopalan
Abstract:
We classify symplectically foliated fillings of certain contact foliated manifolds. We show that up to symplectic deformation, the unique minimal symplectically foliated filling of the foliated sphere cotangent bundle of the Reeb foliation in the 3-sphere is the associated disk cotangent bundle. En route to the proof, we study another foliated manifold, namely the product of a circle and an annulus with almost horizontal foliation. In this case, the foliated unit cotangent bundle does not have a unique minimal symplectic filling. We classify the foliated fillings of this manifold up to symplectic deformation equivalence using combinatorial invariants of the filling.
Submitted 27 September, 2018;
originally announced September 2018.
-
Detecting Cancer Metastases on Gigapixel Pathology Images
Authors:
Yun Liu,
Krishna Gadepalli,
Mohammad Norouzi,
George E. Dahl,
Timo Kohlberger,
Aleksey Boyko,
Subhashini Venugopalan,
Aleksei Timofeev,
Philip Q. Nelson,
Greg S. Corrado,
Jason D. Hipp,
Lily Peng,
Martin C. Stumpe
Abstract:
Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor-intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection.
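Images at this resolution are typically scored patch by patch. The following sketch enumerates patch positions over a large image. The patch size, the stride, and the edge handling are illustrative assumptions, not the settings used in the paper, and `tile_origins` is a hypothetical helper. Even non-overlapping 100 x 100 patches over a 100,000 x 100,000 slide already give one million positions.

```python
from typing import Iterator, List, Tuple


def _starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis; the last tile is shifted to end at the border."""
    if length <= tile:
        return [0]
    positions = list(range(0, length - tile + 1, stride))
    if positions[-1] + tile < length:
        positions.append(length - tile)  # cover the leftover strip at the edge
    return positions


def tile_origins(width: int, height: int, tile: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) corner of every tile covering a width x height image."""
    xs = _starts(width, tile, stride)
    for y in _starts(height, tile, stride):
        for x in xs:
            yield x, y


# Non-overlapping 100 x 100 patches over a 100,000 x 100,000 slide.
n_patches = sum(1 for _ in tile_origins(100_000, 100_000, tile=100, stride=100))
print(n_patches)  # 1000000
```

A smaller stride (overlapping patches) multiplies this count, which trades compute for finer localization.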
Submitted 7 March, 2017; v1 submitted 3 March, 2017;
originally announced March 2017.
-
Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions
Authors:
Ronghang Hu,
Marcus Rohrbach,
Subhashini Venugopalan,
Trevor Darrell
Abstract:
Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the specific image region based on the given expression. One major difficulty in training such language-based image segmentation systems is the lack of datasets with joint vision and text annotations. Although existing vision datasets such as MS COCO provide image captions, there are few datasets with region-level textual annotations for images, and these are often smaller in scale. In this paper, we explore how existing large scale vision-only and text-only datasets can be utilized to train models for image segmentation from referring expressions. We propose a method to address this problem, and show in experiments that our method can help this joint vision and language modeling task with vision-only and text-only data and outperforms previous results.
Submitted 29 August, 2016;
originally announced August 2016.
-
Captioning Images with Diverse Objects
Authors:
Subhashini Venugopalan,
Lisa Anne Hendricks,
Marcus Rohrbach,
Raymond Mooney,
Trevor Darrell,
Kate Saenko
Abstract:
Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources -- labeled images from object recognition datasets, and semantic knowledge extracted from unannotated text. We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets. We demonstrate that our model exploits semantic information to generate captions for hundreds of object categories in the ImageNet object recognition dataset that are not observed in MSCOCO image-caption training data, as well as many categories that are observed very rarely. Both automatic evaluations and human judgements show that our model considerably outperforms prior work in being able to describe many more categories of objects.
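A "joint objective" over several data sources can be read as a sum of per-source losses over shared parameters. The following expression is a hedged sketch of that idea, with one term for each data source listed above. The abstract does not give the exact terms or any weighting used by NOC.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{image-caption}}(\theta) + \mathcal{L}_{\text{image}}(\theta) + \mathcal{L}_{\text{text}}(\theta)
$$

In this sketch, $\theta$ denotes parameters shared across the three terms:

- $\mathcal{L}_{\text{image}}$ would be a recognition loss on labeled images.
- $\mathcal{L}_{\text{text}}$ would be a language-modeling loss on unannotated text.
- $\mathcal{L}_{\text{image-caption}}$ would be the captioning loss on paired data.

Sharing $\theta$ is what lets knowledge from the first two sources reach the caption generator.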
Submitted 20 July, 2017; v1 submitted 24 June, 2016;
originally announced June 2016.
-
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
Authors:
Subhashini Venugopalan,
Lisa Anne Hendricks,
Raymond Mooney,
Kate Saenko
Abstract:
This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos. Specifically, we integrate both a neural language model and distributional semantics trained on large text corpora into a recent LSTM-based architecture for video description. We evaluate our approach on a collection of YouTube videos as well as two large movie description datasets showing significant improvements in grammaticality while modestly improving descriptive quality.
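The abstract does not say how the language model is combined with the video model. One simple option, assumed here purely for illustration, is late (shallow) fusion. At each decoding step, every candidate word is scored by a weighted sum of the two models' log-probabilities. The function name, the weight `alpha`, and the toy probabilities are hypothetical.

```python
import math
from typing import Dict


def late_fusion(video_probs: Dict[str, float],
                lm_probs: Dict[str, float],
                alpha: float = 0.7) -> str:
    """Choose the next word from two models' probabilities (all assumed > 0).

    alpha weights the video-conditioned captioner; 1 - alpha weights the
    text-only language model. Both dicts must cover the same vocabulary.
    """
    def score(word: str) -> float:
        return alpha * math.log(video_probs[word]) + (1.0 - alpha) * math.log(lm_probs[word])

    return max(video_probs, key=score)


# The captioner slightly prefers an ungrammatical word; the language model does not.
video = {"men": 0.45, "mans": 0.55}
lm = {"men": 0.90, "mans": 0.10}
print(late_fusion(video, lm, alpha=1.0))  # mans
print(late_fusion(video, lm, alpha=0.7))  # men
```

An `alpha` closer to 1 trusts the visual evidence more. An `alpha` closer to 0 favors fluent text that may not be grounded in the video.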
Submitted 29 November, 2016; v1 submitted 6 April, 2016;
originally announced April 2016.
-
Local model for the moduli space of affine vortices
Authors:
Sushmita Venugopalan,
Guangbo Xu
Abstract:
We show that the moduli space of regular affine vortices, which are solutions of the symplectic vortex equation over the complex plane, has the structure of a smooth manifold. The construction uses Ziltener's Fredholm theory results [31]. We also extend the result to the case of affine vortices over the upper half plane. These results are necessary ingredients in defining the "open quantum Kirwan map" proposed by Woodward [24].
Submitted 23 December, 2016; v1 submitted 21 December, 2015;
originally announced December 2015.
-
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Authors:
Lisa Anne Hendricks,
Subhashini Venugopalan,
Marcus Rohrbach,
Raymond Mooney,
Kate Saenko,
Trevor Darrell
Abstract:
While recent deep neural network models have achieved promising results on the image captioning task, they rely largely on the availability of corpora with paired image and sentence captions to describe objects in context. In this work, we propose the Deep Compositional Captioner (DCC) to address the task of generating descriptions of novel objects which are not present in paired image-sentence datasets. Our method achieves this by leveraging large object recognition datasets and external text corpora and by transferring knowledge between semantically similar concepts. Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet. In contrast, our model can compose sentences that describe novel objects and their interactions with other objects. We demonstrate our model's ability to describe novel concepts by empirically evaluating its performance on MSCOCO and show qualitative results on ImageNet images of objects for which no paired image-caption data exist. Further, we extend our approach to generate descriptions of objects in video clips. Our results show that DCC has distinct advantages over existing image and video captioning approaches for generating descriptions of new objects in context.
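The abstract describes "transferring knowledge between semantically similar concepts" but does not spell out the mechanism. The following minimal sketch shows one way to do this, assumed here for illustration. It finds the seen word whose text-derived embedding is closest to the novel word's embedding, measured by cosine similarity. It then copies that word's output-layer weights to the novel word. All names and vectors are hypothetical toy values.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def transfer_weights(novel: str,
                     embeddings: Dict[str, List[float]],
                     output_weights: Dict[str, List[float]]) -> str:
    """Copy the output weights of the most similar seen word to `novel`.

    `embeddings` covers seen and novel words (e.g. learned from text alone);
    `output_weights` initially holds only words seen in paired captions.
    Returns the seen word whose weights were copied.
    """
    seen = [w for w in output_weights if w != novel]
    nearest = max(seen, key=lambda w: cosine(embeddings[novel], embeddings[w]))
    output_weights[novel] = list(output_weights[nearest])  # copy, not alias
    return nearest


embeddings = {"zebra": [0.9, 0.1], "horse": [1.0, 0.0], "pizza": [0.0, 1.0]}
weights = {"horse": [0.2, 0.3], "pizza": [0.5, 0.5]}
print(transfer_weights("zebra", embeddings, weights))  # horse
```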
Submitted 27 April, 2016; v1 submitted 17 November, 2015;
originally announced November 2015.
-
A Multi-scale Multiple Instance Video Description Network
Authors:
Huijuan Xu,
Subhashini Venugopalan,
Vasili Ramanishka,
Marcus Rohrbach,
Kate Saenko
Abstract:
Generating natural language descriptions for in-the-wild videos is a challenging task. Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video. However, these deep CNN architectures are designed for single-label, center-positioned object classification. While they generate strong semantic features, they have no inherent structure allowing them to detect multiple objects of different sizes and locations in the frame. We address this problem by integrating the base CNN into several fully convolutional neural networks (FCNs) to form a multi-scale network that handles multiple receptive field sizes in the original image. FCNs, previously applied to image segmentation, can generate class heat-maps more efficiently than sliding-window mechanisms, and can easily handle multiple scales. To further handle the ambiguity over multiple objects and locations, we incorporate the Multiple Instance Learning (MIL) mechanism to consider objects in different positions and at different scales simultaneously. We integrate our multi-scale multi-instance architecture with a sequence-to-sequence recurrent neural network to generate sentence descriptions based on the visual representation. Ours is the first end-to-end trainable architecture that is capable of multi-scale region processing. Evaluation on a YouTube video dataset shows the advantage of our approach compared to the original single-scale whole-frame CNN model. Our flexible and efficient architecture can potentially be extended to support other video processing tasks.
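The abstract does not specify how MIL combines evidence across positions and scales. A common choice in multiple instance learning, used here as an assumption, is noisy-OR pooling. Under noisy-OR, a frame contains an object if at least one region (instance) does, with instances treated as independent. The helper names are hypothetical. In practice, the per-region probabilities would come from the FCN heat-maps.

```python
from typing import List


def noisy_or(instance_probs: List[float]) -> float:
    """Bag-level probability that at least one instance contains the object."""
    p_none = 1.0
    for p in instance_probs:
        p_none *= 1.0 - p  # probability that this instance is also negative
    return 1.0 - p_none


def multi_scale_score(scale_maps: List[List[float]]) -> float:
    """Pool per-region probabilities from every scale into one frame score."""
    return noisy_or([p for scale in scale_maps for p in scale])


# Two scales: a coarse map with two regions and a fine map with one.
print(multi_scale_score([[0.5, 0.0], [0.5]]))  # 0.75
```

Max pooling keeps only the single most confident region. Noisy-OR instead lets several moderately confident regions add up to a confident frame-level score.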
Submitted 18 March, 2016; v1 submitted 21 May, 2015;
originally announced May 2015.
-
Sequence to Sequence -- Video to Text
Authors:
Subhashini Venugopalan,
Marcus Rohrbach,
Jeff Donahue,
Raymond Mooney,
Trevor Darrell,
Kate Saenko
Abstract:
Real-world videos often have complex dynamics, and methods for generating open-domain video descriptions should be sensitive to temporal structure and allow both input (sequence of frames) and output (sequence of words) of variable length. To approach this problem, we propose a novel end-to-end sequence-to-sequence model to generate captions for videos. For this we exploit recurrent neural networks, specifically LSTMs, which have demonstrated state-of-the-art performance in image caption generation. Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip. Our model naturally learns both the temporal structure of the sequence of frames and the sequence model of the generated sentences, i.e., a language model. We evaluate several variants of our model that exploit different visual features on a standard set of YouTube videos and two movie description datasets (M-VAD and MPII-MD).
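One way to handle variable-length input and output is a single LSTM stack that first reads all frames and then emits words. The following sketch lays out such a time-step schedule. It assumes padding tokens at steps where one stream is absent and `<BOS>`/`<EOS>` markers around the sentence. These token names and the `s2vt_schedule` helper are illustrative assumptions, not details stated in the abstract.

```python
from typing import List, Optional, Tuple

PAD, BOS, EOS = "<pad>", "<BOS>", "<EOS>"

Step = Tuple[str, str, Optional[str]]  # (frame input, word input, target word)


def s2vt_schedule(frames: List[str], words: List[str]) -> List[Step]:
    """Lay out encoding and decoding time steps for one video-sentence pair."""
    # Encoding stage: read every frame; no word is predicted, so target is None.
    steps: List[Step] = [(frame, PAD, None) for frame in frames]
    # Decoding stage: frame input is padding; feed the previous word, predict the next.
    inputs = [BOS] + words
    targets = words + [EOS]
    steps += [(PAD, w_in, w_out) for w_in, w_out in zip(inputs, targets)]
    return steps


for step in s2vt_schedule(["f1", "f2", "f3"], ["a", "man", "runs"]):
    print(step)
```

Because both stages are one unrolled sequence, the number of frames and the number of words can differ freely from one training pair to the next.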
Submitted 19 October, 2015; v1 submitted 3 May, 2015;
originally announced May 2015.
-
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Authors:
Subhashini Venugopalan,
Huijuan Xu,
Jeff Donahue,
Marcus Rohrbach,
Raymond Mooney,
Kate Saenko
Abstract:
Solving the visual symbol grounding problem has long been a goal of artificial intelligence. The field appears to be advancing closer to this goal with recent breakthroughs in deep learning for natural language grounding in static images. In this paper, we propose to translate videos directly to sentences using a unified deep neural network with both convolutional and recurrent structure. Described video datasets are scarce, and most existing methods have been applied to toy domains with a small vocabulary of possible words. By transferring knowledge from 1.2M+ images with category labels and 100,000+ images with captions, our method is able to create sentence descriptions of open-domain videos with large vocabularies. We compare our approach with recent work using language generation metrics, subject, verb, and object prediction accuracy, and a human evaluation.
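Before a recurrent decoder can condition on a video, the variable number of per-frame CNN features must be turned into an input the decoder can consume. The abstract does not say how this is done. This sketch assumes a simple option: mean pooling the frame features into one fixed-length vector. Here `mean_pool` is a hypothetical helper, and the feature values are toy numbers.

```python
from typing import List


def mean_pool(frame_features: List[List[float]]) -> List[float]:
    """Average per-frame feature vectors into one fixed-length video vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    n = len(frame_features)
    # zip(*...) groups the i-th feature of every frame together.
    return [sum(column) / n for column in zip(*frame_features)]


video_vector = mean_pool([[0.2, 0.8, 0.0], [0.4, 0.6, 1.0]])
print(video_vector)
```

Averaging discards frame order. Models that read frames one at a time keep it.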
Submitted 30 April, 2015; v1 submitted 15 December, 2014;
originally announced December 2014.
-
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Authors:
Jeff Donahue,
Lisa Anne Hendricks,
Marcus Rohrbach,
Subhashini Venugopalan,
Sergio Guadarrama,
Kate Saenko,
Trevor Darrell
Abstract:
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
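A deliberately tiny recurrence can illustrate one property described here: mapping a variable-length input through a nonlinear state update. The following sketch is a scalar Elman-style RNN, not LRCN itself, which the abstract pairs with convnet features and long-term recurrent models. The weights and the `rnn_states` name are arbitrary assumptions.

```python
import math
from typing import List


def rnn_states(inputs: List[float], w_in: float = 0.5, w_rec: float = 0.8,
               bias: float = 0.0) -> List[float]:
    """Scalar Elman RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias), h_0 = 0."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)  # nonlinear state update
        states.append(h)
    return states


# One hidden state per input step, whatever the sequence length.
print(rnn_states([1.0, 0.5, -1.0]))
```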
Submitted 31 May, 2016; v1 submitted 17 November, 2014;
originally announced November 2014.
-
Vortices on surfaces with cylindrical ends
Authors:
Sushmita Venugopalan
Abstract:
We consider Riemann surfaces obtained from nodal curves with infinite cylinders in the place of nodal and marked points, and study the space of finite energy vortices defined on these surfaces. To compactify the space of vortices, we need to consider stable vortices; these incorporate breaking of cylinders and sphere bubbling in the fibers. In this paper, we prove that the space of gauge equivalence classes of stable vortices representing a fixed equivariant homology class is compact and Hausdorff under the Gromov topology. We also show that this space is homeomorphic to the moduli space of quasimaps defined by Ciocan-Fontanine, Kim and Maulik.
Submitted 22 July, 2015; v1 submitted 4 December, 2013;
originally announced December 2013.
-
Classification of affine vortices
Authors:
Sushmita Venugopalan,
Christopher T. Woodward
Abstract:
We prove a Hitchin-Kobayashi correspondence for affine vortices generalizing a result of Jaffe-Taubes for the action of the circle on the affine line. Namely, suppose a compact Lie group K has a Hamiltonian action on a Kaehler manifold X which is either compact or convex at infinity with a proper moment map, and so that stable=semistable for the action of the complexified Lie group G. Then, for some sufficiently divisible integer n, there is a bijection between gauge equivalence classes of K-vortices with target X and isomorphism classes of maps from the weighted projective line P(1,n) to X/G that map the stacky point at infinity P(n) to the semistable locus in X. The results allow the construction and partial computation of the quantum Kirwan map in Woodward, and play a role in the conjectures of Dimofte, Gukov, and Hollands relating vortex counts to knot invariants.
Submitted 7 July, 2015; v1 submitted 29 January, 2013;
originally announced January 2013.
-
Yang-Mills heat flow on gauged holomorphic maps
Authors:
Sushmita Venugopalan
Abstract:
We study the gradient flow lines of a Yang-Mills-type functional on the space of gauged holomorphic maps $\mathcal{H}(P,X)$, where $P$ is a principal bundle on a Riemann surface $Σ$ and $X$ is a Kähler Hamiltonian $G$-manifold. For compact $Σ$, possibly with boundary, we prove long time existence of the gradient flow. The flow lines converge to critical points of the functional. Consequently, there is a stratification on $\mathcal{H}(P,X)$ that is invariant under the action of the complexified gauge group.
Symplectic vortices are the zeros of the functional we study. When $Σ$ has boundary, similar to Donaldson's result for the Hermitian Yang-Mills equations, we show that there is only a single stratum: any element of $\mathcal{H}(P,X)$ can be complex gauge transformed to a symplectic vortex. This is a version of Mundet's Hitchin-Kobayashi result on a surface with boundary.
Submitted 2 December, 2016; v1 submitted 9 January, 2012;
originally announced January 2012.
-
Prediction of Retained Capacity and EODV of Li-ion Batteries in LEO Spacecraft Batteries
Authors:
S. Ramakrishnan,
S. Venugopalan,
A. Ebenezer Jeyakumar
Abstract:
In recent years, artificial neural networks (ANNs) have been widely reported for modeling in different areas of science, including electrochemistry. This includes the modeling of batteries of various technologies, such as lead-acid and nickel-cadmium batteries. Lithium-ion batteries are an advanced battery technology that satisfies most space mission requirements. Low Earth orbit (LEO) spacecraft batteries undergo a large number of charge-discharge cycles (about 25,000 cycles) compared to other ground-level or space applications. This study is intended to develop an ANN model for about 25,000 cycles, cycled under various temperature and Depth of Discharge (DOD) settings with a constant charge voltage limit, to predict the retained capacity and End of Discharge Voltage (EODV). To draw firm conclusions and assess the capability of the ANN method, the predicted values are compared with experimental results using statistical methods and a Bland-Altman plot.
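A Bland-Altman comparison summarizes agreement between predicted and measured values with two quantities. The first is the mean difference (bias). The second is the 95% limits of agreement, computed as bias ± 1.96 standard deviations of the differences. The following minimal sketch computes these quantities and the points of the plot. The function names and sample values are hypothetical, not data from the study.

```python
import statistics
from typing import List, Tuple


def bland_altman(predicted: List[float], measured: List[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the 95% limits of agreement."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length series with at least two points")
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def plot_points(predicted: List[float], measured: List[float]) -> List[Tuple[float, float]]:
    """(mean of the pair, difference) coordinates, one per plotted point."""
    return [((p + m) / 2, p - m) for p, m in zip(predicted, measured)]


# Hypothetical retained-capacity values (%) from a model and from experiment.
predicted = [98.0, 95.0, 92.0]
measured = [97.0, 95.0, 93.0]
print(bland_altman(predicted, measured))  # (0.0, -1.96, 1.96)
```

If nearly all points fall between the two limits and the bias is close to zero, the model and the experiment can be considered to agree within that range.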
Submitted 26 April, 2010;
originally announced April 2010.