Showing 1–30 of 30 results for author: Brahma, S

Searching in archive cs.
  1. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2311.07911 [pdf, other]

    cs.CL cs.AI cs.LG

    Instruction-Following Evaluation for Large Language Models

    Authors: Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou

    Abstract: One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval…

    Submitted 14 November, 2023; originally announced November 2023.

    MSC Class: 68T50 (Primary) 68T99 (Secondary) ACM Class: I.2.7
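    The abstract contrasts slow, non-reproducible human evaluation with potentially biased LLM judges. One way around both problems is to use instructions whose compliance a short program can check deterministically. The sketch below only illustrates that idea: the three instruction types, their thresholds, and every function name are hypothetical and are not taken from the benchmark's actual instruction set.

```python
import re
from typing import Callable, Dict, List, Tuple

def check_min_words(response: str, min_words: int) -> bool:
    """Hypothetical instruction: answer with at least `min_words` words."""
    return len(response.split()) >= min_words

def check_keyword_count(response: str, keyword: str, min_count: int) -> bool:
    """Hypothetical instruction: mention `keyword` (whole word, any case) at least `min_count` times."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= min_count

def check_no_commas(response: str) -> bool:
    """Hypothetical instruction: do not use any commas."""
    return "," not in response

def evaluate(response: str, checks: List[Tuple[str, Callable[[str], bool]]]) -> Dict[str, bool]:
    """Run every named check and report per-instruction and overall compliance."""
    results = {name: bool(check(response)) for name, check in checks}
    results["all_followed"] = all(results.values())
    return results

if __name__ == "__main__":
    checks = [
        ("at_least_5_words", lambda r: check_min_words(r, 5)),
        ("mention_ai_twice", lambda r: check_keyword_count(r, "AI", 2)),
        ("no_commas", check_no_commas),
    ]
    print(evaluate("AI models follow instructions and AI evaluators check them", checks))
```

    Because each check is a pure function of the response text, the same response always receives the same verdict, which is the reproducibility property the abstract asks for.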

  4. arXiv:2305.10403 [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  5. arXiv:2304.04947 [pdf, other]

    cs.CL

    Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

    Authors: Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang

    Abstract: We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-w…

    Submitted 26 November, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: NeurIPS camera-ready version

  6. arXiv:2303.09752 [pdf, other]

    cs.CL cs.LG

    CoLT5: Faster Long-Range Transformers with Conditional Computation

    Authors: Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontañón, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

    Abstract: Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive, not only due to quadratic attention complexity but also due to applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this in…

    Submitted 23 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Accepted at EMNLP 2023
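    The abstract's key observation is that not every token needs the same amount of feedforward computation. A minimal sketch of that kind of conditional computation, under assumptions of our own: every token goes through a cheap "light" branch, and only the k tokens with the highest router scores also go through an expensive "heavy" branch. Tokens are scalars and the branches are plain functions purely for brevity; `conditional_ffn`, the router scores, and k are illustrative and do not reproduce CoLT5's actual layers.

```python
from typing import Callable, List

def conditional_ffn(
    tokens: List[float],
    router_scores: List[float],
    light: Callable[[float], float],
    heavy: Callable[[float], float],
    k: int,
) -> List[float]:
    """Apply `light` to every token and add `heavy` only for the k highest-scoring tokens.

    Tokens are scalars here for brevity; in a real model they would be vectors
    and both branches would be feedforward layers.
    """
    if not 0 <= k <= len(tokens):
        raise ValueError("k must be between 0 and the number of tokens")
    # Indices of the k tokens the router considers most important.
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:k])
    out = []
    for i, x in enumerate(tokens):
        y = light(x)          # cheap computation for every token
        if i in selected:
            y += heavy(x)     # extra capacity only for routed tokens
        out.append(y)
    return out
```

    The heavy-branch cost then grows with k rather than with the sequence length. A trainable version would also need a differentiable path into the router, for example scaling the heavy output by the router score; that detail is omitted here.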

  7. arXiv:2211.01267 [pdf, other]

    cs.CL cs.IR

    Multi-Vector Retrieval as Sparse Alignment

    Authors: Yujie Qian, Jinhyuk Lee, Sai Meher Karthik Duddu, Zhuyun Dai, Siddhartha Brahma, Iftekhar Naim, Tao Lei, Vincent Y. Zhao

    Abstract: Multi-vector retrieval models improve over single-vector dual encoders on many information retrieval tasks. In this paper, we cast the multi-vector retrieval problem as sparse alignment between query and document tokens. We propose AligneR, a novel multi-vector retrieval model that learns sparsified pairwise alignments between query and document tokens (e.g. 'dog' vs. 'puppy') and per-token unary…

    Submitted 2 November, 2022; originally announced November 2022.
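    To make "sparse alignment" concrete, the sketch below scores a query against a document by letting each query token align only to its k most similar document tokens. It is a simplified, untrained illustration under our own assumptions (dot-product similarity, a fixed k, no learned sparsity and no unary saliences), not AligneR itself; with k = 1 it reduces to the sum-of-maximum-similarities scoring used by ColBERT-style late interaction.

```python
from typing import List, Sequence

Vector = Sequence[float]

def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between two token embeddings."""
    return sum(a * b for a, b in zip(u, v))

def sparse_alignment_score(query: List[Vector], doc: List[Vector], k: int = 1) -> float:
    """Score a query/document pair by aligning each query token to its top-k document tokens.

    For every query token only the k most similar document tokens contribute
    (the alignment is sparse); their similarities are averaged, and the
    per-query-token scores are summed.
    """
    if not 1 <= k <= len(doc):
        raise ValueError("k must be between 1 and the number of document tokens")
    total = 0.0
    for q in query:
        sims = sorted((dot(q, d) for d in doc), reverse=True)
        total += sum(sims[:k]) / k   # only the top-k alignments count for this query token
    return total
```

    Raising k lets one query token draw evidence from several document tokens (for example two paraphrases of the same concept) while still ignoring the rest of the document.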

  8. arXiv:2210.11416 [pdf, other]

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  9. arXiv:2210.03841 [pdf, other]

    cs.CL

    Breaking BERT: Evaluating and Optimizing Sparsified Attention

    Authors: Siddhartha Brahma, Polina Zablotskaia, David Mimno

    Abstract: Transformers allow attention between all pairs of tokens, but there is reason to believe that most of these connections, and their quadratic time and memory, may not be necessary. But which ones? We evaluate the impact of sparsification patterns with a series of ablation experiments. First, we compare masks based on syntax, lexical similarity, and token position to random connections, and measur…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Shorter version accepted to SNN2021 workshop
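    The ablations described above amount to restricting which (query, key) pairs attention may use. The sketch below shows one way to express that: a boolean mask is applied before the softmax, so a positional window mask can be compared against a random mask. The window radius, the random density, and all function names are assumptions for illustration; the paper's syntax- and similarity-based masks are not reproduced. Plain Python lists keep it dependency-free.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]

def window_mask(n: int, radius: int) -> Mask:
    """Positional mask: token i may attend to token j only if |i - j| <= radius."""
    return [[abs(i - j) <= radius for j in range(n)] for i in range(n)]

def random_mask(n: int, density: float, seed: int = 0) -> Mask:
    """Random mask keeping each off-diagonal pair with probability `density`.

    The diagonal is always kept so every token has at least one connection.
    """
    rng = random.Random(seed)
    return [[i == j or rng.random() < density for j in range(n)] for i in range(n)]

def masked_attention(q: Matrix, k: Matrix, v: Matrix, mask: Mask) -> Matrix:
    """Single-head scaled dot-product attention restricted to pairs where mask[i][j] is True.

    Each row of `mask` must contain at least one True entry.
    """
    d = len(q[0])
    out: Matrix = []
    for i, qi in enumerate(q):
        # Scores only for allowed keys; skipping a key is equivalent to a -inf logit.
        scores = {j: sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                  for j, kj in enumerate(k) if mask[i][j]}
        top = max(scores.values())                 # subtract the max for numerical stability
        weights = {j: math.exp(s - top) for j, s in scores.items()}
        z = sum(weights.values())
        out.append([sum(w / z * v[j][c] for j, w in weights.items()) for c in range(len(v[0]))])
    return out
```

    With a full mask this is ordinary attention, so swapping only the mask isolates the effect of the sparsification pattern.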

  10. arXiv:2011.14459 [pdf, other]

    cs.CL

    Improved Semantic Role Labeling using Parameterized Neighborhood Memory Adaptation

    Authors: Ishan Jindal, Ranit Aharonov, Siddhartha Brahma, Huaiyu Zhu, Yunyao Li

    Abstract: Deep neural models achieve some of the best results for semantic role labeling. Inspired by instance-based learning that utilizes nearest neighbors to handle low-frequency context-specific training samples, we investigate the use of memory adaptation techniques in deep neural models. We propose a parameterized neighborhood memory adaptive (PNMA) method that uses a parameterized representation of t…

    Submitted 29 November, 2020; originally announced November 2020.

  11. arXiv:2011.04732 [pdf, other]

    cs.CL

    CLAR: A Cross-Lingual Argument Regularizer for Semantic Role Labeling

    Authors: Ishan Jindal, Yunyao Li, Siddhartha Brahma, Huaiyu Zhu

    Abstract: Semantic role labeling (SRL) identifies predicate-argument structure(s) in a given sentence. Although different languages have different argument annotations, polyglot training, the idea of training one model on multiple languages, has previously been shown to outperform monolingual baselines, especially for low resource languages. In fact, even a simple combination of data has been shown to be ef…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020, ACL Findings

  12. Small but Mighty: New Benchmarks for Split and Rephrase

    Authors: Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li

    Abstract: Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even…

    Submitted 12 December, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: In EMNLP 2020

    Journal ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020) 1198-1205

  13. arXiv:1808.08424 [pdf, ps, other]

    cs.DC

    Efficiently Processing Workflow Provenance Queries on SPARK

    Authors: Rajmohan C, Pranay Lohia, Himanshu Gupta, Siddhartha Brahma, Mauricio Hernandez, Sameep Mehta

    Abstract: In this paper, we investigate how we can leverage the Spark platform for efficiently processing provenance queries on large volumes of workflow provenance data. We focus on processing provenance queries at the attribute-value level, which is the finest granularity available. We propose a novel weakly connected component based framework which is carefully engineered to quickly determine a minimal volume of…

    Submitted 25 October, 2018; v1 submitted 25 August, 2018; originally announced August 2018.
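    A weakly connected component ignores edge direction, and any provenance path (ancestors or descendants of a node) stays inside the component of the node it starts from, so components let a query skip unrelated data. The single-machine sketch below illustrates that pruning with union-find; it is not the paper's Spark framework, and `records_for_query` is a hypothetical helper.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]

def weakly_connected_components(edges: Iterable[Edge]) -> Dict[Hashable, Hashable]:
    """Map every node to a representative of its weakly connected component.

    Edge direction is ignored, which is what makes the components 'weak'.
    Implemented with union-find (path halving plus union by size).
    """
    parent: Dict[Hashable, Hashable] = {}
    size: Dict[Hashable, int] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        size.setdefault(x, 1)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: Hashable, b: Hashable) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra                    # attach the smaller tree under the larger one
        size[ra] += size[rb]

    for src, dst in edges:
        union(src, dst)
    return {node: find(node) for node in list(parent)}

def records_for_query(node: Hashable, edges: List[Edge],
                      components: Dict[Hashable, Hashable]) -> List[Edge]:
    """Hypothetical helper: keep only the edges in the queried node's component."""
    target = components[node]
    return [(s, d) for s, d in edges if components[s] == target]
```

    In a distributed setting the component labels would be computed once and used to partition the provenance data, so each query reads only its own partition.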

  14. arXiv:1808.05908 [pdf, ps, other]

    cs.CL cs.AI

    Improved Language Modeling by Decoding the Past

    Authors: Siddhartha Brahma

    Abstract: Highly regularized LSTMs achieve impressive results on several benchmark datasets in language modeling. We propose a new regularization method based on decoding the last token in the context using the predicted distribution of the next token. This biases the model towards retaining more contextual information, in turn improving its ability to predict the next token. With negligible overhead in the…

    Submitted 23 January, 2019; v1 submitted 14 August, 2018; originally announced August 2018.

  15. arXiv:1808.04343 [pdf, other]

    cs.CL cs.AI

    REGMAPR - Text Matching Made Easy

    Authors: Siddhartha Brahma

    Abstract: Text matching is a fundamental problem in natural language processing. Neural models using bidirectional LSTMs for sentence encoding and inter-sentence attention mechanisms perform remarkably well on several benchmark datasets. We propose REGMAPR, a simple and general architecture for text matching that does not use inter-sentence attention. Starting from a Siamese architecture, we augment the em…

    Submitted 10 September, 2018; v1 submitted 13 August, 2018; originally announced August 2018.

  16. arXiv:1808.04217 [pdf, ps, other]

    cs.CL cs.AI

    Unsupervised Learning of Sentence Representations Using Sequence Consistency

    Authors: Siddhartha Brahma

    Abstract: Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose ConsSent, a simple yet surprisingly powerful unsupervised method to learn such representations by enforcing consistency constraints on sequences of tokens. We consider two classes of such constraints: sequences that form a sentence and between two sequences that form a se…

    Submitted 23 January, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

  17. arXiv:1805.07340 [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Improved Sentence Modeling using Suffix Bidirectional LSTM

    Authors: Siddhartha Brahma

    Abstract: Recurrent neural networks have become ubiquitous in computing representations of sequential data, especially textual data in natural language processing. In particular, Bidirectional LSTMs are at the heart of several neural models achieving state-of-the-art performance in a wide variety of tasks in NLP. However, BiLSTMs are known to suffer from sequential bias: the contextual representation of a…

    Submitted 10 September, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

  18. arXiv:1802.07374 [pdf, ps, other]

    cs.CL cs.AI

    On the scaling of polynomial features for representation matching

    Authors: Siddhartha Brahma

    Abstract: In many neural models, new features as polynomial functions of existing ones are used to augment representations. Using the natural language inference task as an example, we investigate the use of scaled polynomials of degree 2 and above as matching features. We find that scaling degree 2 features has the highest impact on performance, reducing classification error by 5% in the best models.

    Submitted 20 February, 2018; originally announced February 2018.

    Comments: 4 pages, Submitted to ICLR 2018 workshop
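    Sentence-pair models often build matching features as [u, v, |u - v|, u * v], where u * v is the element-wise product of the two sentence encodings, i.e. a degree-2 polynomial feature (InferSent uses this layout, for example). The abstract does not say how the features are scaled, so the sketch below simply assumes one constant multiplier `scale` on the degree-2 term; both the parameter and the function name are hypothetical.

```python
from typing import List, Sequence

def matching_features(u: Sequence[float], v: Sequence[float], scale: float = 1.0) -> List[float]:
    """Build a sentence-pair feature vector [u, v, |u - v|, scale * (u * v)].

    u * v is the element-wise (degree-2) product; `scale` is a hypothetical
    constant standing in for the paper's scaling scheme.
    """
    if len(u) != len(v):
        raise ValueError("u and v must have the same dimension")
    diff = [abs(a - b) for a, b in zip(u, v)]       # degree-1 difference feature
    prod = [scale * a * b for a, b in zip(u, v)]    # scaled degree-2 feature
    return list(u) + list(v) + diff + prod
```

    A separate scale is useful because products of encoder activations can sit on a very different numeric range from the degree-1 terms they are concatenated with.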

  19. arXiv:1802.07370 [pdf, other]

    cs.CL cs.AI

    SufiSent - Universal Sentence Representations Using Suffix Encodings

    Authors: Siddhartha Brahma

    Abstract: Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose a method to learn such representations by encoding the suffixes of word sequences in a sentence and training on the Stanford Natural Language Inference (SNLI) dataset. We demonstrate the effectiveness of our approach by evaluating it on the SentEval benchmark, improving on…

    Submitted 20 February, 2018; originally announced February 2018.

    Comments: 4 pages, Submitted to ICLR 2018 workshop

  20. arXiv:1612.02062 [pdf, other]

    cs.NI

    Consistency in the face of change: an adaptive approach to physical layer cooperation

    Authors: Ayan Sengupta, Yahya H. Ezzeldin, Siddhartha Brahma, Christina Fragouli, Suhas Diggavi

    Abstract: Most existing works on physical-layer (PHY) cooperation (beyond routing) focus on how to best use a given, static relay network, while wireless networks are anything but static. In this paper, we pose a different set of questions: given that we have multiple devices within range, which relay(s) do we use for PHY cooperation, to maintain a consistent target performance? How can we efficiently adapt…

    Submitted 6 December, 2016; originally announced December 2016.

  21. arXiv:1610.01085 [pdf, other]

    cs.AI

    Towards the Design of Prospect-Theory based Human Decision Rules for Hypothesis Testing

    Authors: V. Sriram Siddhardh Nadendla, Swastik Brahma, Pramod K. Varshney

    Abstract: Detection rules have traditionally been designed for rational agents that minimize the Bayes risk (average decision cost). With the advent of crowd-sensing systems, there is a need to redesign binary hypothesis testing rules for behavioral agents, whose cognitive behavior is not captured by traditional utility functions such as Bayes risk. In this paper, we adopt prospect theory based models for d…

    Submitted 4 October, 2016; originally announced October 2016.

    Comments: 8 pages, 5 figures, Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

  22. Optimal Auction Design with Quantized Bids

    Authors: Nianxia Cao, Swastik Brahma, Pramod K. Varshney

    Abstract: This letter considers the design of an auction mechanism to sell the object of a seller when the buyers quantize their private value estimates regarding the object prior to communicating them to the seller. The designed auction mechanism maximizes the utility of the seller (i.e., the auction is optimal), prevents buyers from communicating falsified quantized bids (i.e., the auction is incentive-co…

    Submitted 28 September, 2015; originally announced September 2015.

    Comments: 6 pages, 3 figures, TSP letter

  23. arXiv:1508.03011 [pdf, other]

    cs.NI cs.GT cs.IT

    Matching-based Spectrum Allocation in Cognitive Radio Networks

    Authors: Raghed El-Bardan, Walid Saad, Swastik Brahma, Pramod K. Varshney

    Abstract: In this paper, a novel spectrum association approach for cognitive radio networks (CRNs) is proposed. Based on a measure of both inference and confidence as well as on a measure of quality-of-service, the association between secondary users (SUs) in the network and frequency bands licensed to primary users (PUs) is investigated. The problem is formulated as a matching game between SUs and PUs. In…

    Submitted 12 August, 2015; originally announced August 2015.

    Comments: 16 pages, 4 figures
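    The abstract casts spectrum association as a matching game between SUs and PUs. A standard way to compute a stable outcome for such a game is deferred acceptance (Gale-Shapley). The sketch below assumes one-to-one matching and takes both sides' preference lists as given; in the paper they would presumably derive from the measures named in the abstract, and the paper's actual algorithm and utilities may differ. All names are illustrative.

```python
from typing import Dict, List, Optional

def deferred_acceptance(su_prefs: Dict[str, List[str]],
                        pu_prefs: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """One-to-one deferred acceptance in which SUs propose to PUs.

    su_prefs[s]: acceptable PUs for SU s, most preferred first.
    pu_prefs[p]: acceptable SUs for PU p, most preferred first.
    Returns the PU matched to each SU, or None if the SU is unmatched.
    """
    rank = {p: {s: r for r, s in enumerate(prefs)} for p, prefs in pu_prefs.items()}
    next_choice = {s: 0 for s in su_prefs}
    held: Dict[str, str] = {}          # PU -> SU it tentatively holds
    free = list(su_prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(su_prefs[s]):
            continue                   # s has exhausted its list and stays unmatched
        p = su_prefs[s][next_choice[s]]
        next_choice[s] += 1
        p_rank = rank.get(p, {})
        if s not in p_rank:
            free.append(s)             # p finds s unacceptable
        elif p not in held:
            held[p] = s                # p tentatively accepts its first acceptable proposal
        elif p_rank[s] < p_rank[held[p]]:
            free.append(held[p])       # p trades up; the displaced SU proposes again
            held[p] = s
        else:
            free.append(s)             # p rejects s in favour of the SU it holds
    matching: Dict[str, Optional[str]] = {s: None for s in su_prefs}
    for p, s in held.items():
        matching[s] = p
    return matching
```

    With strict preferences the result is stable (no SU/PU pair would both rather be matched to each other), and because SUs propose it is the stable matching the SUs like best.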

  24. arXiv:1504.03413 [pdf, ps, other]

    eess.SY cs.DC stat.AP stat.ML

    Consensus based Detection in the Presence of Data Falsification Attacks

    Authors: Bhavya Kailkhura, Swastik Brahma, Pramod K. Varshney

    Abstract: This paper considers the problem of detection in distributed networks in the presence of data falsification (Byzantine) attacks. Detection approaches considered in the paper are based on fully distributed consensus algorithms, where all of the nodes exchange information only with their neighbors in the absence of a fusion center. In such networks, we characterize the negative effect of Byzantines…

    Submitted 13 April, 2015; originally announced April 2015.

  25. arXiv:1410.5904 [pdf, ps, other]

    cs.CR cs.DC math.OC stat.AP

    Distributed Detection in Tree Networks: Byzantines and Mitigation Techniques

    Authors: Bhavya Kailkhura, Swastik Brahma, Berkan Dulek, Yunghsiang S Han, Pramod K. Varshney

    Abstract: In this paper, the problem of distributed detection in tree networks in the presence of Byzantines is considered. Closed form expressions for optimal attacking strategies that minimize the miss detection error exponent at the fusion center (FC) are obtained. We also look at the problem from the network designer's (FC's) perspective. We study the problem of designing optimal distributed detection p…

    Submitted 21 October, 2014; originally announced October 2014.

  26. arXiv:1408.3434 [pdf, other]

    stat.AP cs.CR cs.GT

    Asymptotic Analysis of Distributed Bayesian Detection with Byzantine Data

    Authors: Bhavya Kailkhura, Yunghsiang S. Han, Swastik Brahma, Pramod K. Varshney

    Abstract: In this letter, we consider the problem of distributed Bayesian detection in the presence of data falsifying Byzantines in the network. The problem of distributed detection is formulated as a binary hypothesis test at the fusion center (FC) based on 1-bit data sent by the sensors. Adopting Chernoff information as our performance metric, we study the detection performance of the system under Byzant…

    Submitted 14 August, 2014; originally announced August 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1307.3544
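    For 1-bit sensor messages the Chernoff information between the two hypotheses, C(P0, P1) = -min over 0 <= λ <= 1 of log Σ_x P0(x)^λ P1(x)^(1-λ), is easy to compute numerically. The sketch below assumes a simple attack model of our own: a fraction α of sensors are Byzantine and flip their local decision with probability P, while every sensor's local detection and false-alarm probabilities are P_d and P_f. Under that model the probabilities of receiving a 1 under H1 and H0 differ by (P_d - P_f)(1 - 2αP), so the Chernoff information drops to zero when αP = 1/2. Function names are hypothetical.

```python
import math

def received_one_prob(p_local: float, alpha: float, flip_prob: float = 1.0) -> float:
    """P(FC receives a 1) given P(local decision = 1) = p_local.

    Honest sensors (fraction 1 - alpha) forward their decision; Byzantine
    sensors (fraction alpha) flip it with probability flip_prob.
    Use p_local = P_d under H1 and p_local = P_f under H0.
    """
    byzantine = flip_prob * (1.0 - p_local) + (1.0 - flip_prob) * p_local
    return alpha * byzantine + (1.0 - alpha) * p_local

def chernoff_information_bernoulli(p0: float, p1: float, iters: int = 200) -> float:
    """Chernoff information between Bernoulli(p0) and Bernoulli(p1), for 0 < p0, p1 < 1.

    The objective log sum_x P0(x)^lam P1(x)^(1-lam) is convex in lam,
    so ternary search on [0, 1] finds its minimum.
    """
    def objective(lam: float) -> float:
        return math.log(p0 ** lam * p1 ** (1 - lam) + (1 - p0) ** lam * (1 - p1) ** (1 - lam))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    # Chernoff information is non-negative; clamp tiny negative rounding error.
    return max(0.0, -objective((lo + hi) / 2))

if __name__ == "__main__":
    p_d, p_f = 0.9, 0.1
    for alpha in (0.0, 0.2, 0.4, 0.5):
        c = chernoff_information_bernoulli(received_one_prob(p_f, alpha), received_one_prob(p_d, alpha))
        print(f"alpha={alpha:.1f}  per-sensor Chernoff information={c:.4f}")
```

    Chernoff information adds up over independent observations, so N sensors with i.i.d. messages give N·C, which sets the exponential rate at which the fusion center's error probability decays.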

  27. Target Tracking via Crowdsourcing: A Mechanism Design Approach

    Authors: Nianxia Cao, Swastik Brahma, Pramod K. Varshney

    Abstract: In this paper, we propose a crowdsourcing based framework for myopic target tracking by designing an incentive-compatible mechanism based optimal auction in a wireless sensor network (WSN) containing sensors that are selfish and profit-motivated. For typical WSNs which have limited bandwidth, the fusion center (FC) has to distribute the total number of bits that can be transmitted from the sensors…

    Submitted 21 May, 2014; originally announced May 2014.

    Comments: 13 pages, 11 figures, IEEE Transactions on Signal Processing

  28. arXiv:1403.6807 [pdf, other]

    cs.NI cs.GT cs.IT eess.SY

    Optimal Spectrum Auction Design with Two-Dimensional Truthful Revelations under Uncertain Spectrum Availability

    Authors: V. Sriram Siddhardh Nadendla, Swastik Brahma, Pramod K. Varshney

    Abstract: In this paper, we propose a novel sealed-bid auction framework to address the problem of dynamic spectrum allocation in cognitive radio (CR) networks. We design an optimal auction mechanism that maximizes the moderator's expected utility, when the spectrum is not available with certainty. We assume that the moderator employs collaborative spectrum sensing in order to make a reliable inference abou…

    Submitted 13 November, 2015; v1 submitted 26 March, 2014; originally announced March 2014.

    Comments: 14 double-column pages, 7 figures, 2 tables, under review at IEEE/ACM Transactions on Networking

  29. arXiv:1309.4513 [pdf, ps, other]

    stat.AP cs.CR math.CO

    Distributed Detection in Tree Topologies with Byzantines

    Authors: Bhavya Kailkhura, Swastik Brahma, Yunghsiang S. Han, Pramod K. Varshney

    Abstract: In this paper, we consider the problem of distributed detection in tree topologies in the presence of Byzantines. The expression for minimum attacking power required by the Byzantines to blind the fusion center (FC) is obtained. More specifically, we show that when more than a certain fraction of individual node decisions are falsified, the decision fusion scheme becomes completely incapable. We o…

    Submitted 17 September, 2013; originally announced September 2013.

  30. arXiv:1307.3544 [pdf, other]

    cs.IT cs.CR cs.DC cs.GT stat.AP

    Distributed Bayesian Detection with Byzantine Data

    Authors: Bhavya Kailkhura, Yunghsiang S. Han, Swastik Brahma, Pramod K. Varshney

    Abstract: In this paper, we consider the problem of distributed Bayesian detection in the presence of Byzantines in the network. It is assumed that a fraction of the nodes in the network are compromised and reprogrammed by an adversary to transmit false information to the fusion center (FC) to degrade detection performance. The problem of distributed detection is formulated as a binary hypothesis test at th…

    Submitted 3 September, 2014; v1 submitted 12 July, 2013; originally announced July 2013.

    Comments: 32 pages, 4 figures, Submitted to IEEE Transactions on Signal Processing
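    The abstract formulates detection at the FC as a binary hypothesis test. Under simplifying assumptions of our own (received bits i.i.d. given the hypothesis; a known Byzantine fraction α that flips local decisions with known probability P; known local detection and false-alarm probabilities P_d, P_f; known priors), the Bayesian (MAP) fusion rule reduces to a log-likelihood-ratio threshold on the number of received 1s. The sketch is illustrative rather than the paper's derivation, and `map_fusion` is a hypothetical name.

```python
import math

def map_fusion(num_ones: int, n: int, p_d: float, p_f: float,
               alpha: float, flip_prob: float = 1.0, prior_h1: float = 0.5) -> int:
    """Decide H1 (return 1) or H0 (return 0) from the number of 1-bits among n received bits.

    Assumes received bits are i.i.d. given the hypothesis and that the FC knows
    alpha, flip_prob, p_d, p_f and the prior; all probabilities must lie in (0, 1).
    """
    # Probability that a received bit is 1 under each hypothesis.
    pi1 = alpha * (flip_prob * (1 - p_d) + (1 - flip_prob) * p_d) + (1 - alpha) * p_d
    pi0 = alpha * (flip_prob * (1 - p_f) + (1 - flip_prob) * p_f) + (1 - alpha) * p_f
    # Log-likelihood ratio of the observed count (binomial coefficients cancel).
    llr = num_ones * math.log(pi1 / pi0) + (n - num_ones) * math.log((1 - pi1) / (1 - pi0))
    # MAP rule: decide H1 when the LLR exceeds log(P(H0) / P(H1)).
    return 1 if llr > math.log((1 - prior_h1) / prior_h1) else 0
```

    If the FC knows that α > 1/2 and the Byzantines always flip, the rule simply reverses its reading of the bits; if instead αP = 1/2, the two hypotheses produce identically distributed bits, the LLR is zero for every count, and the decision rests on the prior alone.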