
Showing 1–9 of 9 results for author: Kafle, S

  1. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  2. arXiv:2302.02250  [pdf, other]

    cs.NI cs.LG

    Generalization of Deep Reinforcement Learning for Jammer-Resilient Frequency and Power Allocation

    Authors: Swatantra Kafle, Jithin Jagannath, Zackary Kane, Noor Biswas, Prem Sagar Vasanth Kumar, Anu Jagannath

    Abstract: We tackle the problem of joint frequency and power allocation while emphasizing the generalization capability of a deep reinforcement learning model. Most of the existing methods solve reinforcement learning-based wireless problems for a specific pre-determined wireless network scenario. The performance of a trained agent tends to be very specific to the network and deteriorates when used in a dif…

    Submitted 6 May, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE Communications Letters

  3. arXiv:2012.11673  [pdf, ps, other]

    cs.CV cs.LG

    Smoothed Gaussian Mixture Models for Video Classification and Recommendation

    Authors: Sirjan Kafle, Aman Gupta, Xue Xia, Ananth Sankar, Xi Chen, Di Wen, Liang Zhang

    Abstract: Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD), and their end-to-end discriminatively trained equivalents like NetVLAD have recently been popular for video classification and action recognition tasks. These techniques operate by assigning video frames to clusters and then representing the video by aggregating residuals of frames with respect to the mean of…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: 11 pages, 3 figures, 7 tables

    ACM Class: I.2.10

  4. arXiv:2005.00224  [pdf, ps, other]

    math.OC cs.DC

    Distributed Stochastic Non-Convex Optimization: Momentum-Based Variance Reduction

    Authors: Prashant Khanduri, Pranay Sharma, Swatantra Kafle, Saikiran Bulusu, Ketan Rajawat, Pramod K. Varshney

    Abstract: In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs) in collaboration with a server node (SN) jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node…

    Submitted 1 May, 2020; originally announced May 2020.

  5. arXiv:1912.06036  [pdf, ps, other]

    math.OC cs.DC cs.LG cs.MA stat.ML

    Parallel Restarted SPIDER -- Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity

    Authors: Pranay Sharma, Swatantra Kafle, Prashant Khanduri, Saikiran Bulusu, Ketan Rajawat, Pramod K. Varshney

    Abstract: In this paper, we propose a distributed algorithm for stochastic smooth, non-convex optimization. We assume a worker-server architecture where $N$ nodes, each having $n$ (potentially infinite) number of samples, collaborate with the help of a central server to perform the optimization task. The global objective is to minimize the average of local cost functions available at individual nodes. The p…

    Submitted 6 November, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

  6. arXiv:1908.10414  [pdf]

    cs.HC

    Artificial Intelligence Fairness in the Context of Accessibility Research on Intelligent Systems for People who are Deaf or Hard of Hearing

    Authors: Sushant Kafle, Abraham Glasser, Sedeeq Al-khazraji, Larwan Berke, Matthew Seita, Matt Huenerfauth

    Abstract: We discuss issues of Artificial Intelligence (AI) fairness for people with disabilities, with examples drawn from our research on human-computer interaction (HCI) for AI-based systems for people who are Deaf or Hard of Hearing (DHH). In particular, we discuss the need for inclusion of data from people with disabilities in training sets, the lack of interpretability of AI systems, ethical responsib…

    Submitted 2 September, 2019; v1 submitted 27 August, 2019; originally announced August 2019.

    Comments: 6 pages, ACM ASSETS 2019 Workshop on AI Fairness for People with Disabilities

  7. arXiv:1903.12238  [pdf, other]

    cs.CL

    Modeling Acoustic-Prosodic Cues for Word Importance Prediction in Spoken Dialogues

    Authors: Sushant Kafle, Cecilia O. Alm, Matt Huenerfauth

    Abstract: Prosodic cues in conversational speech aid listeners in discerning a message. We investigate whether acoustic cues in spoken dialogue can be used to identify the importance of individual words to the meaning of a conversation turn. Individuals who are Deaf and Hard of Hearing often rely on real-time captions in live meetings. Word error rate, a traditional metric for evaluating automatic speech re…

    Submitted 16 July, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: 8 pages, 2 figures

    Journal ref: Proceedings of the Eighth Workshop on Speech and Language Processing for Assistive Technologies. 2019

  8. arXiv:1801.09746  [pdf, other]

    cs.CL

    A Corpus for Modeling Word Importance in Spoken Dialogue Transcripts

    Authors: Sushant Kafle, Matt Huenerfauth

    Abstract: Motivated by a project to create a system for people who are deaf or hard-of-hearing that would use automatic speech recognition (ASR) to produce real-time text captions of spoken English during in-person meetings with hearing individuals, we have augmented a transcript of the Switchboard conversational dialogue corpus with an overlay of word-importance annotations, with a numeric score for each w…

    Submitted 16 July, 2019; v1 submitted 29 January, 2018; originally announced January 2018.

    Comments: Language Resources and Evaluation Conference (LREC)

    Journal ref: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

  9. Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing

    Authors: Sushant Kafle, Matt Huenerfauth

    Abstract: The accuracy of Automated Speech Recognition (ASR) technology has improved, but it is still imperfect in many settings. Researchers who evaluate ASR performance often focus on improving the Word Error Rate (WER) metric, but WER has been found to have little correlation with human-subject performance on many applications. We propose a new captioning-focused evaluation metric that better predicts th…

    Submitted 5 December, 2017; originally announced December 2017.

    Comments: 10 pages, 8 figures, published in ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'17)

    Journal ref: ASSETS'17 (2017) 165-174