
Showing 1–10 of 10 results for author: Sukthanker, R

  1. arXiv:2405.10299  [pdf, other]

    cs.LG cs.AI

    HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

    Authors: Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter

    Abstract: The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying optimal model configurations under specific hardware constraints is becoming essential but remains challenging due to the computational load of exhaustive training a…

    Submitted 21 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: 48 pages, 69 figures, 10 tables

  2. arXiv:2402.18213  [pdf, other]

    cs.LG cs.CV stat.ML

    Multi-objective Differentiable Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

    Abstract: Pareto front profiling in multi-objective optimization (MOO), i.e. finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training. Typically, in MOO neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints in…

    Submitted 19 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 37 pages, 27 figures

  3. arXiv:2312.10440  [pdf, other]

    cs.LG cs.AI

    Weight-Entanglement Meets Gradient-Based Neural Architecture Search

    Authors: Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter

    Abstract: Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architecture spaces significantly faster than traditional blackbox approaches. In parallel, weight entanglement has emerged as a technique for intricate parameter sharing among architectures within macro-level search spaces. …

    Submitted 16 December, 2023; originally announced December 2023.

  4. arXiv:2301.08727  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Architecture Search: Insights from 1000 Papers

    Authors: Colin White, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela, Debadeepta Dey, Frank Hutter

    Abstract: In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural ar…

    Submitted 25 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

  5. arXiv:2211.01842  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Construction of Hierarchical Neural Architecture Search Spaces based on Context-free Grammars

    Authors: Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sukthanker, Thomas Brox, Frank Hutter

    Abstract: The discovery of neural architectures from simple building blocks is a long-standing goal of Neural Architecture Search (NAS). Hierarchical search spaces are a promising step towards this goal but lack a unifying search space design framework and typically only search over some limited aspect of architectures. In this work, we introduce a unifying search space design framework based on context-fre…

    Submitted 8 December, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2023

  6. arXiv:2210.09943  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition

    Authors: Samuel Dooley, Rhea Sanjay Sukthanker, John P. Dickerson, Colin White, Frank Hutter, Micah Goldblum

    Abstract: Face recognition systems are widely deployed in safety-critical applications, including law enforcement, yet they exhibit bias across a range of socio-demographic dimensions, such as gender and race. Conventional wisdom dictates that model biases arise from biased training data. As a consequence, previous works on bias mitigation largely focused on pre-processing the training data, adding penaltie…

    Submitted 6 December, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

  7. arXiv:2106.03959  [pdf, other]

    cs.LG cs.CV

    Generative Flows with Invertible Attentions

    Authors: Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar, Radu Timofte, Luc Van Gool

    Abstract: Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, learning attentions in generative flows remains understudied, while it has made breakthroughs in other domains. To fill the gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transforme…

    Submitted 31 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2022

  8. arXiv:2101.06658  [pdf, other]

    cs.CV cs.LG eess.IV

    Trilevel Neural Architecture Search for Efficient Single Image Super-Resolution

    Authors: Yan Wu, Zhiwu Huang, Suryansh Kumar, Rhea Sanjay Sukthanker, Radu Timofte, Luc Van Gool

    Abstract: Modern solutions to the single image super-resolution (SISR) problem using deep neural networks aim not only at better performance accuracy but also at a lighter and computationally efficient model. To that end, recently, neural architecture search (NAS) approaches have shown some tremendous potential. Following the same underlying, in this paper, we suggest a novel trilevel NAS method that provid…

    Submitted 23 April, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

  9. arXiv:2010.14535  [pdf, other]

    cs.LG cs.CV cs.NE

    Neural Architecture Search of SPD Manifold Networks

    Authors: Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar, Erik Goron Endsjo, Yan Wu, Luc Van Gool

    Abstract: In this paper, we propose a new neural architecture search (NAS) problem of Symmetric Positive Definite (SPD) manifold networks, aiming to automate the design of SPD neural architectures. To address this problem, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design. Further, we model our new NAS problem with a one-shot training p…

    Submitted 13 June, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: This paper is accepted for publication at IJCAI 2021

  10. arXiv:1805.11824  [pdf, other]

    cs.CL

    Anaphora and Coreference Resolution: A Review

    Authors: Rhea Sukthanker, Soujanya Poria, Erik Cambria, Ramkumar Thirunavukarasu

    Abstract: Entity resolution aims at resolving repeated references to an entity in a document and forms a core component of natural language processing (NLP) research. This field possesses immense potential to improve the performance of other NLP fields like machine translation, sentiment analysis, paraphrase detection, summarization, etc. The area of entity resolution in NLP has seen proliferation of resear…

    Submitted 30 May, 2018; originally announced May 2018.