Showing 1–7 of 7 results for author: Salmani, M

Searching in archive cs.
  1. arXiv:2402.02244  [pdf, other]

    cs.CL cs.LG

    Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

    Authors: Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi

    Abstract: Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and meth…

    Submitted 29 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted to IJCAI 2024 Survey Track -- camera-ready version

  2. arXiv:2310.18975  [pdf, other]

    cs.CV cs.LG

    Blacksmith: Fast Adversarial Training of Vision Transformers via a Mixture of Single-step and Multi-step Methods

    Authors: Mahdi Salmani, Alireza Dehghanpour Farashah, Mohammad Azizmalayeri, Mahdi Amiri, Navid Eslami, Mohammad Taghi Manzuri, Mohammad Hossein Rohban

    Abstract: Despite the remarkable success achieved by deep learning algorithms in various domains, such as computer vision, they remain vulnerable to adversarial perturbations. Adversarial Training (AT) stands out as one of the most effective solutions to address this issue; however, single-step AT can lead to Catastrophic Overfitting (CO). This scenario occurs when the adversarially trained network suddenly…

    Submitted 29 October, 2023; originally announced October 2023.

  3. arXiv:2308.12871  [pdf, other]

    cs.DC cs.LG cs.PF

    IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

    Authors: Saeid Ghafouri, Kamran Razavi, Mehran Salmani, Alireza Sanaee, Tania Lorido-Botran, Lin Wang, Joseph Doyle, Pooyan Jamshidi

    Abstract: Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in machine learning production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of latency, accuracy, and cost in inference pipelines, providers frequently opt to consider one of them. However…

    Submitted 26 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Journal ref: Journal of Systems Research, 4(1) (2024)

  4. arXiv:2304.10892  [pdf, other]

    cs.LG cs.DC eess.SY

    Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

    Authors: Mehran Salmani, Saeid Ghafouri, Alireza Sanaee, Kamran Razavi, Max Mühlhäuser, Joseph Doyle, Pooyan Jamshidi, Mohsen Sharifi

    Abstract: The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic workloads of requests, imposing changes in their computing resources. Failing to right-size computing resources results in either latency service level objectives (SLOs) violations…

    Submitted 24 April, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  5. Photonic Computing to Accelerate Data Processing in Wireless Communications

    Authors: Mahsa Salmani, Armaghan Eshaghi, Enxiao Luan, Sreenil Saha

    Abstract: Massive multiple-input multiple-output (MIMO) systems are considered as one of the leading technologies employed in the next generations of wireless communication networks (5G), which promise to provide higher spectral efficiency, lower latency, and more reliability. Due to the massive number of devices served by the base stations (BS) equipped with large antenna arrays, massive-MIMO systems need…

    Submitted 12 March, 2021; originally announced March 2021.

  6. arXiv:1809.07453  [pdf, ps, other]

    cs.IT

    Uplink Resource Allocation for Multiple Access Computational Offloading (Extended Version)

    Authors: Mahsa Salmani, Timothy N. Davidson

    Abstract: The mobile edge computing framework offers the opportunity to reduce the energy that devices must expend to complete computational tasks. The extent of that energy reduction depends on the nature of the tasks, and on the choice of the multiple access scheme. In this paper, we first address the uplink communication resource allocation for offloading systems that exploit the full capabilities of the…

    Submitted 29 April, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

  7. arXiv:1805.04981  [pdf, other]

    cs.IT

    Multiple Access Computational Offloading: Communication Resource Allocation in the Two-User Case (Extended Version)

    Authors: Mahsa Salmani, Timothy N. Davidson

    Abstract: By offering shared computational facilities to which mobile devices can offload their computational tasks, the mobile edge computing framework is expanding the scope of applications that can be provided on resource-constrained devices. When multiple devices seek to use such a facility simultaneously, both the available computational resources and the available communication resources need to be ap…

    Submitted 14 October, 2018; v1 submitted 13 May, 2018; originally announced May 2018.

    Comments: 50 pages (single-column), 12 figures. A condensed version of this manuscript is submitted to TSP