Showing 1–13 of 13 results for author: Unger, C

  1. arXiv:2406.17145

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2402.18789

    cs.DC cs.CL cs.LG

    FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

    Authors: Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia

    Abstract: Parameter-efficient finetuning (PEFT) is a widely used technique to adapt large language models for different tasks. Service providers typically create separate systems for users to perform PEFT model finetuning and inference tasks. This is because existing systems cannot handle workloads that include a mix of inference and PEFT finetuning requests. As a result, shared GPU resources are underutili…

    Submitted 28 February, 2024; originally announced February 2024.

  3. ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models

    Authors: Matthias Wess, Matvey Ivanov, Anvesh Nookala, Christoph Unger, Alexander Wendt, Axel Jantsch

    Abstract: With new accelerator hardware for DNN, the computing power for AI applications has increased rapidly. However, as DNN algorithms become more complex and optimized for specific applications, latency requirements remain challenging, and it is critical to find the optimal points in the design space. To decouple the architectural search from the target hardware, we propose a time estimation framework…

    Submitted 7 May, 2021; originally announced May 2021.

    Journal ref: in IEEE Access, vol. 9, pp. 3545-3556, 2021

  4. arXiv:2103.08031

    cs.LG cs.CR

    BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks

    Authors: Manoj Rohit Vemparala, Alexander Frickenstein, Nael Fasfous, Lukas Frickenstein, Qi Zhao, Sabine Kuhn, Daniel Ehrhardt, Yuankai Wu, Christian Unger, Naveen Shankar Nagaraja, Walter Stechele

    Abstract: Deploying convolutional neural networks (CNNs) for embedded applications presents many challenges in balancing resource-efficiency and task-related accuracy. These two aspects have been well-researched in the field of CNN compression. In real-world applications, a third important aspect comes into play, namely the robustness of the CNN. In this paper, we thoroughly study the robustness of uncompre…

    Submitted 14 March, 2021; originally announced March 2021.

  5. arXiv:2101.02663

    cs.CV cs.AI

    L2PF -- Learning to Prune Faster

    Authors: Manoj-Rohit Vemparala, Nael Fasfous, Alexander Frickenstein, Mhd Ali Moraly, Aquib Jamal, Lukas Frickenstein, Christian Unger, Naveen-Shankar Nagaraja, Walter Stechele

    Abstract: Various applications in the field of autonomous driving are based on convolutional neural networks (CNNs), especially for processing camera data. The optimization of such CNNs is a major challenge in continuous development. Newly learned features must be brought into vehicles as quickly as possible, and as such, it is not feasible to spend redundant GPU hours during compression. In this context, w…

    Submitted 7 January, 2021; originally announced January 2021.

  6. arXiv:2011.01603

    cs.CV

    A Deep Temporal Fusion Framework for Scene Flow Using a Learnable Motion Model and Occlusions

    Authors: René Schuster, Christian Unger, Didier Stricker

    Abstract: Motion estimation is one of the core challenges in computer vision. With traditional dual-frame approaches, occlusions and out-of-view motions are a limiting factor, especially in the context of environmental perception for vehicles due to the large (ego-) motion of objects. Our work proposes a novel data-driven approach for temporal fusion of scene flow estimates in a multi-frame setup to overcom…

    Submitted 4 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to WACV21

  7. arXiv:2010.10842

    cs.CV

    MonoComb: A Sparse-to-Dense Combination Approach for Monocular Scene Flow

    Authors: René Schuster, Christian Unger, Didier Stricker

    Abstract: Contrary to the ongoing trend in automotive applications towards usage of more diverse and more sensors, this work tries to solve the complex scene flow problem under a monocular camera setup, i.e. using a single sensor. Towards this end, we exploit the latest achievements in single image depth estimation, optical flow, and sparse-to-dense interpolation and propose a monocular combination approach…

    Submitted 12 November, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted to ACM CSCS 2020

  8. arXiv:2008.09346

    cs.CV

    SSGP: Sparse Spatial Guided Propagation for Robust and Generic Interpolation

    Authors: René Schuster, Oliver Wasenmüller, Christian Unger, Didier Stricker

    Abstract: Interpolation of sparse pixel information towards a dense target resolution finds its application across multiple disciplines in computer vision. State-of-the-art interpolation of motion fields applies model-based interpolation that makes use of edge information extracted from the target image. For depth completion, data-driven learning approaches are widespread. Our work is inspired by latest tre…

    Submitted 4 November, 2020; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: Accepted to WACV 2021

  9. arXiv:2007.13384

    cs.LG cs.CV stat.ML

    ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks

    Authors: Alexander Frickenstein, Manoj-Rohit Vemparala, Nael Fasfous, Laura Hauenschild, Naveen-Shankar Nagaraja, Christian Unger, Walter Stechele

    Abstract: Closing the gap between the hardware requirements of state-of-the-art convolutional neural networks and the limited resources constraining embedded applications is the next big challenge in deep learning research. The computational complexity and memory footprint of such neural networks are typically daunting for deployment in resource constrained environments. Model compression techniques, such a…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Accepted by DAC'20

  10. arXiv:2006.08178

    cs.CV

    Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving

    Authors: Alexander Frickenstein, Manoj Rohit Vemparala, Jakob Mayr, Naveen Shankar Nagaraja, Christian Unger, Federico Tombari, Walter Stechele

    Abstract: Driveable area detection is a key component for various applications in the field of autonomous driving (AD), such as ground-plane detection, obstacle detection and maneuver planning. Additionally, bulky and over-parameterized networks can be easily forgone and replaced with smaller networks for faster inference on embedded systems. The driveable area detection, posed as a two class segmentation t…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: IEEE International Conference on Robotics and Automation (ICRA) 2020

  11. arXiv:1904.06167

    cs.CV

    An Empirical Evaluation Study on the Training of SDC Features for Dense Pixel Matching

    Authors: René Schuster, Oliver Wasenmüller, Christian Unger, Didier Stricker

    Abstract: Training a deep neural network is a non-trivial task. Not only the tuning of hyperparameters, but also the gathering and selection of training data, the design of the loss function, and the construction of training schedules is important to get the most out of a model. In this study, we perform a set of experiments all related to these issues. The model for which different training strategies are…

    Submitted 12 April, 2019; originally announced April 2019.

  12. arXiv:1904.03076

    cs.CV

    SDC - Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks

    Authors: René Schuster, Oliver Wasenmüller, Christian Unger, Didier Stricker

    Abstract: Dense pixel matching is important for many computer vision tasks such as disparity and flow estimation. We present a robust, unified descriptor network that considers a large context region with high spatial variance. Our network has a very large receptive field and avoids striding layers to maintain spatial resolution. These properties are achieved by creating a novel neural network layer that co…

    Submitted 5 April, 2019; originally announced April 2019.

  13. SceneFlowFields++: Multi-frame Matching, Visibility Prediction, and Robust Interpolation for Scene Flow Estimation

    Authors: René Schuster, Oliver Wasenmüller, Christian Unger, Georg Kuschk, Didier Stricker

    Abstract: State-of-the-art scene flow algorithms pursue the conflicting targets of accuracy, run time, and robustness. With the successful concept of pixel-wise matching and sparse-to-dense interpolation, we push the limits of scene flow estimation. Avoiding strong assumptions on the domain or the problem yields a more robust algorithm. This algorithm is fast because we avoid explicit regularization during…

    Submitted 28 October, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: arXiv admin note: text overlap with arXiv:1710.10096