Showing 1–29 of 29 results for author: Kohler, T

Searching in archive cs.
  1. arXiv:2401.10460  [pdf, other]

    cs.SD cs.LG eess.AS

    Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

    Authors: Prabhav Agrawal, Thilo Koehler, Prashant Serai, Qing He

    Abstract: Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run in real time on a low-end device like a smartglass. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT), and therefore, is an order of magnitude faster than any neural vocoder. A DSP vocoder o…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted for ICASSP 2024

  2. arXiv:2309.09112  [pdf, other]

    cs.PL cs.AR

    Rewriting History: Repurposing Domain-Specific CGRAs

    Authors: Jackson Woodruff, Thomas Koehler, Alexander Brauckmann, Chris Cummins, Sam Ainsworth, Michael F. P. O'Boyle

    Abstract: Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations the…

    Submitted 16 September, 2023; originally announced September 2023.

  3. arXiv:2308.12584  [pdf, other]

    cs.CV cs.LG

    LORD: Leveraging Open-Set Recognition with Unknown Data

    Authors: Tobias Koch, Christian Riess, Thomas Köhler

    Abstract: Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static pre-defined dataset and are kept in the dark for the open unassigned feature space. As a result, they struggle to deal with out-of-distribution data during inference. Addressing this task on the class-level is termed open-set recognition (OSR). However, most OSR method…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023 Workshop (Out-Of-Distribution Generalization in Computer Vision)

  4. A Domain-Extensible Compiler with Controllable Automation of Optimisations

    Authors: Thomas Koehler

    Abstract: In high performance domains like image processing, physics simulation or machine learning, program performance is critical. Programmers called performance engineers are responsible for the challenging task of optimising programs. Two major challenges prevent modern compilers targeting heterogeneous architectures from reliably automating optimisation. First, domain-specific compilers such as Halide…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: PhD Thesis made at the University of Glasgow, 163 pages

  5. arXiv:2210.16045  [pdf, other]

    cs.SD cs.CL eess.AS

    Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

    Authors: Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, Jilong Wu, Thilo Köhler, Qing He

    Abstract: Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording. Recent work has used neural models to produce edited speech that is similar to the original speech in terms of clarity, speaker identity, and prosody. However, one limitation of prior work is the usage of finetuning to optimise performance: this requires further model…

    Submitted 28 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  6. arXiv:2205.14892  [pdf, other]

    cs.LG cs.CV

    Exploring the Open World Using Incremental Extreme Value Machines

    Authors: Tobias Koch, Felix Liebezeit, Christian Riess, Vincent Christlein, Thomas Köhler

    Abstract: Dynamic environments require adaptive applications. One particular machine learning problem in dynamic environments is open world recognition. It characterizes a continuously changing domain where only some classes are seen in one batch of the training data and such batches can only be learned incrementally. Open world recognition is a demanding task that is, to the best of our knowledge, addresse…

    Submitted 30 May, 2022; originally announced May 2022.

    Comments: Accepted at ICPR 2022

  7. arXiv:2205.11952  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    3D helical CT Reconstruction with a Memory Efficient Learned Primal-Dual Architecture

    Authors: Jevgenija Rudzusika, Buda Bajić, Thomas Koehler, Ozan Öktem

    Abstract: Deep learning based computed tomography (CT) reconstruction has demonstrated outstanding performance on simulated 2D low-dose CT data. This applies in particular to domain adapted neural networks, which incorporate a handcrafted physics model for CT imaging. Empirical evidence shows that employing such architectures reduces the demand for training data and improves upon generalisation. However, th…

    Submitted 28 November, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

  8. arXiv:2201.03611  [pdf, other]

    cs.PL

    RISE & Shine: Language-Oriented Compiler Design

    Authors: Michel Steuwer, Thomas Koehler, Bastian Köpcke, Federico Pizzuti

    Abstract: The trend towards specialization of software and hardware - fuelled by the end of Moore's law and the still accelerating interest in domain-specific computing, such as machine learning - forces us to radically rethink our compiler designs. The era of a universal compiler framework built around a single one-size-fits-all intermediate representation (IR) is over. This realization has sparked the cre…

    Submitted 10 January, 2022; originally announced January 2022.

  9. arXiv:2112.05876  [pdf, other]

    stat.AP cs.LG econ.EM

    The Past as a Stochastic Process

    Authors: David H. Wolpert, Michael H. Price, Stefani A. Crabtree, Timothy A. Kohler, Jurgen Jost, James Evans, Peter F. Stadler, Hajime Shimao, Manfred D. Laubichler

    Abstract: Historical processes manifest remarkable diversity. Nevertheless, scholars have long attempted to identify patterns and categorize historical actors and influences with some success. A stochastic process framework provides a structured approach for the analysis of large historical datasets that allows for detection of sometimes surprising patterns, identification of relevant causal actors both end…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: 20 pages, 4 figures

  10. arXiv:2111.13040  [pdf, other]

    cs.PL cs.PF

    Sketch-Guided Equality Saturation: Scaling Equality Saturation to Complex Optimizations of Functional Programs

    Authors: Thomas Koehler, Phil Trinder, Michel Steuwer

    Abstract: Generating high-performance code for diverse hardware and application domains is challenging. Functional array programming languages with patterns like map and reduce have been successfully combined with term rewriting to define and explore optimization spaces. However, deciding what sequence of rewrites to apply is hard and has a huge impact on the performance of the rewritten program. Equality s…

    Submitted 3 June, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: 23 pages excluding references, submitted to OOPSLA 2022

  11. arXiv:2108.11730  [pdf, other]

    stat.ML cs.CV cs.LG cs.NE eess.IV math.OC

    Deep learning based dictionary learning and tomographic image reconstruction

    Authors: Jevgenija Rudzusika, Thomas Koehler, Ozan Öktem

    Abstract: This work presents an approach for image reconstruction in clinical low-dose tomography that combines principles from sparse signal processing with ideas from deep learning. First, we describe sparse signal representation in terms of dictionaries from a statistical perspective and interpret dictionary learning as a process of aligning distribution that arises from a generative model with empirical…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 34 pages, 5 figures

    Journal ref: SIAM Journal on Imaging Sciences, Vol 15, Iss 4 (2022)

  12. arXiv:2104.00705  [pdf, other]

    cs.SD cs.AI eess.AS

    Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling

    Authors: Qing He, Thilo Koehler, Jilong Wu

    Abstract: Typical high quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio. High-quality spectrum models usually incorporate the encoder-decoder architecture with self-attention or bi-directional long short-term memory (BLSTM) units. While these models can produce high quality speech,…

    Submitted 1 April, 2021; originally announced April 2021.

  13. arXiv:2011.12985  [pdf, other]

    cs.SD cs.LG eess.AS

    FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

    Authors: Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda

    Abstract: Nowadays more and more applications can benefit from edge-based text-to-speech (TTS). However, most existing TTS models are too computationally expensive and are not flexible enough to be deployed on the diverse variety of edge devices with their equally diverse computational capacities. To address this, we propose FBWave, a family of efficient and scalable neural vocoders that can achieve optimal…

    Submitted 25 November, 2020; originally announced November 2020.

  14. Joint Super-Resolution and Rectification for Solar Cell Inspection

    Authors: Mathis Hoffmann, Thomas Köhler, Bernd Doll, Frank Schebesch, Florian Talkenberg, Ian Marius Peters, Christoph J. Brabec, Andreas Maier, Vincent Christlein

    Abstract: Visual inspection of solar modules is an important monitoring facility in photovoltaic power plants. Since a single measurement of fast CMOS sensors is limited in spatial resolution and often not sufficient to reliably detect small defects, we apply multi-frame super-resolution (MFSR) to a sequence of low resolution measurements. In addition, the rectification and removal of lens distortion simpli…

    Submitted 7 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

  15. arXiv:2002.06758  [pdf, other]

    cs.SD eess.AS

    Interactive Text-to-Speech System via Joint Style Analysis

    Authors: Yang Gao, Weiyi Zheng, Zhaojun Yang, Thilo Kohler, Christian Fuegen, Qing He

    Abstract: While modern TTS technologies have made significant advancements in audio quality, there is still a lack of behavior naturalness compared to conversing with people. We propose a style-embedded TTS system that generates styled responses based on the speech query style. To achieve this, the system includes a style extraction model that extracts a style embedding from the speech query, which is then…

    Submitted 21 September, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

    Comments: Accepted by Interspeech 2020

  16. arXiv:2002.02268  [pdf, other]

    cs.PL cs.PF

    A Language for Describing Optimization Strategies

    Authors: Bastian Hagedorn, Johannes Lenfers, Thomas Koehler, Sergei Gorlatch, Michel Steuwer

    Abstract: Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages - like C or OpenCL - force the programmer to intertwine the code describing functionality and optimizations. This results in a nightmare for portability which is particularly problematic given the accelerating trend towards specialized hardware d…

    Submitted 6 February, 2020; originally announced February 2020.

    Comments: https://elevate-lang.org/ https://github.com/elevate-lang

  17. arXiv:1911.04762  [pdf, other]

    eess.IV cs.CV

    Merging-ISP: Multi-Exposure High Dynamic Range Image Signal Processing

    Authors: Prashant Chaudhari, Franziska Schirrmacher, Andreas Maier, Christian Riess, Thomas Köhler

    Abstract: High dynamic range (HDR) imaging combines multiple images with different exposure times into a single high-quality image. The image signal processing pipeline (ISP) is a core component in digital cameras to perform these operations. It includes demosaicing of raw color filter array (CFA) data at different exposure times, alignment of the exposures, conversion to HDR domain, and exposure merging in…

    Submitted 4 October, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Computational Photography, DAGM GCPR 2021

  18. arXiv:1910.12612  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR

    Authors: Duc Le, Thilo Koehler, Christian Fuegen, Michael L. Seltzer

    Abstract: Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition (ASR), even on non-phonemic languages like English. However, graphemic ASR still has problems with rare long-tail words that do not follow the standard spelling conventions seen in training, such as entity names. In this work, we present a novel…

    Submitted 13 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: To appear at ICASSP 2020

  19. arXiv:1812.09375  [pdf, other]

    cs.CV

    Multi-Frame Super-Resolution Reconstruction with Applications to Medical Imaging

    Authors: Thomas Köhler

    Abstract: The optical resolution of a digital camera is one of its most crucial parameters with broad relevance for consumer electronics, surveillance systems, remote sensing, or medical imaging. However, resolution is physically limited by the optics and sensor characteristics. In addition, practical and economic reasons often stipulate the use of out-dated or low-cost hardware. Super-resolution is a class…

    Submitted 21 December, 2018; originally announced December 2018.

    Comments: Ph.D. thesis at the Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg; source code is available at https://www5.cs.fau.de/de/forschung/software/multi-frame-super-resolution-toolbox/ . https://opus4.kobv.de/opus4-fau/frontdoor/index/index/docId/9145

  20. arXiv:1809.06420  [pdf, other]

    cs.CV

    Toward Bridging the Simulated-to-Real Gap: Benchmarking Super-Resolution on Real Data

    Authors: Thomas Köhler, Michel Bätz, Farzad Naderi, André Kaup, Andreas Maier, Christian Riess

    Abstract: Capturing ground truth data to benchmark super-resolution (SR) is challenging. Therefore, current quantitative studies are mainly evaluated on simulated data artificially sampled from ground truth images. We argue that such evaluations overestimate the actual performance of SR methods compared to their behavior on real images. Toward bridging this simulated-to-real gap, we introduce the Super-Reso…

    Submitted 16 June, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

    Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence; data and source code available at https://superresolution.tf.fau.de/

  21. P4CEP: Towards In-Network Complex Event Processing

    Authors: Thomas Kohler, Ruben Mayer, Frank Dürr, Marius Maaß, Sukanya Bhowmik, Kurt Rothermel

    Abstract: In-network computing using programmable networking hardware is a strong trend in networking that promises to reduce latency and consumption of server resources through offloading to network elements (programmable switches and smart NICs). In particular, the data plane programming language P4 together with powerful P4 networking hardware has spawned projects offloading services into the network, e.…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: 6 pages. Author's version

  22. arXiv:1802.05518  [pdf, ps, other]

    cs.CV

    Learning from a Handful Volumes: MRI Resolution Enhancement with Volumetric Super-Resolution Forests

    Authors: Aline Sindel, Katharina Breininger, Johannes Käßer, Andreas Hess, Andreas Maier, Thomas Köhler

    Abstract: Magnetic resonance imaging (MRI) enables 3-D imaging of anatomical structures. However, the acquisition of MR volumes with high spatial resolution leads to long scan times. To this end, we propose volumetric super-resolution forests (VSRF) to enhance MRI resolution retrospectively. Our method learns a locally linear mapping between low-resolution and high-resolution volumetric image patches by emp…

    Submitted 15 February, 2018; originally announced February 2018.

    Comments: Preprint submitted to ICIP 2018

  23. arXiv:1802.03943  [pdf, other]

    cs.CV

    Temporal and volumetric denoising via quantile sparse image prior

    Authors: Franziska Schirrmacher, Thomas Köhler, Tobias Lindenberger, Lennart Husvogt, Jürgen Endres, James G. Fujimoto, Joachim Hornegger, Arnd Dörfler, Philip Hoelter, Andreas K. Maier

    Abstract: This paper introduces a universal and structure-preserving regularization term, called quantile sparse image (QuaSI) prior. The prior is suitable for denoising images from various medical imaging modalities. We demonstrate its effectiveness on volumetric optical coherence tomography (OCT) and computed tomography (CT) data, which show different noise and image characteristics. OCT offers high-reso…

    Submitted 17 June, 2019; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Accepted for MICCAI2017 special issue

  24. arXiv:1709.04881  [pdf, other]

    cs.CV

    Benchmarking Super-Resolution Algorithms on Real Data

    Authors: Thomas Köhler, Michel Bätz, Farzad Naderi, André Kaup, Andreas K. Maier, Christian Riess

    Abstract: Over the past decades, various super-resolution (SR) techniques have been developed to enhance the spatial resolution of digital images. Despite the great number of methodical contributions, there is still a lack of comparative validations of SR under practical conditions, as capturing real ground truth data is a challenging task. Therefore, current studies are either evaluated 1) on simulated dat…

    Submitted 8 September, 2017; originally announced September 2017.

  25. arXiv:1703.02942  [pdf, other]

    cs.CV

    QuaSI: Quantile Sparse Image Prior for Spatio-Temporal Denoising of Retinal OCT Data

    Authors: Franziska Schirrmacher, Thomas Köhler, Lennart Husvogt, James G. Fujimoto, Joachim Hornegger, Andreas K. Maier

    Abstract: Optical coherence tomography (OCT) enables high-resolution and non-invasive 3D imaging of the human retina but is inherently impaired by speckle noise. This paper introduces a spatio-temporal denoising algorithm for OCT data on a B-scan level using a novel quantile sparse image (QuaSI) prior. To remove speckle noise while preserving image structures of diagnostic relevance, we implement our QuaSI…

    Submitted 8 March, 2017; originally announced March 2017.

    Comments: submitted to MICCAI'17

  26. arXiv:1702.04449  [pdf, other]

    cs.SI physics.soc-ph

    Modeling Social Organizations as Communication Networks

    Authors: David Wolpert, Justin Grana, Brendan Tracey, Tim Kohler, Artemy Kolchinsky

    Abstract: We identify the "organization" of a human social group as the communication network(s) within that group. We then introduce three theoretical approaches to analyzing what determines the structures of human organizations. All three approaches adopt a group-selection perspective, so that the group's network structure is (approximately) optimal, given the information-processing limitations of agents…

    Submitted 14 February, 2017; originally announced February 2017.

  27. arXiv:1610.04421  [pdf, other]

    cs.NI

    ZeroSDN: A Message Bus for Flexible and Light-weight Network Control Distribution in SDN

    Authors: Frank Dürr, Thomas Kohler, Jonas Grunert, Andre Kutzleb

    Abstract: Recent years have seen an evolution of SDN control plane architectures, starting from simple monolithic controllers, over modular monolithic controllers, to distributed controllers. We observe, however, that today's distributed controllers still exhibit inflexibility with respect to the distribution of control logic. Therefore, we propose a novel architecture of a distributed SDN controller in thi…

    Submitted 14 October, 2016; originally announced October 2016.

    Report number: TR-2016-06

    ACM Class: C.2.1; C.2.3

  28. Confidence-aware Levenberg-Marquardt optimization for joint motion estimation and super-resolution

    Authors: Cosmin Bercea, Andreas Maier, Thomas Köhler

    Abstract: Motion estimation across low-resolution frames and the reconstruction of high-resolution images are two coupled subproblems of multi-frame super-resolution. This paper introduces a new joint optimization approach for motion estimation and image reconstruction to address this interdependence. Our method is formulated via non-linear least squares optimization and combines two principles of robust su…

    Submitted 6 September, 2016; originally announced September 2016.

    Comments: accepted for ICIP 2016

    Journal ref: 2016 IEEE International Conference on Image Processing (ICIP)

  29. arXiv:1602.03458  [pdf, other]

    cs.CV

    Super-Resolved Retinal Image Mosaicing

    Authors: Thomas Köhler, Axel Heinrich, Andreas Maier, Joachim Hornegger, Ralf P. Tornow

    Abstract: The acquisition of high-resolution retinal fundus images with a large field of view (FOV) is challenging due to technological, physiological and economic reasons. This paper proposes a fully automatic framework to reconstruct retinal images of high spatial resolution and increased FOV from multiple low-resolution images captured with non-mydriatic, mobile and video-capable but low-cost cameras. Wi…

    Submitted 10 February, 2016; originally announced February 2016.

    Comments: accepted for 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI 2016)