Showing 1–38 of 38 results for author: Cavigelli, L

  1. arXiv:2403.10549  [pdf, other]

    cs.SD cs.LG eess.AS

    On-Device Domain Learning for Keyword Spotting on Low-Power Extreme Edge Embedded Systems

    Authors: Cristian Cioflan, Lukas Cavigelli, Manuele Rusci, Miguel de Prado, Luca Benini

    Abstract: Keyword spotting accuracy degrades when neural networks are exposed to noisy environments. On-site adaptation to previously unseen noise is crucial to recovering accuracy loss, and on-device learning is required to ensure that the adaptation process happens entirely on the edge device. In this work, we propose a fully on-device domain adaptation system achieving up to 14% accuracy gains over alrea…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 tables, 2 figures. Accepted at IEEE AICAS 2024

  2. arXiv:2403.07802  [pdf, other]

    cs.SD cs.LG eess.AS

    Boosting keyword spotting through on-device learnable user speech characteristics

    Authors: Cristian Cioflan, Lukas Cavigelli, Luca Benini

    Abstract: Keyword spotting systems for always-on TinyML-constrained applications require on-site tuning to boost the accuracy of offline trained classifiers when deployed in unseen inference conditions. Adapting to the speech peculiarities of target users requires many in-domain samples, often unavailable in real-world scenarios. Furthermore, current on-device learning techniques rely on computationally int…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 tables, 2 figures. Accepted as a full paper by the tinyML Research Symposium 2024

  3. arXiv:2311.10207  [pdf, other]

    cs.AR cs.CV cs.LG stat.ML

    Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication

    Authors: Jannis Schönleber, Lukas Cavigelli, Renzo Andri, Matteo Perotti, Luca Benini

    Abstract: From classical HPC to deep learning, MatMul is at the heart of today's computing. The recent Maddness method approximates MatMul without the need for multiplication by using a hash-based version of product quantization (PQ) indexing into a look-up table (LUT). Stella Nera is the first Maddness accelerator and it achieves 15x higher area efficiency (GMAC/s/mm^2) and more than 25x higher energy effi…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 6 pages, 7 figures, preprint under review

  4. Ara2: Exploring Single- and Multi-Core Vector Processing with an Efficient RVV 1.0 Compliant Open-Source Processor

    Authors: Matteo Perotti, Matheus Cavalcante, Renzo Andri, Lukas Cavigelli, Luca Benini

    Abstract: Vector processing is highly effective in boosting processor performance and efficiency for data-parallel workloads. In this paper, we present Ara2, the first fully open-source vector processor to support the RISC-V V 1.0 frozen ISA. We evaluate Ara2's performance on a diverse set of data-parallel kernels for various problem sizes and vector-unit configurations, achieving an average functional-unit…

    Submitted 17 June, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: To be published in: IEEE Transactions on Computers

  5. arXiv:2310.03507  [pdf, other]

    cs.CV cs.GR cs.MM

    RL-based Stateful Neural Adaptive Sampling and Denoising for Real-Time Path Tracing

    Authors: Antoine Scardigli, Lukas Cavigelli, Lorenz K. Müller

    Abstract: Monte-Carlo path tracing is a powerful technique for realistic image synthesis but suffers from high levels of noise at low sample counts, limiting its use in real-time applications. To address this, we propose a framework with end-to-end training of a sampling importance network, a latent space encoder network, and a denoiser network. Our approach uses reinforcement learning to optimize the sampl…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Submitted to NeurIPS. https://openreview.net/forum?id=xNyR7DXUzJ

  6. arXiv:2305.19917  [pdf, other]

    cs.AR cs.DC

    ReDSEa: Automated Acceleration of Triangular Solver on Supercloud Heterogeneous Systems

    Authors: Georgios Zacharopoulos, Ilias Bournias, Verner Vlacic, Lukas Cavigelli

    Abstract: When utilized effectively, Supercloud heterogeneous systems have the potential to significantly enhance performance. Our ReDSEa tool-chain automates the mapping, load balancing, scheduling, parallelism, and overlapping processes for the Triangular System Solver (TS) on a heterogeneous system consisting of a Huawei Kunpeng ARM multi-core CPU and an Ascend 910 AI HW accelerator. We propose an LLVM c…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 4 pages, SSH-S0C DAC 2023 Workshop

  7. arXiv:2305.04546  [pdf, other]

    cs.AR

    Flex-SFU: Accelerating DNN Activation Functions by Non-Uniform Piecewise Approximation

    Authors: Enrico Reggiani, Renzo Andri, Lukas Cavigelli

    Abstract: Modern DNN workloads increasingly rely on activation functions consisting of computationally complex operations. This poses a challenge to current accelerators optimized for convolutions and matrix-matrix multiplications. This work presents Flex-SFU, a lightweight hardware accelerator for activation functions implementing non-uniform piecewise interpolation supporting multiple data formats. Non-Un…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: 6 pages, 6 figures, 3 tables

  8. A "New Ara" for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design

    Authors: Matteo Perotti, Matheus Cavalcante, Nils Wistoff, Renzo Andri, Lukas Cavigelli, Luca Benini

    Abstract: Vector architectures are gaining traction for highly efficient processing of data-parallel workloads, driven by all major ISAs (RISC-V, Arm, Intel), and boosted by landmark chips, like the Arm SVE-based Fujitsu A64FX, powering the TOP500 leader Fugaku. The RISC-V V extension has recently reached 1.0-Frozen status. Here, we present its first open-source implementation, discuss the new specification…

    Submitted 17 October, 2022; originally announced October 2022.

  9. arXiv:2209.12982  [pdf, other]

    cs.AR cs.CV cs.LG

    Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile

    Authors: Renzo Andri, Beatrice Bussolino, Antonio Cipolletta, Lukas Cavigelli, Zhe Wang

    Abstract: Most of today's computer vision pipelines are built around deep neural networks, where convolution operations require most of the generally high compute effort. The Winograd convolution algorithm computes convolutions with fewer MACs compared to the standard algorithm, reducing the operation count by a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized tiles $F_2$. Even tho…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Accepted at IEEE/ACM MICRO 2022 (1-5 October 2022)

  10. arXiv:2202.07462  [pdf, other]

    cs.LG cs.AI cs.AR cs.DC cs.NE eess.SP

    Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference

    Authors: Gianna Paulin, Francesco Conti, Lukas Cavigelli, Luca Benini

    Abstract: Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn temporal dependencies by keeping an internal state, making them ideal for time-series problems such as speech recognition. However, the output-to-input feedback creates distinctive memory bandwidth and scalability challenges in designing accelerators for RNNs. We present Muntaniala, an RNN accelerator architecture for LSTM in…

    Submitted 14 February, 2022; originally announced February 2022.

    Journal ref: IEEE Transactions on Circuits and Systems I: Regular Papers, IEEE, Volume: 69, Issue: 1, January 2022, Date of Publication (Early Access) 30 July 2021

  11. arXiv:2201.03386  [pdf, other]

    cs.SD cs.AI cs.AR eess.AS

    Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks

    Authors: Gianmarco Cerutti, Lukas Cavigelli, Renzo Andri, Michele Magno, Elisabetta Farella, Luca Benini

    Abstract: Keyword spotting (KWS) is a crucial function enabling the interaction with the many ubiquitous smart devices in our surroundings, either activating them through wake-word or directly as a human-computer interface. For many applications, KWS is the entry point for our interactions with the device and, thus, an always-on workload. Many smart devices are mobile and their battery lifetime is heavily i…

    Submitted 10 January, 2022; originally announced January 2022.

  12. arXiv:2105.01755  [pdf, other]

    cs.LG

    Reinforcement Learning for Scalable Logic Optimization with Graph Neural Networks

    Authors: Xavier Timoneda, Lukas Cavigelli

    Abstract: Logic optimization is an NP-hard problem commonly approached through hand-engineered heuristics. We propose to combine graph convolutional networks with reinforcement learning and a novel, scalable node embedding method to learn which local transforms should be applied to the logic graph. We show that this method achieves a similar size reduction as ABC on smaller circuits and outperforms it by 1.…

    Submitted 4 May, 2021; originally announced May 2021.

  13. arXiv:2103.13740  [pdf, other]

    cs.LG cs.AI

    ECG-TCN: Wearable Cardiac Arrhythmia Detection with a Temporal Convolutional Network

    Authors: Thorir Mar Ingolfsson, Xiaying Wang, Michael Hersche, Alessio Burrello, Lukas Cavigelli, Luca Benini

    Abstract: Personalized ubiquitous healthcare solutions require energy-efficient wearable platforms that provide an accurate classification of bio-signals while consuming low average power for long-term battery-operated use. Single lead electrocardiogram (ECG) signals provide the ability to detect, classify, and even predict cardiac arrhythmia. In this paper, we propose a novel temporal convolutional network…

    Submitted 14 June, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

    Comments: 4 pages, 1 figure, 2 tables

  14. Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices

    Authors: Gianmarco Cerutti, Renzo Andri, Lukas Cavigelli, Michele Magno, Elisabetta Farella, Luca Benini

    Abstract: Sound event detection (SED) is a hot topic in consumer and smart city applications. Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput when targeting ultra-low power always-on devices. Latency, availability, cost, and privacy requirements are pushing recent IoT systems to process the data on the node, close to the…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: 6 pages conference

  15. arXiv:2011.01713  [pdf, other]

    cs.AR

    CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

    Authors: Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini

    Abstract: We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map a…

    Submitted 4 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

  16. arXiv:2006.00622  [pdf, other]

    eess.SP cs.HC cs.LG stat.ML

    EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain-Machine Interfaces

    Authors: Thorir Mar Ingolfsson, Michael Hersche, Xiaying Wang, Nobuaki Kobayashi, Lukas Cavigelli, Luca Benini

    Abstract: In recent years, deep learning (DL) has contributed significantly to the improvement of motor-imagery brain-machine interfaces (MI-BMIs) based on electroencephalography (EEG). While achieving high classification accuracy, DL models have also grown in size, requiring a vast amount of memory and computational resources. This poses a major challenge to an embedded BMI solution that guarantees user pri…

    Submitted 31 May, 2020; originally announced June 2020.

    Comments: 8 pages, 6 figures, 5 tables

  17. arXiv:2005.07137  [pdf, other]

    eess.SP cs.AR

    ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

    Authors: Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

    Abstract: Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining a high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ sized binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering…

    Submitted 26 February, 2021; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: Accepted at IEEE ISCAS 2021, Daegu, South Korea, 23-26 May 2021

  18. Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain–Machine Interfaces

    Authors: Tibor Schneider, Xiaying Wang, Michael Hersche, Lukas Cavigelli, Luca Benini

    Abstract: Motor-Imagery Brain–Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines by analyzing brain activities recorded with Electroencephalography (EEG). Latency, reliability, and privacy constraints make it unsuitable to offload the computation to the cloud. Practical use cases demand a wearable, battery-operated device with low average power consump…

    Submitted 16 January, 2023; v1 submitted 24 April, 2020; originally announced April 2020.

  19. arXiv:2001.01091  [pdf, other]

    cs.CV

    RPR: Random Partition Relaxation for Training; Binary and Ternary Weight Neural Networks

    Authors: Lukas Cavigelli, Luca Benini

    Abstract: We present Random Partition Relaxation (RPR), a method for strong quantization of neural networks weight to binary (+1/-1) and ternary (+1/0/-1) values. Starting from a pre-trained model, we quantize the weights and then relax random partitions of them to their continuous values for retraining before re-quantizing them and switching to another weight partition for further adaptation. We demonstrat…

    Submitted 4 January, 2020; originally announced January 2020.

  20. HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data

    Authors: Xiaying Wang, Lukas Cavigelli, Manuel Eggimann, Michele Magno, Luca Benini

    Abstract: Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers with resolutions reaching 0.5m/px. Segmenting SAR data still requires skilled personnel, limiting the potential for large-scale use. We show that it is possible to automatically and reliably perform urban scene segmentation from next-gen resolution SAR data (0.15m/px…

    Submitted 16 January, 2023; v1 submitted 9 December, 2019; originally announced December 2019.

  21. arXiv:1911.03314  [pdf, other]

    cs.LG eess.SP stat.ML

    FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things

    Authors: Xiaying Wang, Michele Magno, Lukas Cavigelli, Luca Benini

    Abstract: The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "Edge Computing", that is moving some of the intelligence, especially machine learning, towards the edge of the network. Enabling machine learning algorithms to run on resource-constrained hardware, typically on low-power smart devices, is challenging in terms of hardware (optimized and energy-ef…

    Submitted 17 February, 2022; v1 submitted 8 November, 2019; originally announced November 2019.

  22. arXiv:1908.11645  [pdf, other]

    cs.CV cs.AR eess.IV

    EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

    Authors: Lukas Cavigelli, Georg Rutishauser, Luca Benini

    Abstract: In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly. This has sparked a surge of research into specialized…

    Submitted 25 October, 2019; v1 submitted 30 August, 2019; originally announced August 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.03979

  23. arXiv:1905.10452  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Additive Noise Annealing and Approximation Properties of Quantized Neural Networks

    Authors: Matteo Spallanzani, Lukas Cavigelli, Gian Paolo Leonardi, Marko Bertogna, Luca Benini

    Abstract: We present a theoretical and experimental investigation of the quantization problem for artificial neural networks. We provide a mathematical definition of quantized neural networks and analyze their approximation capabilities, showing in particular that any Lipschitz-continuous map defined on a hypercube can be uniformly approximated by a quantized neural network. We then focus on the regularizat…

    Submitted 24 May, 2019; originally announced May 2019.

  24. arXiv:1810.03979  [pdf, other]

    cs.CV cs.AI cs.AR cs.LG

    Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

    Authors: Lukas Cavigelli, Luca Benini

    Abstract: After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers. This has triggered a wave of research towards specialized hardware…

    Submitted 1 October, 2018; originally announced October 2018.

  25. arXiv:1808.05488  [pdf, other]

    cs.CV cs.AI cs.NE eess.IV

    CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

    Authors: Lukas Cavigelli, Luca Benini

    Abstract: The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent w…

    Submitted 4 March, 2019; v1 submitted 15 August, 2018; originally announced August 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1704.04313

  26. arXiv:1804.00623  [pdf, other]

    cs.DC cs.AR cs.CV eess.IV eess.SP

    Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

    Authors: Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

    Abstract: Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute and memory intensive which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend…

    Submitted 14 March, 2019; v1 submitted 5 March, 2018; originally announced April 2018.

  27. arXiv:1803.05849  [pdf, other]

    cs.CV cs.AI cs.AR cs.NE eess.IV

    XNORBIN: A 95 TOp/s/W Hardware Accelerator for Binary Convolutional Neural Networks

    Authors: Andrawes Al Bahou, Geethan Karunaratne, Renzo Andri, Lukas Cavigelli, Luca Benini

    Abstract: Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory. This precludes the implementation of CNNs in low-power embedded systems. Recent research shows CNNs sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32x memory and collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNO…

    Submitted 5 March, 2018; originally announced March 2018.

  28. arXiv:1712.01743  [pdf, other]

    cs.OH cs.AR cs.CV cs.NE eess.SP

    Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity?

    Authors: Manuele Rusci, Lukas Cavigelli, Luca Benini

    Abstract: Design automation in general, and in particular logic synthesis, can play a key role in enabling the design of application-specific Binarized Neural Networks (BNN). This paper presents the hardware design and synthesis of a purely combinational BNN for ultra-low power near-sensor processing. We leverage the major opportunities raised by BNN models, which consist mostly of logical bit-wise operatio…

    Submitted 21 November, 2017; originally announced December 2017.

  29. arXiv:1711.05734  [pdf, other]

    cs.DC cs.LG cs.NE cs.SD

    Chipmunk: A Systolically Scalable 0.9 mm${}^2$, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

    Authors: Francesco Conti, Lukas Cavigelli, Gianna Paulin, Igor Susmelj, Luca Benini

    Abstract: Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm${}^2$) hardware accelerator for Long-Short Term Memory RNNs in UMC 65 nm technology capab…

    Submitted 20 February, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

  30. arXiv:1711.03538  [pdf, other]

    eess.IV cs.AR cs.GR eess.SP

    Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS

    Authors: Manuel Eggimann, Christelle Gloor, Florian Scheidegger, Lukas Cavigelli, Michael Schaffner, Aljosa Smolic, Luca Benini

    Abstract: Many modern video processing pipelines rely on edge-aware (EA) filtering methods. However, recent high-quality methods are challenging to run in real-time on embedded hardware due to their computational load. To this end, we propose an area-efficient and real-time capable hardware implementation of a high quality EA method. In particular, we focus on the recently proposed permeability filter (PF)…

    Submitted 8 November, 2017; originally announced November 2017.

  31. arXiv:1709.09888  [pdf, other]

    cs.CV eess.AS

    Efficient Convolutional Neural Network For Audio Event Detection

    Authors: Matthias Meyer, Lukas Cavigelli, Lothar Thiele

    Abstract: Wireless distributed systems as used in sensor networks, Internet-of-Things and cyber-physical systems, impose high requirements on resource efficiency. Advanced preprocessing and classification of data at the network edge can help to decrease the communication demand and to reduce the amount of data to be processed centrally. In the area of distributed acoustic sensing, the combination of algorit…

    Submitted 28 September, 2017; originally announced September 2017.

  32. arXiv:1704.04313  [pdf, other]

    cs.CV cs.AI cs.LG cs.PF eess.IV

    CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

    Authors: Lukas Cavigelli, Philippe Degen, Luca Benini

    Abstract: Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based eva…

    Submitted 21 June, 2017; v1 submitted 13 April, 2017; originally announced April 2017.

  33. arXiv:1704.00648  [pdf, other]

    cs.LG cs.CV

    Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

    Authors: Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool

    Abstract: We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks…

    Submitted 8 June, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

  34. arXiv:1611.07233  [pdf, other]

    cs.CV cs.AI cs.GR cs.IR cs.MM

    CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

    Authors: Lukas Cavigelli, Pascal Hager, Luca Benini

    Abstract: Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media. However, we pay for their high compression rate with visual artifacts degrading the user experience. Deep convolutional neural networks have become a widespread tool to address high-level computer vision tasks very successfully. Recently, they have found…

    Submitted 22 November, 2016; originally announced November 2016.

    Comments: 8 pages

  35. arXiv:1611.03130  [pdf]

    cs.CV cs.AI cs.NE eess.IV eess.SP

    Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

    Authors: Lukas Cavigelli, Dominic Bernath, Michele Magno, Luca Benini

    Abstract: Detecting and classifying targets in video streams from surveillance cameras is a cumbersome, error-prone and expensive task. Often, the incurred costs are prohibitive for real-time monitoring. This leads to data being stored locally or transmitted to a central storage site for post-incident examination. The required communication links and archiving of the video data are still expensive and this…

    Submitted 9 November, 2016; originally announced November 2016.

    Comments: Presented at SPIE Security + Defence 2016 Proc. SPIE 9997, Target and Background Signatures II

  36. arXiv:1609.07916  [pdf, other]

    cs.CV cs.LG

    Deep Structured Features for Semantic Segmentation

    Authors: Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, Luca Benini

    Abstract: We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms. Specifically, our architecture combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stag…

    Submitted 16 June, 2017; v1 submitted 26 September, 2016; originally announced September 2016.

    Comments: EUSIPCO 2017, 5 pages, 2 figures

  37. arXiv:1606.05487  [pdf, other]

    cs.AR cs.CV cs.NE

    YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

    Authors: Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

    Abstract: Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even thes…

    Submitted 24 February, 2017; v1 submitted 17 June, 2016; originally announced June 2016.

  38. arXiv:1512.04295  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Origami: A 803 GOp/s/W Convolutional Network Accelerator

    Authors: Lukas Cavigelli, Luca Benini

    Abstract: An ever increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow and superresolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile…

    Submitted 19 January, 2016; v1 submitted 14 December, 2015; originally announced December 2015.

    Comments: 14 pages

    ACM Class: B.7.1; I.2.6