-
NERV++: An Enhanced Implicit Neural Video Representation
Authors:
Ahmed Ghorbel,
Wassim Hamidouche,
Luce Morin
Abstract:
Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability of representing, generating, and manipulating various data types, allowing for continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and require a huge number of parameters and long training iterations to capture high-frequency details, limiting their wider applicability. Resolving this problem remains challenging, yet doing so would make INRs more accessible in compression tasks. We take a step towards resolving these shortcomings by introducing NeRV++, an enhanced implicit neural video representation that is a straightforward yet effective enhancement over the original NeRV decoder architecture, featuring separable conv2d residual blocks (SCRBs) that sandwich the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network, and significantly enhances the representation capacity beyond current INR-based video codecs. We evaluate our method on the UVG, MCL-JVC, and Bunny datasets, achieving competitive results for video compression with INRs. This achievement narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.
Submitted 28 February, 2024;
originally announced February 2024.
-
ESG Accountability Made Easy: DocQA at Your Service
Authors:
Lokesh Mishra,
Cesar Berrospi,
Kasper Dinkla,
Diego Antognini,
Francesco Fusco,
Benedikt Bothur,
Maksym Lysak,
Nikolaos Livathinos,
Ahmed Nassar,
Panagiotis Vagenas,
Lucas Morin,
Christoph Auer,
Michele Dolfi,
Peter Staar
Abstract:
We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates technologies from several AI disciplines: document conversion to a machine-readable format (via computer vision), finding relevant data (via natural language processing), and formulating an eloquent response (via large language models). Users can explore over 10,000 Environmental, Social, and Governance (ESG) disclosure reports from over 2,000 corporations. The Deep Search platform can be accessed at: https://ds4sd.github.io.
Submitted 30 November, 2023;
originally announced November 2023.
-
MolGrapher: Graph-based Visual Recognition of Chemical Structures
Authors:
Lucas Morin,
Martin Danelljan,
Maria Isabel Agea,
Ahmed Nassar,
Valery Weber,
Ingmar Meijer,
Peter Staar,
Fisher Yu
Abstract:
The automatic analysis of chemical literature has immense potential to accelerate the discovery of new materials and drugs. Much of the critical information in patent documents and scientific articles is contained in figures depicting the molecular structures. However, automatically parsing the exact chemical structure is a formidable challenge, due to the amount of detailed information, the diversity of drawing styles, and the need for training data. In this work, we introduce MolGrapher to recognize chemical structures visually. First, a deep keypoint detector detects the atoms. Second, we treat all candidate atoms and bonds as nodes and put them in a graph. This construct allows a natural graph representation of the molecule. Last, we classify atom and bond nodes in the graph with a Graph Neural Network. To address the lack of real training data, we propose a synthetic data generation pipeline producing diverse and realistic results. In addition, we introduce a large-scale benchmark of annotated real molecule images, USPTO-30K, to spur research on this critical topic. Extensive experiments on five datasets show that our approach significantly outperforms classical and learning-based methods in most settings. Code, models, and datasets are available.
Submitted 23 August, 2023;
originally announced August 2023.
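The graph construction step described in the MolGrapher abstract (candidate atoms and bonds both become nodes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_supergraph`, the distance threshold `max_bond_length`, and the rule "propose a bond between any two atoms closer than the threshold" are assumptions made purely for illustration, and the keypoint detector and the Graph Neural Network classifier are not shown.

```python
from itertools import combinations
from math import dist


def build_supergraph(atom_positions, max_bond_length):
    """Build a graph whose nodes are candidate atoms AND candidate bonds.

    Hypothetical helper: a candidate bond is proposed for every pair of
    atoms closer than `max_bond_length` (an assumed heuristic).
    """
    # Atom nodes come first, so node index i corresponds to atom i.
    nodes = [{"kind": "atom", "pos": tuple(p)} for p in atom_positions]
    edges = []
    for i, j in combinations(range(len(atom_positions)), 2):
        if dist(atom_positions[i], atom_positions[j]) <= max_bond_length:
            (xi, yi), (xj, yj) = atom_positions[i], atom_positions[j]
            bond_index = len(nodes)
            # The bond node sits at the midpoint of its two atoms.
            nodes.append({"kind": "bond", "pos": ((xi + xj) / 2, (yi + yj) / 2)})
            # Connect the bond node to both of its endpoint atoms.
            edges.append((i, bond_index))
            edges.append((bond_index, j))
    return nodes, edges


nodes, edges = build_supergraph([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], max_bond_length=1.5)
```

In this toy input only the first two atoms are close enough to receive a candidate bond node; a classifier would then decide, per node, the atom type or whether the candidate bond is real.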
-
ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression
Authors:
Ahmed Ghorbel,
Wassim Hamidouche,
Luce Morin
Abstract:
Over the last few years, neural image compression has gained wide attention from research and industry, yielding promising end-to-end deep neural codecs outperforming their conventional counterparts in rate-distortion performance. Despite significant advancement, current methods, including attention-based transform coding, still need to be improved in reducing the coding rate while preserving the reconstruction fidelity, especially in non-homogeneous textured image areas. Those models also require more parameters and a higher decoding time. To tackle the above challenges, we propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive prior to capture both global and local contexts from the hyper and quantized latent representations. The proposed architecture can be optimized end-to-end to fully exploit the context information and extract compact latent representation while reconstructing higher-quality images. Experimental results on four widely-used datasets showed that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, averaging 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to verify the computational efficiency of our approach and conduct several objective and subjective analyses to bring to the fore the performance gap between the next-generation ConvNet, namely ConvNeXt, and Swin Transformer.
Submitted 12 July, 2023;
originally announced July 2023.
-
AICT: An Adaptive Image Compression Transformer
Authors:
Ahmed Ghorbel,
Wassim Hamidouche,
Luce Morin
Abstract:
Motivated by the efficiency investigation of the Transformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, first, with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Current methods that still rely on ConvNet-based entropy coding are limited in modeling long-range dependencies due to their local connectivity and an increasing number of architectural biases and priors. On the contrary, the proposed ICT can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre-/post-processor to accurately extract more compact latent representation while reconstructing higher-quality images. Extensive experimental results on benchmark datasets showed that the proposed adaptive image compression transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.
Submitted 12 July, 2023;
originally announced July 2023.
-
Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression
Authors:
Ahmed Ghorbel,
Wassim Hamidouche,
Luce Morin
Abstract:
Recently, the performance of neural image compression (NIC) has steadily improved thanks to recent lines of study, reaching or outperforming state-of-the-art conventional codecs. Despite significant progress, current NIC methods still rely on ConvNet-based entropy coding, limited in modeling long-range dependencies due to their local connectivity and the increasing number of architectural biases and priors, resulting in complex underperforming models with high decoding latency. Motivated by the efficiency investigation of the Transformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, first, with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Through the proposed ICT, we can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre-/post-processor to accurately extract more compact latent codes while reconstructing higher-quality images. Extensive experimental results on benchmark datasets showed that the proposed framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM. Moreover, we provide model scaling studies to verify the computational efficiency of our approach and conduct several objective and subjective analyses to bring to the fore the performance gap between the adaptive image compression transformer (AICT) and the neural codec SwinT-ChARM.
Submitted 22 January, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Quality Assessment of DIBR-synthesized views: An Overview
Authors:
Shishun Tian,
Lu Zhang,
Wenbin Zou,
Xia Li,
Ting Su,
Luce Morin,
Olivier Deforges
Abstract:
Depth-Image-Based Rendering (DIBR) is one of the fundamental techniques for generating new views in 3D video applications, such as Multi-View Videos (MVV), Free-Viewpoint Videos (FVV) and Virtual Reality (VR). However, the quality assessment of DIBR-synthesized views is quite different from that of traditional 2D images/videos. In recent years, several efforts have been made towards this topic, but there is a lack of a detailed survey in the literature. In this paper, we provide a comprehensive survey of current approaches to the quality assessment of DIBR-synthesized views. The currently accessible datasets of DIBR-synthesized views are first reviewed, followed by a summary analysis of the representative state-of-the-art objective metrics. Then, the performances of different objective metrics are evaluated and discussed on all available datasets. Finally, we discuss the potential challenges and suggest possible directions for future research.
Submitted 27 April, 2021; v1 submitted 16 November, 2019;
originally announced November 2019.
-
Numerical simulation of model problems in plasticity based on field dislocation mechanics
Authors:
Léo Morin,
Renald Brenner,
Pierre Suquet
Abstract:
The aim of this paper is to investigate the numerical implementation of the Field Dislocation Mechanics (FDM) theory for the simulation of dislocation-mediated plasticity. First, the mesoscale FDM theory of Acharya and Roy (2006) is recalled; it allows the set of equations to be expressed as a static problem, corresponding to the determination of the local stress field for a given dislocation density distribution, complemented by an evolution problem, corresponding to the transport of the dislocation density. The static problem is solved using FFT-based techniques (Brenner et al., 2014). The main contribution of the present study is an efficient numerical scheme based on high-resolution Godunov-type solvers to solve the evolution problem. Model problems of dislocation-mediated plasticity are finally considered in a simplified layer case. First, uncoupled problems with uniform velocity are considered, which makes it possible to reproduce the annihilation of dislocations and the expansion of dislocation loops. Then, the FDM theory is applied to several problems of dislocation microstructures subjected to a mechanical loading.
Submitted 6 November, 2019;
originally announced November 2019.
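To make the "evolution problem" in the abstract above concrete, here is a minimal sketch of the simplest Godunov-type scheme: a first-order upwind update for the 1D transport equation d(rho)/dt + v d(rho)/dx = 0 with uniform velocity v on a periodic grid. This is an assumption-laden illustration, not the paper's solver: the paper refers to high-resolution Godunov-type schemes, whereas this sketch is first order, one-dimensional, and uncoupled from any stress computation. The function name `godunov_advect` and all parameters are hypothetical.

```python
import numpy as np


def godunov_advect(rho, velocity, dx, dt, steps):
    """First-order Godunov (upwind) transport of a density on a periodic 1D grid."""
    cfl = abs(velocity) * dt / dx
    if cfl > 1.0:
        raise ValueError("CFL condition violated: need |v| * dt / dx <= 1")
    rho = np.asarray(rho, dtype=float).copy()
    for _ in range(steps):
        if velocity >= 0:
            # Information travels to the right: each face takes the flux
            # of the cell on its left.
            flux_left = velocity * np.roll(rho, 1)   # v * rho[i-1]
            flux_right = velocity * rho              # v * rho[i]
        else:
            # Information travels to the left: each face takes the flux
            # of the cell on its right.
            flux_left = velocity * rho               # v * rho[i]
            flux_right = velocity * np.roll(rho, -1) # v * rho[i+1]
        # Conservative finite-volume update.
        rho = rho - dt / dx * (flux_right - flux_left)
    return rho


shifted = godunov_advect([0.0, 1.0, 0.0, 0.0], velocity=1.0, dx=1.0, dt=1.0, steps=1)
```

With a CFL number of exactly 1, as in the call above, the profile is shifted by one cell per step without smearing; for smaller CFL numbers the first-order scheme is diffusive, which is the usual motivation for higher-resolution variants. Because the update is written in flux form, the total density is conserved on the periodic grid.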