L-MAGIC: Language Model Assisted Generation of Images with Coherence

Authors: Zhipeng Cai, Matthias Mueller, Reiner Birkl, Diana Wofk, Shao-Yen Tseng, JunDa Cheng, Gabriela Ben-Melech Stan, Vasudev Lal, Michael Paulitsch

Abstract: In the current era of generative AI breakthroughs, generating panoramic scenes from a single input image remains a key challenge. Most existing methods use diffusion-based iterative or simultaneous multi-view inpainting. However, the lack of global scene layout priors leads to subpar outputs with duplicated objects (e.g., multiple beds in a bedroom) or requires time-consuming human text inputs for each view. We propose L-MAGIC, a novel method leveraging large language models for guidance while diffusing multiple coherent views of 360 degree panoramic scenes. L-MAGIC harnesses pre-trained diffusion and language models without fine-tuning, ensuring zero-shot performance. The output quality is further enhanced by super-resolution and multi-view fusion techniques. Extensive experiments demonstrate that the resulting panoramic scenes feature better scene layouts and perspective view rendering quality compared to related works, with >70% preference in human evaluations. Combined with conditional diffusion models, L-MAGIC can accept various input modalities, including but not limited to text, depth maps, sketches, and colored scripts. Applying depth estimation further enables 3D point cloud generation and dynamic scene exploration with fluid camera motion. Code is available at https://github.com/IntelLabs/MMPano. The video presentation is available at https://youtu.be/XDMNEzH4-Ec?list=PLG9Zyvu7iBa0-a7ccNLO8LjcVRAoMn57s.

Submitted 3 June, 2024; originally announced June 2024.

Comments: accepted to CVPR 2024

arXiv:2403.19319 [pdf, other]

Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation

Authors: Yujin Chen, Yinyu Nie, Benjamin Ummenhofer, Reiner Birkl, Michael Paulitsch, Matthias Müller, Matthias Nießner

Abstract: We present Mesh2NeRF, an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks. Many 3D generative approaches represent 3D scenes as radiance fields for training. Their ground-truth radiance fields are usually fitted from multi-view renderings from a large-scale synthetic 3D dataset, which often results in artifacts due to occlusions or under-fitting issues. In Mesh2NeRF, we propose an analytic solution to directly obtain ground-truth radiance fields from 3D meshes, characterizing the density field with an occupancy function featuring a defined surface thickness, and determining view-dependent color through a reflection function considering both the mesh and environment lighting. Mesh2NeRF extracts accurate radiance fields which provides direct supervision for training generative NeRFs and single scene representation. We validate the effectiveness of Mesh2NeRF across various tasks, achieving a noteworthy 3.12dB improvement in PSNR for view synthesis in single scene representation on the ABO dataset, a 0.69 PSNR enhancement in the single-view conditional generation of ShapeNet Cars, and notably improved mesh extraction from NeRF in the unconditional generation of Objaverse Mugs.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Project page: https://terencecyj.github.io/projects/Mesh2NeRF/ Video: https://youtu.be/oufv1N3f7iY

arXiv:2307.14460 [pdf, other]

MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation

Authors: Reiner Birkl, Diana Wofk, Matthias Müller

Abstract: We release MiDaS v3.1 for monocular depth estimation, offering a variety of new models based on different encoder backbones. This release is motivated by the success of transformers in computer vision, with a large variety of pretrained vision transformers now available. We explore how using the most promising vision transformers as image encoders impacts depth estimation quality and runtime of the MiDaS architecture. Our investigation also includes recent convolutional approaches that achieve comparable quality to vision transformers in image classification tasks. While the previous release MiDaS v3.0 solely leverages the vanilla vision transformer ViT, MiDaS v3.1 offers additional models based on BEiT, Swin, SwinV2, Next-ViT and LeViT. These models offer different performance-runtime tradeoffs. The best model improves the depth estimation quality by 28% while efficient models enable downstream tasks requiring high frame rates. We also describe the general process for integrating new backbones. A video summarizing the work can be found at https://youtu.be/UjaeNNFf9sE and the code is available at https://github.com/isl-org/MiDaS.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 14 pages, 2 figures

arXiv:2302.12288 [pdf, other]

ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Authors: Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, Matthias Müller

Abstract: This paper tackles the problem of depth estimation from a single image. Existing work either focuses on generalization performance disregarding metric scale, i.e. relative depth estimation, or state-of-the-art results on specific datasets, i.e. metric depth estimation. We propose the first approach that combines both worlds, leading to a model with excellent generalization performance while maintaining metric scale. Our flagship model, ZoeD-M12-NK, is pre-trained on 12 datasets using relative depth and fine-tuned on two datasets using metric depth. We use a lightweight head with a novel bin adjustment design called metric bins module for each domain. During inference, each input image is automatically routed to the appropriate head using a latent classifier. Our framework admits multiple configurations depending on the datasets used for relative depth pre-training and metric fine-tuning. Without pre-training, we can already significantly improve the state of the art (SOTA) on the NYU Depth v2 indoor dataset. Pre-training on twelve datasets and fine-tuning on the NYU Depth v2 indoor dataset, we can further improve SOTA for a total of 21% in terms of relative absolute error (REL). Finally, ZoeD-M12-NK is the first model that can jointly train on multiple datasets (NYU Depth v2 and KITTI) without a significant drop in performance and achieve unprecedented zero-shot generalization performance to eight unseen datasets from both indoor and outdoor domains. The code and pre-trained models are publicly available at https://github.com/isl-org/ZoeDepth.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:1712.02595 [pdf, other]

doi 10.1109/TII.2018.2794997

Gaussian Process Regression for In-situ Capacity Estimation of Lithium-ion Batteries

Authors: Robert R. Richardson, Christoph R. Birkl, Michael A. Osborne, David A. Howey

Abstract: Accurate on-board capacity estimation is of critical importance in lithium-ion battery applications. Battery charging/discharging often occurs under a constant current load, and hence voltage vs. time measurements under this condition may be accessible in practice. This paper presents a data-driven diagnostic technique, Gaussian Process regression for In-situ Capacity Estimation (GP-ICE), which estimates battery capacity using voltage measurements over short periods of galvanostatic operation. Unlike previous works, GP-ICE does not rely on interpreting the voltage-time data as Incremental Capacity (IC) or Differential Voltage (DV) curves. This overcomes the need to differentiate the voltage-time data (a process which amplifies measurement noise), and the requirement that the range of voltage measurements encompasses the peaks in the IC/DV curves. GP-ICE is applied to two datasets, consisting of 8 and 20 cells respectively. In each case, within certain voltage ranges, as little as 10 seconds of galvanostatic operation enables capacity estimates with approximately 2-3% RMSE.

Submitted 18 December, 2017; v1 submitted 7 December, 2017; originally announced December 2017.

Comments: 12 pages, 10 figures, submitted to IEEE Transactions on Industrial Informatics

Report number: TII-17-1314 MSC Class: 62P30 ACM Class: J.2; G.3

arXiv:1410.4370 [pdf, other]

doi 10.1109/GHTC.2014.6970281

Modular converter system for low-cost off-grid energy storage using second life Li-ion batteries

Authors: Christoph R. Birkl, Damien F. Frost, Adrien M. Bizeray, Robert R. Richardson, David A. Howey

Abstract: Lithium ion batteries are promising for small off-grid energy storage applications in developing countries because of their high energy density and long life. However, costs are prohibitive. Instead, we consider 'used' Li-ion batteries for this application, finding experimentally that many discarded laptop cells, for example, still have good capacity and cycle life. In order to make safe and optimal use of such cells, we present a modular power management system using a separate power converter for every cell. This novel approach allows individual batteries to be used to their full capacity. The power converters operate in voltage droop control mode to provide easy charge balancing and implement a battery management system to estimate the capacity of each cell, as we demonstrate experimentally.

Submitted 16 October, 2014; originally announced October 2014.

Comments: Presented at IEEE GHTC Oct 10-14, 2014, Silicon Valley

arXiv:1011.5475 [pdf, ps, other]

doi 10.1103/PhysRevD.84.023003

Stationary, Axisymmetric Neutron Stars with Meridional Circulation in General Relativity

Authors: Reiner Birkl, Nikolaos Stergioulas, Ewald Müller

Abstract: We present the first stationary, axisymmetric neutron star models with meridional circulation in general relativity. For that purpose, we developed GRNS, a new code based on a fixed point iteration. We find a two-dimensional set of meridional circulation modes, which differ by the number of vortices in the stream lines of the neutron star fluid. For expected maximal meridional circulation velocities of about 1000 km/s, the vortices cause surface deformations of about a percent. The deformations depend on the shape of the vortices close to the surface and increase with the meridional circulation velocity. We also computed models of rotating neutron stars with meridional circulation, where neither the surface rotates nor does the rotation velocity exceed the circulation velocity.

Submitted 24 November, 2010; originally announced November 2010.

Journal ref: Phys.Rev.D84:023003,2011

arXiv:astro-ph/0608543 [pdf, ps, other]

doi 10.1051/0004-6361:20066293

Neutrino pair annihilation near accreting, stellar-mass black holes

Authors: R. Birkl, M.-A. Aloy, H.-Th. Janka, E. Mueller

Abstract: We investigate the energy-momentum deposition due to neutrino-antineutrino annihilation in the vicinity of axisymmetric, accreting black holes (BHs) by numerically ray-tracing neutrino trajectories in a Kerr space-time. Hyperaccreting stellar-mass BHs are widely considered as energy sources that can drive ultrarelativistic outflows with the potential to produce gamma-ray bursts. In contrast to earlier works, we provide an extensive and detailed parameter study of the influence of general relativistic (GR) effects and of different neutrinosphere geometries. These include idealized thin disks, tori, and spheres, or are constructed as non-selfgravitating equilibrium matter distributions for varied BH rotation. Considering isothermal neutrinospheres with the same temperature and surface area, we confirm previous results that compared to Newtonian calculations, GR effects increase the annihilation rate measured by an observer at infinity by a factor of 2 when the neutrinosphere is a disk. However, in case of a torus and a sphere the influence of GR effects is globally only ~25%, although locally it can be significantly larger. Independent of whether GR effects are included, disks yield the highest energy deposition rates, followed by tori and, with the lowest rates, spheres. For disks and tori, increasing the angular momentum of the BH from 0 to 1 enhances the energy deposition rate measured by an observer at infinity by roughly a factor of 2 due to the shrinking inner radius of the neutrinosphere. (abridged)

Submitted 12 December, 2006; v1 submitted 25 August, 2006; originally announced August 2006.

Comments: 18 pages, 8 figures, accepted by A&A

Showing 1–8 of 8 results for author: Birkl, R