Search | arXiv e-print repository

Versatile polymer method to dry-flip two-dimensional moiré hetero structures for nanoscale surface characterization

Authors: Roop Kumar Mech, Jean Spièce, Kenji Watanabe, Takashi Taniguchi, Pascal Gehring

Abstract: The recent discovery of magic angle twisted bilayer graphene (MATBG), in which two sheets of monolayer graphene are precisely stacked to a specific angle, has opened up a plethora of new opportunities in the field of topology, superconductivity, and other strongly correlated effects. Most conventional ways of preparing twisted bilayer devices require the use of high process temperatures and solven… ▽ More The recent discovery of magic angle twisted bilayer graphene (MATBG), in which two sheets of monolayer graphene are precisely stacked to a specific angle, has opened up a plethora of new opportunities in the field of topology, superconductivity, and other strongly correlated effects. Most conventional ways of preparing twisted bilayer devices require the use of high process temperatures and solvents and are not well-suited for preparing samples which need to be flipped to be compatible with characterization techniques like STM, ARPES, PFM, SThM etc. Here, we demonstrate a very simple polymer-based method using Polyvinyl Chloride (PVC), which can be used for making flipped twisted bilayer graphene devices. This allowed us to produce flipped twisted samples without the need of any solvents and with high quality as confirmed by Piezoresponse Force Microscopy. We believe that this dry flip technique can be readily extended to twist 2D materials beyond graphene, especially air-sensitive materials which require operation under inert atmosphere, where often solvents cannot be used. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2402.08654 [pdf, other]

Learning Continuous 3D Words for Text-to-Image Generation

Authors: Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni

Abstract: Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes like illumination direction or non-rigid shape change. In this paper, we present an approach for allowing users of text-to-image models to have fine-grained control of several attributes in an image. We do this by engineering special sets of input… ▽ More Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes like illumination direction or non-rigid shape change. In this paper, we present an approach for allowing users of text-to-image models to have fine-grained control of several attributes in an image. We do this by engineering special sets of input tokens that can be transformed in a continuous manner -- we call them Continuous 3D Words. These attributes can, for example, be represented as sliders and applied jointly with text prompts for fine-grained control over image generation. Given only a single mesh and a rendering engine, we show that our approach can be adopted to provide continuous user control over several 3D-aware attributes, including time-of-day illumination, bird wing orientation, dollyzoom effect, and object poses. Our method is capable of conditioning image creation with multiple Continuous 3D Words and text descriptions simultaneously while adding no overhead to the generative process. Project Page: https://ttchengab.github.io/continuous_3d_words △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Project Page: https://ttchengab.github.io/continuous_3d_words

arXiv:2310.19188 [pdf, other]

3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets

Authors: Ta-Ying Cheng, Matheus Gadelha, Soren Pirk, Thibault Groueix, Radomir Mech, Andrew Markham, Niki Trigoni

Abstract: We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image repres… ▽ More We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset. Project Page: https://ttchengab.github.io/3dminerOfficial △ Less

Submitted 29 October, 2023; originally announced October 2023.

Comments: In ICCV 2023

arXiv:2208.05044 [pdf]

Two-dimensional cuprate nanodetector with single photon sensitivity at T = 20 K

Authors: Rafael Luque Merino, Paul Seifert, Jose Duran Retamal, Roop Mech, Takashi Taniguchi, Kenji Watanabe, Kazuo Kadowaki, Robert H. Hadfield, Dmitri K. Efetov

Abstract: Detecting light at the single-photon level is one of the pillars of emergent photonic technologies. This is realized through state-of-the-art superconducting detectors that offer efficient, broadband and fast response. However, the use of superconducting thin films with low TC limits their operation temperature below 4K. In this work, we demonstrate proof-of-concept nanodetectors based on exfoliat… ▽ More Detecting light at the single-photon level is one of the pillars of emergent photonic technologies. This is realized through state-of-the-art superconducting detectors that offer efficient, broadband and fast response. However, the use of superconducting thin films with low TC limits their operation temperature below 4K. In this work, we demonstrate proof-of-concept nanodetectors based on exfoliated, two-dimensional cuprate superconductor Bi2Sr2CaCu2O8-δ (BSCCO) that exhibit single-photon sensitivity at telecom wavelength at a record temperature of T = 20K. These non-optimized devices exhibit a slow (ms) reset time and a low detection efficiency (10^(-4)). We realize the elusive prospect of single-photon sensitivity on a high-TC nanodetector thanks to a novel approach, combining van der Waals fabrication techniques and a non-invasive nanopatterning based on light ion irradiation. This result paves the way for broader application of single-photon technologies, relaxing the cryogenic constraints for single-photon detection at telecom wavelength. △ Less

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2207.01044 [pdf, other]

doi 10.1145/3528223.3530173

MatFormer: A Generative Model for Procedural Materials

Authors: Paul Guerrero, Miloš Hašan, Kalyan Sunkavalli, Radomír Měch, Tamy Boubekeur, Niloy J. Mitra

Abstract: Procedural material graphs are a compact, parameteric, and resolution-independent representation that are a popular choice for material authoring. However, designing procedural materials requires significant expertise and publicly accessible libraries contain only a few thousand such graphs. We present MatFormer, a generative model that can produce a diverse set of high-quality procedural material… ▽ More Procedural material graphs are a compact, parameteric, and resolution-independent representation that are a popular choice for material authoring. However, designing procedural materials requires significant expertise and publicly accessible libraries contain only a few thousand such graphs. We present MatFormer, a generative model that can produce a diverse set of high-quality procedural materials with complex spatial patterns and appearance. While procedural materials can be modeled as directed (operation) graphs, they contain arbitrary numbers of heterogeneous nodes with unstructured, often long-range node connections, and functional constraints on node parameters and connections. MatFormer addresses these challenges with a multi-stage transformer-based model that sequentially generates nodes, node parameters, and edges, while ensuring the semantic validity of the graph. In addition to generation, MatFormer can be used for the auto-completion and exploration of partial material graphs. We qualitatively and quantitatively demonstrate that our method outperforms alternative approaches, in both generated graph and material quality. △ Less

Submitted 15 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

Journal ref: ACM Transactions on Graphics, Volume 41, Issue 4 (Proceedings of Siggraph 2022)

arXiv:2207.00182 [pdf, other]

Recovering Detail in 3D Shapes Using Disparity Maps

Authors: Marissa Ramirez de Chanlatte, Matheus Gadelha, Thibault Groueix, Radomir Mech

Abstract: We present a fine-tuning method to improve the appearance of 3D geometries reconstructed from single images. We leverage advances in monocular depth estimation to obtain disparity maps and present a novel approach to transforming 2D normalized disparity maps into 3D point clouds by using shape priors to solve an optimization on the relevant camera parameters. After creating a 3D point cloud from d… ▽ More We present a fine-tuning method to improve the appearance of 3D geometries reconstructed from single images. We leverage advances in monocular depth estimation to obtain disparity maps and present a novel approach to transforming 2D normalized disparity maps into 3D point clouds by using shape priors to solve an optimization on the relevant camera parameters. After creating a 3D point cloud from disparity, we introduce a method to combine the new point cloud with existing information to form a more faithful and detailed final geometry. We demonstrate the efficacy of our approach with multiple experiments on both synthetic and real images. △ Less

Submitted 18 October, 2022; v1 submitted 30 June, 2022; originally announced July 2022.

arXiv:2205.13682 [pdf, other]

doi 10.1109/TVCG.2023.3265306

ANISE: Assembly-based Neural Implicit Surface rEconstruction

Authors: Dmitry Petrov, Matheus Gadelha, Radomir Mech, Evangelos Kalogerakis

Abstract: We present ANISE, a method that reconstructs a 3D~shape from partial observations (images or sparse point clouds) using a part-aware neural implicit shape representation. The shape is formulated as an assembly of neural implicit functions, each representing a different part instance. In contrast to previous approaches, the prediction of this representation proceeds in a coarse-to-fine manner. Our… ▽ More We present ANISE, a method that reconstructs a 3D~shape from partial observations (images or sparse point clouds) using a part-aware neural implicit shape representation. The shape is formulated as an assembly of neural implicit functions, each representing a different part instance. In contrast to previous approaches, the prediction of this representation proceeds in a coarse-to-fine manner. Our model first reconstructs a structural arrangement of the shape in the form of geometric transformations of its part instances. Conditioned on them, the model predicts part latent codes encoding their surface geometry. Reconstructions can be obtained in two ways: (i) by directly decoding the part latent codes to part implicit functions, then combining them into the final shape; or (ii) by using part latents to retrieve similar part instances in a part database and assembling them in a single shape. We demonstrate that, when performing reconstruction by decoding part representations into implicit functions, our method achieves state-of-the-art part-aware reconstruction results from both images and sparse point clouds.When reconstructing shapes by assembling parts retrieved from a dataset, our approach significantly outperforms traditional shape retrieval methods even when significantly restricting the database size. We present our results in well-known sparse point cloud reconstruction and single-view reconstruction benchmarks. △ Less

Submitted 5 July, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2109.00113 [pdf, other]

CPFN: Cascaded Primitive Fitting Networks for High-Resolution Point Clouds

Authors: Eric-Tuan Lê, Minhyuk Sung, Duygu Ceylan, Radomir Mech, Tamy Boubekeur, Niloy J. Mitra

Abstract: Representing human-made objects as a collection of base primitives has a long history in computer vision and reverse engineering. In the case of high-resolution point cloud scans, the challenge is to be able to detect both large primitives as well as those explaining the detailed parts. While the classical RANSAC approach requires case-specific parameter tuning, state-of-the-art networks are limit… ▽ More Representing human-made objects as a collection of base primitives has a long history in computer vision and reverse engineering. In the case of high-resolution point cloud scans, the challenge is to be able to detect both large primitives as well as those explaining the detailed parts. While the classical RANSAC approach requires case-specific parameter tuning, state-of-the-art networks are limited by memory consumption of their backbone modules such as PointNet++, and hence fail to detect the fine-scale primitives. We present Cascaded Primitive Fitting Networks (CPFN) that relies on an adaptive patch sampling network to assemble detection results of global and local primitive detection networks. As a key enabler, we present a merging formulation that dynamically aggregates the primitives across global and local scales. Our evaluation demonstrates that CPFN improves the state-of-the-art SPFN performance by 13-14% on high-resolution point cloud datasets and specifically improves the detection of fine-scale primitives by 20-22%. △ Less

Submitted 6 September, 2021; v1 submitted 31 August, 2021; originally announced September 2021.

Comments: ICCV 2021: 15 pages, 8 figures

Journal ref: ICCV 2021

arXiv:2102.09105 [pdf, other]

DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates

Authors: Minghua Liu, Minhyuk Sung, Radomir Mech, Hao Su

Abstract: We propose DeepMetaHandles, a 3D conditional generative model based on mesh deformation. Given a collection of 3D meshes of a category and their deformation handles (control points), our method learns a set of meta-handles for each shape, which are represented as combinations of the given handles. The disentangled meta-handles factorize all the plausible deformations of the shape, while each of th… ▽ More We propose DeepMetaHandles, a 3D conditional generative model based on mesh deformation. Given a collection of 3D meshes of a category and their deformation handles (control points), our method learns a set of meta-handles for each shape, which are represented as combinations of the given handles. The disentangled meta-handles factorize all the plausible deformations of the shape, while each of them corresponds to an intuitive deformation. A new deformation can then be generated by sampling the coefficients of the meta-handles in a specific range. We employ biharmonic coordinates as the deformation function, which can smoothly propagate the control points' translations to the entire mesh. To avoid learning zero deformation as meta-handles, we incorporate a target-fitting module which deforms the input mesh to match a random target. To enhance deformations' plausibility, we employ a soft-rasterizer-based discriminator that projects the meshes to a 2D space. Our experiments demonstrate the superiority of the generated deformations as well as the interpretability and consistency of the learned meta-handles. △ Less

Submitted 28 March, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: CVPR2021 oral

arXiv:2005.04464 [pdf, other]

FAME: 3D Shape Generation via Functionality-Aware Model Evolution

Authors: Yanran Guan, Han Liu, Kun Liu, Kangxue Yin, Ruizhen Hu, Oliver van Kaick, Yan Zhang, Ersin Yumer, Nathan Carr, Radomir Mech, Hao Zhang

Abstract: We introduce a modeling tool which can evolve a set of 3D objects in a functionality-aware manner. Our goal is for the evolution to generate large and diverse sets of plausible 3D objects for data augmentation, constrained modeling, as well as open-ended exploration to possibly inspire new designs. Starting with an initial population of 3D objects belonging to one or more functional categories, we… ▽ More We introduce a modeling tool which can evolve a set of 3D objects in a functionality-aware manner. Our goal is for the evolution to generate large and diverse sets of plausible 3D objects for data augmentation, constrained modeling, as well as open-ended exploration to possibly inspire new designs. Starting with an initial population of 3D objects belonging to one or more functional categories, we evolve the shapes through part recombination to produce generations of hybrids or crossbreeds between parents from the heterogeneous shape collection. Evolutionary selection of offsprings is guided both by a functional plausibility score derived from functionality analysis of shapes in the initial population and user preference, as in a design gallery. Since cross-category hybridization may result in offsprings not belonging to any of the known functional categories, we develop a means for functionality partial matching to evaluate functional plausibility on partial shapes. We show a variety of plausible hybrid shapes generated by our functionality-aware model evolution, which can complement existing datasets as training data and boost the performance of contemporary data-driven segmentation schemes, especially in challenging cases. Our tool supports constrained modeling, allowing users to restrict or steer the model evolution with functionality labels. At the same time, unexpected yet functional object prototypes can emerge during open-ended exploration owing to structure breaking when evolving a heterogeneous collection. △ Less

Submitted 19 November, 2020; v1 submitted 9 May, 2020; originally announced May 2020.

arXiv:2004.03028 [pdf, other]

Learning Generative Models of Shape Handles

Authors: Matheus Gadelha, Giorgio Gori, Duygu Ceylan, Radomir Mech, Nathan Carr, Tamy Boubekeur, Rui Wang, Subhransu Maji

Abstract: We present a generative model to synthesize 3D shapes as sets of handles -- lightweight proxies that approximate the original 3D shape -- for applications in interactive editing, shape parsing, and building compact 3D representations. Our model can generate handle sets with varying cardinality and different types of handles (Figure 1). Key to our approach is a deep architecture that predicts both… ▽ More We present a generative model to synthesize 3D shapes as sets of handles -- lightweight proxies that approximate the original 3D shape -- for applications in interactive editing, shape parsing, and building compact 3D representations. Our model can generate handle sets with varying cardinality and different types of handles (Figure 1). Key to our approach is a deep architecture that predicts both the parameters and existence of shape handles, and a novel similarity measure that can easily accommodate different types of handles, such as cuboids or sphere-meshes. We leverage the recent advances in semantic 3D annotation as well as automatic shape summarizing techniques to supervise our approach. We show that the resulting shape representations are intuitive and achieve superior quality than previous state-of-the-art. Finally, we demonstrate how our method can be used in applications such as interactive shape editing, completion, and interpolation, leveraging the latent space learned by our model to guide these tasks. Project page: http://mgadelha.me/shapehandles. △ Less

Submitted 6 April, 2020; originally announced April 2020.

Comments: 11 pages, 11 figures, accepted do CVPR 2020

arXiv:2003.12181 [pdf, other]

ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds

Authors: Gopal Sharma, Difan Liu, Subhransu Maji, Evangelos Kalogerakis, Siddhartha Chaudhuri, Radomír Měch

Abstract: We propose a novel, end-to-end trainable, deep network called ParSeNet that decomposes a 3D point cloud into parametric surface patches, including B-spline patches as well as basic geometric primitives. ParSeNet is trained on a large-scale dataset of man-made 3D shapes and captures high-level semantic priors for shape decomposition. It handles a much richer class of primitives than prior work, and… ▽ More We propose a novel, end-to-end trainable, deep network called ParSeNet that decomposes a 3D point cloud into parametric surface patches, including B-spline patches as well as basic geometric primitives. ParSeNet is trained on a large-scale dataset of man-made 3D shapes and captures high-level semantic priors for shape decomposition. It handles a much richer class of primitives than prior work, and allows us to represent surfaces with higher fidelity. It also produces repeatable and robust parametrizations of a surface compared to purely geometric approaches. We present extensive experiments to validate our approach against analytical and learning-based alternatives. Our source code is publicly available at: https://hippogriff.github.io/parsenet. △ Less

Submitted 22 September, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

arXiv:1906.06841 [pdf, other]

LPaintB: Learning to Paint from Self-Supervision

Authors: Biao Jia, Jonathan Brandt, Radomir Mech, Byungmoon Kim, Dinesh Manocha

Abstract: We present a novel reinforcement learning-based natural media painting algorithm. Our goal is to reproduce a reference image using brush strokes and we encode the objective through observations. Our formulation takes into account that the distribution of the reward in the action space is sparse and training a reinforcement learning algorithm from scratch can be difficult. We present an approach th… ▽ More We present a novel reinforcement learning-based natural media painting algorithm. Our goal is to reproduce a reference image using brush strokes and we encode the objective through observations. Our formulation takes into account that the distribution of the reward in the action space is sparse and training a reinforcement learning algorithm from scratch can be difficult. We present an approach that combines self-supervised learning and reinforcement learning to effectively transfer negative samples into positive ones and change the reward distribution. We demonstrate the benefits of our painting agent to reproduce reference images with brush strokes. The training phase takes about one hour and the runtime algorithm takes about 30 seconds on a GTX1080 GPU reproducing a 1000x800 image with 20,000 strokes. △ Less

Submitted 21 September, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

arXiv:1905.10711 [pdf, other]

DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction

Authors: Qiangeng Xu, Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann

Abstract: Reconstructing 3D shapes from single-view images has been a long-standing research problem. In this paper, we present DISN, a Deep Implicit Surface Network which can generate a high-quality detail-rich 3D mesh from an 2D image by predicting the underlying signed distance fields. In addition to utilizing global image features, DISN predicts the projected location for each 3D point on the 2D image,… ▽ More Reconstructing 3D shapes from single-view images has been a long-standing research problem. In this paper, we present DISN, a Deep Implicit Surface Network which can generate a high-quality detail-rich 3D mesh from an 2D image by predicting the underlying signed distance fields. In addition to utilizing global image features, DISN predicts the projected location for each 3D point on the 2D image, and extracts local features from the image feature maps. Combining global and local features significantly improves the accuracy of the signed distance field prediction, especially for the detail-rich areas. To the best of our knowledge, DISN is the first method that constantly captures details such as holes and thin structures present in 3D shapes from single-view images. DISN achieves the state-of-the-art single-view reconstruction performance on a variety of shape categories reconstructed from both synthetic and real images. Code is available at https://github.com/xharlie/DISN The supplementary can be found at https://xharlie.github.io/images/neurips_2019_supp.pdf △ Less

Submitted 25 March, 2024; v1 submitted 25 May, 2019; originally announced May 2019.

Comments: This project was in part supported by the gift funding to the University of Southern California from Adobe Research

Journal ref: 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1903.03322 [pdf, other]

3DN: 3D Deformation Network

Authors: Weiyue Wang, Duygu Ceylan, Radomir Mech, Ulrich Neumann

Abstract: Applications in virtual and augmented reality create a demand for rapid creation and easy access to large sets of 3D models. An effective way to address this demand is to edit or deform existing 3D models based on a reference, e.g., a 2D image which is very easy to acquire. Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we in… ▽ More Applications in virtual and augmented reality create a demand for rapid creation and easy access to large sets of 3D models. An effective way to address this demand is to edit or deform existing 3D models based on a reference, e.g., a 2D image which is very easy to acquire. Given such a source 3D model and a target which can be a 2D image, 3D model, or a point cloud acquired as a depth scan, we introduce 3DN, an end-to-end network that deforms the source model to resemble the target. Our method infers per-vertex offset displacements while kee** the mesh connectivity of the source model fixed. We present a training strategy which uses a novel differentiable operation, mesh sampling operator, to generalize our method across source and target models with varying mesh densities. Mesh sampling operator can be seamlessly integrated into the network to handle meshes with different topologies. Qualitative and quantitative results show that our method generates higher quality results compared to the state-of-the art learning-based methods for 3D shape generation. Code is available at github.com/laughtervv/3DN. △ Less

Submitted 8 March, 2019; originally announced March 2019.

arXiv:1901.00542 [pdf, other]

Photo-Sketching: Inferring Contour Drawings from Images

Authors: Mengtian Li, Zhe Lin, Radomir Mech, Ersin Yumer, Deva Ramanan

Abstract: Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision. On one hand, they are the 2D elements that convey 3D shapes, on the other hand, they are indicative of occlusion events and thus separation of objects or semantic concepts. In this paper, we aim to generate contour drawings, boundary-like drawings that capture the outline of the visual sce… ▽ More Edges, boundaries and contours are important subjects of study in both computer graphics and computer vision. On one hand, they are the 2D elements that convey 3D shapes, on the other hand, they are indicative of occlusion events and thus separation of objects or semantic concepts. In this paper, we aim to generate contour drawings, boundary-like drawings that capture the outline of the visual scene. Prior art often cast this problem as boundary detection. However, the set of visual cues presented in the boundary detection output are different from the ones in contour drawings, and also the artistic style is ignored. We address these issues by collecting a new dataset of contour drawings and proposing a learning-based method that resolves diversity in the annotation and, unlike boundary detectors, can work with imperfect alignment of the annotation and the actual ground truth. Our method surpasses previous methods quantitatively and qualitatively. Surprisingly, when our model fine-tunes on BSDS500, we achieve the state-of-the-art performance in salient boundary detection, suggesting contour drawing might be a scalable alternative to boundary annotation, which at the same time is easier and more interesting for annotators to draw. △ Less

Submitted 2 January, 2019; originally announced January 2019.

Comments: WACV 2019. For code and dataset, see http://www.cs.cmu.edu/~mengtial/proj/sketch/

arXiv:1707.05911 [pdf, other]

Recognizing and Curating Photo Albums via Event-Specific Image Importance

Authors: Yufei Wang, Zhe Lin, Xiaohui Shen, Radomir Mech, Gavin Miller, Garrison W. Cottrell

Abstract: Automatic organization of personal photos is a problem with many real world ap- plications, and can be divided into two main tasks: recognizing the event type of the photo collection, and selecting interesting images from the collection. In this paper, we attempt to simultaneously solve both tasks: album-wise event recognition and image- wise importance prediction. We collected an album dataset wi… ▽ More Automatic organization of personal photos is a problem with many real world ap- plications, and can be divided into two main tasks: recognizing the event type of the photo collection, and selecting interesting images from the collection. In this paper, we attempt to simultaneously solve both tasks: album-wise event recognition and image- wise importance prediction. We collected an album dataset with both event type labels and image importance labels, refined from an existing CUFED dataset. We propose a hybrid system consisting of three parts: A siamese network-based event-specific image importance prediction, a Convolutional Neural Network (CNN) that recognizes the event type, and a Long Short-Term Memory (LSTM)-based sequence level event recognizer. We propose an iterative updating procedure for event type and image importance score prediction. We experimentally verified that image importance score prediction and event type recognition can each help the performance of the other. △ Less

Submitted 18 July, 2017; originally announced July 2017.

Comments: Accepted as oral in BMVC 2017

arXiv:1612.01635 [pdf, other]

Learning to Detect Multiple Photographic Defects

Authors: Ning Yu, Xiaohui Shen, Zhe Lin, Radomir Mech, Connelly Barnes

Abstract: In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try vari… ▽ More In this paper, we introduce the problem of simultaneously detecting multiple photographic defects. We aim at detecting the existence, severity, and potential locations of common photographic defects related to color, noise, blur and composition. The automatic detection of such defects could be used to provide users with suggestions for how to improve photos without the need to laboriously try various correction methods. Defect detection could also help users select photos of higher quality while filtering out those with severe defects in photo curation and summarization. To investigate this problem, we collected a large-scale dataset of user annotations on seven common photographic defects, which allows us to evaluate algorithms by measuring their consistency with human judgments. Our new dataset enables us to formulate the problem as a multi-task learning problem and train a multi-column deep convolutional neural network (CNN) to simultaneously predict the severity of all the defects. Unlike some existing single-defect estimation methods that rely on low-level statistics and may fail in many cases on natural photographs, our model is able to understand image contents and quality at a higher level. As a result, in our experiments, we show that our model has predictions with much higher consistency with human judgments than low-level methods as well as several baseline CNN models. Our model also performs better than an average human from our user study. △ Less

Submitted 8 March, 2018; v1 submitted 5 December, 2016; originally announced December 2016.

Comments: Accepted to WACV'18

arXiv:1607.07525 [pdf, other]

Salient Object Subitizing

Authors: Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price, Radomir Mech

Abstract: We study the problem of Salient Object Subitizing, i.e. predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1-4). To this end, we present a salient object subitizing image dataset of about 14K everyday images which are annotated… ▽ More We study the problem of Salient Object Subitizing, i.e. predicting the existence and the number of salient objects in an image using holistic cues. This task is inspired by the ability of people to quickly and accurately identify the number of items within the subitizing range (1-4). To this end, we present a salient object subitizing image dataset of about 14K everyday images which are annotated using an online crowdsourcing marketplace. We show that using an end-to-end trained Convolutional Neural Network (CNN) model, we achieve prediction accuracy comparable to human performance in identifying images with zero or one salient object. For images with multiple salient objects, our model also provides significantly better than chance performance without requiring any localization process. Moreover, we propose a method to improve the training of the CNN subitizing model by leveraging synthetic images. In experiments, we demonstrate the accuracy and generalizability of our CNN subitizing model and its applications in salient object detection and image retrieval. △ Less

Submitted 25 July, 2016; originally announced July 2016.

arXiv:1606.01621 [pdf, other]

Photo Aesthetics Ranking Network with Attributes and Content Adaptation

Authors: Shu Kong, Xiaohui Shen, Zhe Lin, Radomir Mech, Charless Fowlkes

Abstract: Real-world applications could benefit from the ability to automatically generate a fine-grained ranking of photo aesthetics. However, previous methods for image aesthetics analysis have primarily focused on the coarse, binary categorization of images into high- or low-aesthetic categories. In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics in which the r… ▽ More Real-world applications could benefit from the ability to automatically generate a fine-grained ranking of photo aesthetics. However, previous methods for image aesthetics analysis have primarily focused on the coarse, binary categorization of images into high- or low-aesthetic categories. In this work, we propose to learn a deep convolutional neural network to rank photo aesthetics in which the relative ranking of photo aesthetics are directly modeled in the loss function. Our model incorporates joint learning of meaningful photographic attributes and image content information which can help regularize the complicated photo aesthetics rating problem. To train and analyze this model, we have assembled a new aesthetics and attributes database (AADB) which contains aesthetic scores and meaningful attributes assigned to each image by multiple human raters. Anonymized rater identities are recorded across images allowing us to exploit intra-rater consistency using a novel sampling strategy when computing the ranking loss of training image pairs. We show the proposed sampling strategy is very effective and robust in face of subjective judgement of image aesthetics by individuals with different aesthetic tastes. Experiments demonstrate that our unified model can generate aesthetic rankings that are more consistent with human ratings. To further validate our model, we show that by simply thresholding the estimated aesthetic scores, we are able to achieve state-or-the-art classification performance on the existing AVA dataset benchmark. △ Less

Submitted 26 July, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

Showing 1–20 of 20 results for author: Mech, R