OpenCOLE: Towards Reproducible Automatic Graphic Design Generation

Authors: Naoto Inoue, Kento Masui, Wataru Shimoda, Kota Yamaguchi

Abstract: Automatic generation of graphic designs has recently received considerable attention. However, the state-of-the-art approaches are complex and rely on proprietary datasets, which creates reproducibility barriers. In this paper, we propose an open framework for automatic graphic design called OpenCOLE, where we build a modified version of the pioneering COLE and train our model exclusively on publicly available datasets. Based on GPT4V evaluations, our model shows promising performance comparable to the original COLE. We release the pipeline and training results to encourage open development.

Submitted 12 June, 2024; originally announced June 2024.

Comments: To appear as an extended abstract (EA) in Workshop on Graphic Design Understanding and Generation (in CVPR2024), code: https://github.com/CyberAgentAILab/OpenCOLE

arXiv:2403.12784 [pdf, other]

Total Disentanglement of Font Images into Style and Character Class Features

Authors: Daichi Haraguchi, Wataru Shimoda, Kota Yamaguchi, Seiichi Uchida

Abstract: In this paper, we demonstrate a total disentanglement of font images. Total disentanglement is a neural network-based method for decomposing each font image nonlinearly and completely into its style and content (i.e., character class) features. It uses a simple but careful training procedure to extract the common style feature from all `A'-`Z' images in the same font and the common content feature from all `A' (or another class) images in different fonts. These disentangled features guarantee the reconstruction of the original font image. Various experiments have been conducted to understand the performance of total disentanglement. First, it is demonstrated that total disentanglement is achievable with very high accuracy; this is experimental proof of the long-standing open question, ``Does `A'-ness exist?'' Hofstadter (1985). Second, it is demonstrated that the disentangled features produced by total disentanglement apply to a variety of tasks, including font recognition, character recognition, and one-shot font image generation.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2311.13602 [pdf, other]

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Authors: Daichi Horita, Naoto Inoue, Kotaro Kikuchi, Kota Yamaguchi, Kiyoharu Aizawa

Abstract: Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator. Our model can apply retrieval augmentation to various controllable generation tasks and yield high-quality layouts within a unified architecture. Our extensive experiments show that RALF successfully generates content-aware layouts in both constrained and unconstrained settings and significantly outperforms the baselines.

Submitted 15 April, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted to CVPR 2024 (Oral), Project website: https://udonda.github.io/RALF/ , GitHub: https://github.com/CyberAgentAILab/RALF

arXiv:2309.02099 [pdf, other]

Towards Diverse and Consistent Typography Generation

Authors: Wataru Shimoda, Daichi Haraguchi, Seiichi Uchida, Kota Yamaguchi

Abstract: In this work, we consider the typography generation task that aims at producing diverse typographic styling for the given graphic document. We formulate typography generation as a fine-grained attribute generation for multiple text elements and build an autoregressive model to generate diverse typography that matches the input design context. We further propose a simple yet effective sampling approach that respects the consistency and distinction principle of typography so that generated examples share consistent typographic styling across text elements. Our empirical study shows that our model successfully generates diverse typographic designs while preserving a consistent typographic structure.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2303.18248 [pdf, other]

Towards Flexible Multi-modal Document Models

Authors: Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

Abstract: Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors. In this work, we attempt at building a holistic model that can jointly solve many different design tasks. Our model, which we denote by FlexDM, treats vector graphic documents as a set of multi-modal elements, and learns to predict masked fields such as element type, position, styling attributes, image, or text, using a unified architecture. Through the use of explicit multi-task learning and in-domain pre-training, our model can better capture the multi-modal relationships among the different document fields. Experimental results corroborate that our single FlexDM is able to successfully solve a multitude of different design tasks, while achieving performance that is competitive with task-specific and costly baselines.

Submitted 31 March, 2023; originally announced March 2023.

Comments: To be published in CVPR2023 (highlight), project page: https://cyberagentailab.github.io/flex-dm

arXiv:2303.08137 [pdf, other]

LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Authors: Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

Abstract: Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process by modality-wise discrete diffusion. For conditional generation, we propose to inject layout constraints in the form of masking or logit adjustment during inference. We show in the experiments that our LayoutDM successfully generates high-quality layouts and outperforms both task-specific and task-agnostic baselines on several layout tasks.

Submitted 14 March, 2023; originally announced March 2023.

Comments: To be published in CVPR2023, project page: https://cyberagentailab.github.io/layout-dm/

arXiv:2303.01308 [pdf, ps, other]

In-the-wild vibrotactile sensation: Perceptual transformation of vibrations from smartphones

Authors: Keiko Yamaguchi, Satoshi Takahashi

Abstract: Vibrations emitted by smartphones have become a part of our daily lives. The vibrations can add various meanings to the information people obtain from the screen. Hence, it is worth understanding the perceptual transformation of vibration with ordinary devices to evaluate the possibility of enriched vibrotactile communication via smartphones. This study assessed the reproducibility of vibrotactile sensations via smartphone in the in-the-wild environment. To realize improved haptic design to communicate with smartphone users smoothly, we also focused on the moderation effects of the in-the-wild environments on the vibrotactile sensations: the physical specifications of mobile devices, the manner of device operation by users, and the personal traits of the users about the desire for touch. We conducted a Web-based in-the-wild experiment instead of a laboratory experiment to reproduce an environment as close to the daily lives of users as possible. Through a series of analyses, we revealed that users perceive the weight of vibration stimuli to be higher in sensation magnitude than intensity under identical conditions of vibration stimuli. We also showed that it is desirable to consider the moderation effects of the in-the-wild environments for realizing better tactile system design to maximize the impact of vibrotactile stimuli.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 8 pages, 9 figures

arXiv:2212.11541 [pdf, other]

Generative Colorization of Structured Mobile Web Pages

Authors: Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

Abstract: Color is a critical design factor for web pages, affecting important factors such as viewer emotions and the overall trust and satisfaction of a website. Effective coloring requires design knowledge and expertise, but if this process could be automated through data-driven modeling, efficient exploration and alternative workflows would be possible. However, this direction remains underexplored due to the lack of a formalization of the web page colorization problem, datasets, and evaluation protocols. In this work, we propose a new dataset consisting of e-commerce mobile web pages in a tractable format, which are created by simplifying the pages and extracting canonical color styles with a common web browser. The web page colorization problem is then formalized as a task of estimating plausible color styles for a given web page content with a given hierarchical structure of the elements. We present several Transformer-based methods that are adapted to this task by prepending structural message passing to capture hierarchical relationships between elements. Experimental results, including a quantitative evaluation designed for this task, demonstrate the advantages of our methods over statistical and image colorization methods. The code is available at https://github.com/CyberAgentAILab/webcolor.

Submitted 23 January, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Accepted to WACV 2023

arXiv:2205.03549 [pdf]

doi 10.1021/acsphotonics.2c00572

Deep Learning-enabled Detection and Classification of Bacterial Colonies using a Thin Film Transistor (TFT) Image Sensor

Authors: Yuzhu Li, Tairan Liu, Hatice Ceylan Koydemir, Hongda Wang, Keelan O'Riordan, Bijie Bai, Yuta Haga, Junji Kobashi, Hitoshi Tanaka, Takaya Tamaru, Kazunori Yamaguchi, Aydogan Ozcan

Abstract: Early detection and identification of pathogenic bacteria such as Escherichia coli (E. coli) is an essential task for public health. The conventional culture-based methods for bacterial colony detection usually take >24 hours to get the final read-out. Here, we demonstrate a bacterial colony-forming-unit (CFU) detection system exploiting a thin-film-transistor (TFT)-based image sensor array that saves ~12 hours compared to the Environmental Protection Agency (EPA)-approved methods. To demonstrate the efficacy of this CFU detection system, a lensfree imaging modality was built using the TFT image sensor with a sample field-of-view of ~10 cm^2. Time-lapse images of bacterial colonies cultured on chromogenic agar plates were automatically collected at 5-minute intervals. Two deep neural networks were used to detect and count the growing colonies and identify their species. When blindly tested with 265 colonies of E. coli and other coliform bacteria (i.e., Citrobacter and Klebsiella pneumoniae), our system reached an average CFU detection rate of 97.3% at 9 hours of incubation and an average recovery rate of 91.6% at ~12 hours. This TFT-based sensor can be applied to various microbiological detection methods. Due to the large scalability, ultra-large field-of-view, and low cost of the TFT-based image sensors, this platform can be integrated with each agar plate to be tested and disposed of after the automated CFU count. The imaging field-of-view of this platform can be cost-effectively increased to >100 cm^2 to provide a massive throughput for CFU detection using, e.g., roll-to-roll manufacturing of TFTs as used in the flexible display industry.

Submitted 7 May, 2022; originally announced May 2022.

Comments: 18 Pages, 6 Figures

Journal ref: ACS Photonics (2022)

arXiv:2201.06674 [pdf, other]

TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation

Authors: Shoichi Naito, Shintaro Sawada, Chihiro Nakagawa, Naoya Inoue, Kenshi Yamaguchi, Iori Shimizu, Farjana Sultana Mim, Keshav Singh, Kentaro Inui

Abstract: Providing feedback on the argumentation of the learner is essential for developing critical thinking skills, however, it requires a lot of time and effort. To mitigate the overload on teachers, we aim to automate a process of providing feedback, especially giving diagnostic comments which point out the weaknesses inherent in the argumentation. It is recommended to give specific diagnostic comments so that learners can recognize the diagnosis without misinterpretation. However, it is not obvious how the task of providing specific diagnostic comments should be formulated. We present a formulation of the task as template selection and slot filling to make an automatic evaluation easier and the behavior of the model more tractable. The key to the formulation is the possibility of creating a template set that is sufficient for practical use. In this paper, we define three criteria that a template set should satisfy: expressiveness, informativeness, and uniqueness, and verify the feasibility of creating a template set that satisfies these criteria as a first trial. We will show that it is feasible through an annotation study that converts diagnostic comments given in a text to a template format. The corpus used in the annotation study is publicly available.

Submitted 21 June, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Comments: LREC2022. The dataset is available at https://github.com/cl-tohoku/TYPIC

arXiv:2110.01890 [pdf, other]

De-rendering Stylized Texts

Authors: Wataru Shimoda, Daichi Haraguchi, Seiichi Uchida, Kota Yamaguchi

Abstract: Editing raster text is a promising but challenging task. We propose to apply text vectorization for the task of raster text editing in display media, such as posters, web pages, or advertisements. In our approach, instead of applying image transformation or generation in the raster domain, we learn a text vectorization model to parse all the rendering parameters including text, location, size, font, style, effects, and hidden background, then utilize those parameters for reconstruction and any editing task. Our text vectorization takes advantage of differentiable text rendering to accurately reproduce the input raster text in a resolution-free parametric format. We show in the experiments that our approach can successfully parse text, styling, and background information in the unified model, and produces artifact-free text editing compared to a raster baseline.

Submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted to ICCV 2021. Codes: https://github.com/CyberAgentAILab/derendering-text

arXiv:2108.01249 [pdf, other]

CanvasVAE: Learning to Generate Vector Graphic Documents

Authors: Kota Yamaguchi

Abstract: Vector graphic documents present visual elements in a resolution free, compact format and are often seen in creative applications. In this work, we attempt to learn a generative model of vector graphic documents. We define vector graphic documents by a multi-modal set of attributes associated to a canvas and a sequence of visual elements such as shapes, images, or texts, and train variational auto-encoders to learn the representation of the documents. We collect a new dataset of design templates from an online service that features complete document structure including occluded elements. In experiments, we show that our model, named CanvasVAE, constitutes a strong baseline for generative modeling of vector graphic documents.

Submitted 2 August, 2021; originally announced August 2021.

Comments: to be published in ICCV 2021

arXiv:2108.00871 [pdf, other]

doi 10.1145/3474085.3475497

Constrained Graphic Layout Generation via Latent Optimization

Authors: Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

Abstract: It is common in graphic design humans visually arrange various elements according to their design intent and semantics. For example, a title text almost always appears on top of other elements in a document. In this work, we generate graphic layouts that can flexibly incorporate such design semantics, either specified implicitly or explicitly by a user. We optimize using the latent space of an off-the-shelf layout generation model, allowing our approach to be complementary to and used with existing layout generation models. Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem where design constraints are used for element alignment, overlap avoidance, or any other user-specified relationship. We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model. The code is available at https://github.com/ktrk115/const_layout .

Submitted 2 August, 2021; originally announced August 2021.

Comments: Accepted by ACM Multimedia 2021

arXiv:2107.08351 [pdf, ps, other]

doi 10.1007/978-3-030-91669-5_15

A Novel Approach to Analyze Fashion Digital Archive from Humanities

Authors: Satoshi Takahashi, Keiko Yamaguchi, Asuka Watanabe

Abstract: Fashion styles adopted every day are an important aspect of culture, and style trend analysis helps provide a deeper understanding of our societies and cultures. To analyze everyday fashion trends from the humanities perspective, we need a digital archive that includes images of what people wore in their daily lives over an extended period. In fashion research, building digital fashion image archives has attracted significant attention. However, the existing archives are not suitable for retrieving everyday fashion trends. In addition, to interpret how the trends emerge, we need non-fashion data sources relevant to why and how people choose fashion. In this study, we created a new fashion image archive called Chronicle Archive of Tokyo Street Fashion (CAT STREET) based on a review of the limitations in the existing digital fashion archives. CAT STREET includes images showing the clothing people wore in their daily lives during the period 1970--2017, which contain timestamps and street location annotations. We applied machine learning to CAT STREET and found two types of fashion trend patterns. Then, we demonstrated how magazine archives help us interpret how trend patterns emerge. These empirical analyses show our approach's potential to discover new perspectives to promote an understanding of our societies and cultures through fashion embedded in consumers' daily lives.

Submitted 10 September, 2021; v1 submitted 17 July, 2021; originally announced July 2021.

Comments: In Proceedings of 'The 23rd International Conference on Asia-Pacific Digital Libraries' 17 pages, 8 figures. arXiv admin note: text overlap with arXiv:2009.13395

Journal ref: In International Conference on Asian Digital Libraries (pp. 179-194). Springer, Cham (2021)

arXiv:2011.01428 [pdf, other]

Leaf-like Origami with Bistability for Self-Adaptive Grasping Motions

Authors: Hiromi Yasuda, Kyle Johnson, Vicente Arroyos, Koshiro Yamaguchi, Jordan R. Raney, Jinkyu Yang

Abstract: The leaf-like origami structure was inspired by geometric patterns found in nature, exhibiting unique transitions between open and closed shapes. With a bistable energy landscape, leaf-like origami is able to replicate the autonomous grasping of objects observed in biological systems like the Venus flytrap. We show uniform grasping motions of the leaf-like origami, as well as various non-uniform grasping motions which arise from its multi-transformable nature. Grasping motions can be triggered with high tunability due to the structure's bistable energy landscape. We demonstrate the self-adaptive grasping motion by dropping a target object onto our paper prototype, which does not require an external power source to retain the capture of the object. We also explore the non-uniform grasping motions of the leaf-like structure by selectively controlling the creases, which reveals various unique grasping configurations that can be exploited for versatile, autonomous, and self-adaptive robotic operations.

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2009.13395 [pdf, ps, other]

CAT STREET: Chronicle Archive of Tokyo Street-fashion

Authors: Satoshi Takahashi, Keiko Yamaguchi, Asuka Watanabe

Abstract: The analysis of daily-life fashion trends can provide us a profound understanding of our societies and cultures. However, no appropriate digital archive exists that includes images illustrating what people wore in their daily lives over an extended period. In this study, we propose a new fashion image archive, Chronicle Archive of Tokyo Street-fashion (CAT STREET), to shed light on daily-life fashion trends. CAT STREET includes images showing what people wore in their daily lives during 1970--2017, and these images contain timestamps and street location annotations. This novel database combined with machine learning enables us to observe daily-life fashion trends over a long term and analyze them quantitatively. To evaluate the potential of our proposed approach with the novel database, we corroborated the rules-of-thumb of two fashion trend phenomena that have been observed and discussed qualitatively in previous studies. Through these empirical analyses, we verified that our approach to quantify fashion trends can help in exploring unsolved research questions. We also demonstrate CAT STREET's potential to find new standpoints to promote the understanding of societies and cultures through fashion embedded in consumers' daily lives.

Submitted 29 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: 19 pages, 17 figures

arXiv:1906.10269 [pdf, ps, other]

Serif or Sans: Visual Font Analytics on Book Covers and Online Advertisements

Authors: Yuto Shinahara, Takuro Karamatsu, Daisuke Harada, Kota Yamaguchi, Seiichi Uchida

Abstract: In this paper, we conduct a large-scale study of font statistics in book covers and online advertisements. Through the statistical study, we try to understand how graphic designers relate fonts and content genres and identify the relationship between font styles, colors, and genres. We propose an automatic approach to extract font information from graphic designs by applying a sequence of character detection, style classification, and clustering techniques to the graphic designs. The extracted font information is accumulated together with genre information, such as romance or business, for further trend analysis. Through our unique empirical study, we show that the collected font statistics reveal interesting trends in terms of how typographic design represents the impression and the atmosphere of the content genres.

Submitted 29 June, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: Accepted by ICDAR2019

arXiv:1906.01196 [pdf]

Convolution filter embedded quantum gate autoencoder

Authors: Kodai Shiba, Katsuyoshi Sakamoto, Koichi Yamaguchi, Dinesh Bahadur Malla, Tomah Sogabe

Abstract: The autoencoder is one of machine learning algorithms used for feature extraction by dimension reduction of input data, denoising of images, and prior learning of neural networks. At the same time, autoencoders using quantum computers are also being developed. However, current quantum computers have a limited number of qubits, which makes it difficult to calculate big data. In this paper, as a solution to this problem, we propose a computation method that applies a convolution filter, which is one of the methods used in machine learning, to quantum computation. As a result of applying this method to a quantum autoencoder, we succeeded in denoising image data of several hundred qubits or more using only a few qubits under the autoencoding accuracy of 98%, and the effectiveness of this method was obtained. Meanwhile, we have verified the feature extraction function of the proposed autoencoder by dimensionality reduction. By projecting the MNIST data to two-dimension, we found the proposed method showed superior classification accuracy to the vanilla principle component analysis (PCA). We also verified the proposed method using IBM Q Melbourne and the actual machine failed to provide accurate results implying high error rate prevailing in the current NISQ quantum computer.

Submitted 4 June, 2019; originally announced June 2019.

Comments: 8 pages, 7 figures

arXiv:1810.10258 [pdf, ps, other]

A Maximum Edge-Weight Clique Extraction Algorithm Based on Branch-and-Bound

Authors: Satoshi Shimizu, Kazuaki Yamaguchi, Sumio Masuda

Abstract: The maximum edge-weight clique problem is to find a clique whose sum of edge weights is maximum for a given edge-weighted undirected graph. The problem is NP-hard and some branch-and-bound algorithms have been proposed. In this paper, we propose a new exact algorithm based on branch-and-bound. It assigns edge weights to vertices and calculates upper bounds using vertex coloring. Through computational experiments, we confirmed that our algorithm is faster than previous algorithms.

Submitted 24 October, 2018; originally announced October 2018.

arXiv:1804.09979 [pdf, other]

Recommending Outfits from Personal Closet

Authors: Pongsate Tangseng, Kota Yamaguchi, Takayuki Okatani

Abstract: We consider grading a fashion outfit for recommendation, where we assume that users have a closet of items and we aim at producing a score for an arbitrary combination of items in the closet. The challenge in outfit grading is that the input to the system is a bag of item pictures that are unordered and vary in size. We build a deep neural network-based system that can take variable-length items and predict a score. We collect a large number of outfits from a popular fashion sharing website, Polyvore, and evaluate the performance of our grading system. We compare our model with a random-choice baseline, both on the traditional classification evaluation and on people's judgment using a crowdsourcing platform. With over 84% in classification accuracy and 91% matching ratio to human annotators, our model can reliably grade the quality of an outfit. We also build an outfit recommender on top of our grader to demonstrate the practical application of our model for a personal closet assistant.

Submitted 26 April, 2018; originally announced April 2018.

arXiv:1710.08049 [pdf, other]

Feedback-prop: Convolutional Neural Network Inference under Partial Evidence

Authors: Tianlu Wang, Kota Yamaguchi, Vicente Ordonez

Abstract: We propose an inference procedure for deep convolutional neural networks (CNNs) when partial evidence is available. Our method consists of a general feedback-based propagation approach (feedback-prop) that boosts the prediction accuracy for an arbitrary set of unknown target labels when the values for a non-overlapping arbitrary set of target labels are known. We show that existing models trained in a multi-label or multi-task setting can readily take advantage of feedback-prop without any retraining or fine-tuning. Our feedback-prop inference procedure is general, simple, reliable, and works on different challenging visual recognition tasks. We present two variants of feedback-prop based on layer-wise and residual iterative updates. We experiment using several multi-task models and show that feedback-prop is effective in all of them. Our results unveil a previously unreported but interesting dynamic property of deep CNNs. We also present an associated technical approach that takes advantage of this property for inference under partial evidence in general visual recognition tasks.

Submitted 29 March, 2018; v1 submitted 22 October, 2017; originally announced October 2017.

Comments: Accepted to CVPR 2018

arXiv:1708.01892 [pdf, other]

End-to-end learning potentials for structured attribute prediction

Authors: Kota Yamaguchi, Takayuki Okatani, Takayuki Umeda, Kazuhiko Murasaki, Kyoko Sudo

Abstract: We present a structured inference approach in deep neural networks for multiple attribute prediction. In attribute prediction, a common approach is to learn independent classifiers on top of a good feature representation. However, such classifiers assume conditional independence on features and do not explicitly consider the dependency between attributes in the inference process. We propose to formulate attribute prediction in terms of marginal inference in the conditional random field. We model potential functions by deep neural networks and apply the sum-product algorithm to solve for the approximate marginal distribution in feed-forward networks. Our message passing layer implements sparse pairwise potentials by a softplus-linear function that is equivalent to a higher-order classifier, and learns all the model parameters by end-to-end back propagation. The experimental results using SUN attributes and CelebA datasets suggest that the structured inference improves the attribute prediction performance, and possibly uncovers the hidden relationship between attributes.

Submitted 6 August, 2017; originally announced August 2017.

arXiv:1703.01386 [pdf, other]

Looking at Outfit to Parse Clothing

Authors: Pongsate Tangseng, Zhipeng Wu, Kota Yamaguchi

Abstract: This paper extends fully-convolutional neural networks (FCN) for the clothing parsing problem. Clothing parsing requires higher-level knowledge on clothing semantics and contextual cues to disambiguate fine-grained categories. We extend the FCN architecture with a side-branch network, which we refer to as the outfit encoder, to predict a consistent set of clothing labels to encourage combinatorial preference, and with a conditional random field (CRF) to explicitly consider coherent label assignment to the given image. The empirical results using Fashionista and CFPD datasets show that our model achieves state-of-the-art performance in clothing parsing, without additional supervision during training. We also study the qualitative influence of annotation on the current clothing parsing benchmarks, with our Web-based tool for multi-scale pixel-wise annotation and manual refinement effort to the Fashionista dataset. Finally, we show that the image representation of the outfit encoder is useful for a dress-up image retrieval application.

Submitted 3 March, 2017; originally announced March 2017.

arXiv:1607.07262 [pdf, other]

Automatic Attribute Discovery with Neural Activations

Authors: Sirion Vittayakorn, Takayuki Umeda, Kazuhiko Murasaki, Kyoko Sudo, Takayuki Okatani, Kota Yamaguchi

Abstract: How can a machine learn to recognize visual attributes emerging out of an online community without a definitive supervised dataset? This paper proposes an automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web. Our approach is based on the relationship between attributes and neural activations in the deep network. We characterize the visual property of the attribute word as a divergence within a weakly-annotated set of images. We show that the neural activations are useful for discovering and learning a classifier that agrees well with human perception from the noisy real-world Web data. The empirical study suggests the layered structure of the deep neural networks also gives us insights into the perceptual depth of the given word. Finally, we demonstrate that we can utilize highly-activating neurons for finding semantically relevant regions.

Submitted 25 July, 2016; originally announced July 2016.

Comments: ECCV 2016

arXiv:1204.1393 [pdf, other]

Continuous Markov Random Fields for Robust Stereo Estimation

Authors: Koichiro Yamaguchi, Tamir Hazan, David McAllester, Raquel Urtasun

Abstract: In this paper we present a novel slanted-plane MRF model which reasons jointly about occlusion boundaries as well as depth. We formulate the problem as one of inference in a hybrid MRF composed of both continuous (i.e., slanted 3D planes) and discrete (i.e., occlusion boundaries) random variables. This allows us to define potentials encoding the ownership of the pixels that compose the boundary between segments, as well as potentials encoding which junctions are physically possible. Our approach outperforms the state-of-the-art on Middlebury high resolution imagery as well as on the more challenging KITTI dataset, while being more efficient than existing slanted plane MRF-based methods, taking on average 2 minutes to perform inference on high resolution imagery.

Submitted 5 April, 2012; originally announced April 2012.

ACM Class: I.2.10; I.4.8

Showing 1–25 of 25 results for author: Yamaguchi, K