-
OpenCOLE: Towards Reproducible Automatic Graphic Design Generation
Authors:
Naoto Inoue,
Kento Masui,
Wataru Shimoda,
Kota Yamaguchi
Abstract:
Automatic generation of graphic designs has recently received considerable attention. However, the state-of-the-art approaches are complex and rely on proprietary datasets, which creates reproducibility barriers. In this paper, we propose an open framework for automatic graphic design called OpenCOLE, where we build a modified version of the pioneering COLE and train our model exclusively on publicly available datasets. Based on GPT4V evaluations, our model shows promising performance comparable to the original COLE. We release the pipeline and training results to encourage open development.
Submitted 12 June, 2024;
originally announced June 2024.
-
Total Disentanglement of Font Images into Style and Character Class Features
Authors:
Daichi Haraguchi,
Wataru Shimoda,
Kota Yamaguchi,
Seiichi Uchida
Abstract:
In this paper, we demonstrate a total disentanglement of font images. Total disentanglement is a neural network-based method for decomposing each font image nonlinearly and completely into its style and content (i.e., character class) features. It uses a simple but careful training procedure to extract the common style feature from all `A'-`Z' images in the same font and the common content feature from all `A' (or another class) images in different fonts. These disentangled features guarantee the reconstruction of the original font image. Various experiments have been conducted to understand the performance of total disentanglement. First, it is demonstrated that total disentanglement is achievable with very high accuracy; this is experimental proof of the long-standing open question, ``Does `A'-ness exist?'' Hofstadter (1985). Second, it is demonstrated that the disentangled features produced by total disentanglement apply to a variety of tasks, including font recognition, character recognition, and one-shot font image generation.
Submitted 19 March, 2024;
originally announced March 2024.
-
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Authors:
Daichi Horita,
Naoto Inoue,
Kotaro Kikuchi,
Kota Yamaguchi,
Kiyoharu Aizawa
Abstract:
Content-aware graphic layout generation aims to automatically arrange visual elements along with a given content, such as an e-commerce product image. In this paper, we argue that the current layout generation approaches suffer from the limited training data for the high-dimensional layout structure. We show that a simple retrieval augmentation can significantly improve the generation quality. Our model, which is named Retrieval-Augmented Layout Transformer (RALF), retrieves nearest neighbor layout examples based on an input image and feeds these results into an autoregressive generator. Our model can apply retrieval augmentation to various controllable generation tasks and yield high-quality layouts within a unified architecture. Our extensive experiments show that RALF successfully generates content-aware layouts in both constrained and unconstrained settings and significantly outperforms the baselines.
Submitted 15 April, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Towards Diverse and Consistent Typography Generation
Authors:
Wataru Shimoda,
Daichi Haraguchi,
Seiichi Uchida,
Kota Yamaguchi
Abstract:
In this work, we consider the typography generation task that aims at producing diverse typographic styling for the given graphic document. We formulate typography generation as a fine-grained attribute generation for multiple text elements and build an autoregressive model to generate diverse typography that matches the input design context. We further propose a simple yet effective sampling approach that respects the consistency and distinction principle of typography so that generated examples share consistent typographic styling across text elements. Our empirical study shows that our model successfully generates diverse typographic designs while preserving a consistent typographic structure.
Submitted 5 September, 2023;
originally announced September 2023.
-
Towards Flexible Multi-modal Document Models
Authors:
Naoto Inoue,
Kotaro Kikuchi,
Edgar Simo-Serra,
Mayu Otani,
Kota Yamaguchi
Abstract:
Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors. In this work, we attempt at building a holistic model that can jointly solve many different design tasks. Our model, which we denote by FlexDM, treats vector graphic documents as a set of multi-modal elements, and learns to predict masked fields such as element type, position, styling attributes, image, or text, using a unified architecture. Through the use of explicit multi-task learning and in-domain pre-training, our model can better capture the multi-modal relationships among the different document fields. Experimental results corroborate that our single FlexDM is able to successfully solve a multitude of different design tasks, while achieving performance that is competitive with task-specific and costly baselines.
Submitted 31 March, 2023;
originally announced March 2023.
-
LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
Authors:
Naoto Inoue,
Kotaro Kikuchi,
Edgar Simo-Serra,
Mayu Otani,
Kota Yamaguchi
Abstract:
Controllable layout generation aims at synthesizing plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the discrete representation and learns to progressively infer a noiseless layout from the initial input, where we model the layout corruption process by modality-wise discrete diffusion. For conditional generation, we propose to inject layout constraints in the form of masking or logit adjustment during inference. We show in the experiments that our LayoutDM successfully generates high-quality layouts and outperforms both task-specific and task-agnostic baselines on several layout tasks.
Submitted 14 March, 2023;
originally announced March 2023.
-
In-the-wild vibrotactile sensation: Perceptual transformation of vibrations from smartphones
Authors:
Keiko Yamaguchi,
Satoshi Takahashi
Abstract:
Vibrations emitted by smartphones have become a part of our daily lives. The vibrations can add various meanings to the information people obtain from the screen. Hence, it is worth understanding the perceptual transformation of vibration with ordinary devices to evaluate the possibility of enriched vibrotactile communication via smartphones. This study assessed the reproducibility of vibrotactile sensations via smartphone in the in-the-wild environment. To realize improved haptic design to communicate with smartphone users smoothly, we also focused on the moderation effects of the in-the-wild environments on the vibrotactile sensations: the physical specifications of mobile devices, the manner of device operation by users, and the personal traits of the users about the desire for touch. We conducted a Web-based in-the-wild experiment instead of a laboratory experiment to reproduce an environment as close to the daily lives of users as possible. Through a series of analyses, we revealed that users perceive the weight of vibration stimuli to be higher in sensation magnitude than intensity under identical conditions of vibration stimuli. We also showed that it is desirable to consider the moderation effects of the in-the-wild environments for realizing better tactile system design to maximize the impact of vibrotactile stimuli.
Submitted 2 March, 2023;
originally announced March 2023.
-
Generative Colorization of Structured Mobile Web Pages
Authors:
Kotaro Kikuchi,
Naoto Inoue,
Mayu Otani,
Edgar Simo-Serra,
Kota Yamaguchi
Abstract:
Color is a critical design factor for web pages, affecting important factors such as viewer emotions and the overall trust and satisfaction of a website. Effective coloring requires design knowledge and expertise, but if this process could be automated through data-driven modeling, efficient exploration and alternative workflows would be possible. However, this direction remains underexplored due to the lack of a formalization of the web page colorization problem, datasets, and evaluation protocols. In this work, we propose a new dataset consisting of e-commerce mobile web pages in a tractable format, which are created by simplifying the pages and extracting canonical color styles with a common web browser. The web page colorization problem is then formalized as a task of estimating plausible color styles for a given web page content with a given hierarchical structure of the elements. We present several Transformer-based methods that are adapted to this task by prepending structural message passing to capture hierarchical relationships between elements. Experimental results, including a quantitative evaluation designed for this task, demonstrate the advantages of our methods over statistical and image colorization methods. The code is available at https://github.com/CyberAgentAILab/webcolor.
Submitted 23 January, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
Deep Learning-enabled Detection and Classification of Bacterial Colonies using a Thin Film Transistor (TFT) Image Sensor
Authors:
Yuzhu Li,
Tairan Liu,
Hatice Ceylan Koydemir,
Hongda Wang,
Keelan O'Riordan,
Bijie Bai,
Yuta Haga,
Junji Kobashi,
Hitoshi Tanaka,
Takaya Tamaru,
Kazunori Yamaguchi,
Aydogan Ozcan
Abstract:
Early detection and identification of pathogenic bacteria such as Escherichia coli (E. coli) is an essential task for public health. The conventional culture-based methods for bacterial colony detection usually take >24 hours to get the final read-out. Here, we demonstrate a bacterial colony-forming-unit (CFU) detection system exploiting a thin-film-transistor (TFT)-based image sensor array that saves ~12 hours compared to the Environmental Protection Agency (EPA)-approved methods. To demonstrate the efficacy of this CFU detection system, a lensfree imaging modality was built using the TFT image sensor with a sample field-of-view of ~10 cm^2. Time-lapse images of bacterial colonies cultured on chromogenic agar plates were automatically collected at 5-minute intervals. Two deep neural networks were used to detect and count the growing colonies and identify their species. When blindly tested with 265 colonies of E. coli and other coliform bacteria (i.e., Citrobacter and Klebsiella pneumoniae), our system reached an average CFU detection rate of 97.3% at 9 hours of incubation and an average recovery rate of 91.6% at ~12 hours. This TFT-based sensor can be applied to various microbiological detection methods. Due to the large scalability, ultra-large field-of-view, and low cost of the TFT-based image sensors, this platform can be integrated with each agar plate to be tested and disposed of after the automated CFU count. The imaging field-of-view of this platform can be cost-effectively increased to >100 cm^2 to provide a massive throughput for CFU detection using, e.g., roll-to-roll manufacturing of TFTs as used in the flexible display industry.
Submitted 7 May, 2022;
originally announced May 2022.
-
TYPIC: A Corpus of Template-Based Diagnostic Comments on Argumentation
Authors:
Shoichi Naito,
Shintaro Sawada,
Chihiro Nakagawa,
Naoya Inoue,
Kenshi Yamaguchi,
Iori Shimizu,
Farjana Sultana Mim,
Keshav Singh,
Kentaro Inui
Abstract:
Providing feedback on the argumentation of the learner is essential for developing critical thinking skills; however, it requires a lot of time and effort. To mitigate the overload on teachers, we aim to automate a process of providing feedback, especially giving diagnostic comments which point out the weaknesses inherent in the argumentation. It is recommended to give specific diagnostic comments so that learners can recognize the diagnosis without misinterpretation. However, it is not obvious how the task of providing specific diagnostic comments should be formulated. We present a formulation of the task as template selection and slot filling to make an automatic evaluation easier and the behavior of the model more tractable. The key to the formulation is the possibility of creating a template set that is sufficient for practical use. In this paper, we define three criteria that a template set should satisfy: expressiveness, informativeness, and uniqueness, and verify the feasibility of creating a template set that satisfies these criteria as a first trial. We will show that it is feasible through an annotation study that converts diagnostic comments given in a text to a template format. The corpus used in the annotation study is publicly available.
Submitted 21 June, 2022; v1 submitted 17 January, 2022;
originally announced January 2022.
-
De-rendering Stylized Texts
Authors:
Wataru Shimoda,
Daichi Haraguchi,
Seiichi Uchida,
Kota Yamaguchi
Abstract:
Editing raster text is a promising but challenging task. We propose to apply text vectorization for the task of raster text editing in display media, such as posters, web pages, or advertisements. In our approach, instead of applying image transformation or generation in the raster domain, we learn a text vectorization model to parse all the rendering parameters including text, location, size, font, style, effects, and hidden background, then utilize those parameters for reconstruction and any editing task. Our text vectorization takes advantage of differentiable text rendering to accurately reproduce the input raster text in a resolution-free parametric format. We show in the experiments that our approach can successfully parse text, styling, and background information in the unified model, and produces artifact-free text editing compared to a raster baseline.
Submitted 5 October, 2021;
originally announced October 2021.
-
CanvasVAE: Learning to Generate Vector Graphic Documents
Authors:
Kota Yamaguchi
Abstract:
Vector graphic documents present visual elements in a resolution free, compact format and are often seen in creative applications. In this work, we attempt to learn a generative model of vector graphic documents. We define vector graphic documents by a multi-modal set of attributes associated to a canvas and a sequence of visual elements such as shapes, images, or texts, and train variational auto-encoders to learn the representation of the documents. We collect a new dataset of design templates from an online service that features complete document structure including occluded elements. In experiments, we show that our model, named CanvasVAE, constitutes a strong baseline for generative modeling of vector graphic documents.
Submitted 2 August, 2021;
originally announced August 2021.
-
Constrained Graphic Layout Generation via Latent Optimization
Authors:
Kotaro Kikuchi,
Edgar Simo-Serra,
Mayu Otani,
Kota Yamaguchi
Abstract:
It is common in graphic design that humans visually arrange various elements according to their design intent and semantics. For example, a title text almost always appears on top of other elements in a document. In this work, we generate graphic layouts that can flexibly incorporate such design semantics, either specified implicitly or explicitly by a user. We optimize using the latent space of an off-the-shelf layout generation model, allowing our approach to be complementary to and used with existing layout generation models. Our approach builds on a generative layout model based on a Transformer architecture, and formulates the layout generation as a constrained optimization problem where design constraints are used for element alignment, overlap avoidance, or any other user-specified relationship. We show in the experiments that our approach is capable of generating realistic layouts in both constrained and unconstrained generation tasks with a single model. The code is available at https://github.com/ktrk115/const_layout.
Submitted 2 August, 2021;
originally announced August 2021.
-
A Novel Approach to Analyze Fashion Digital Archive from Humanities
Authors:
Satoshi Takahashi,
Keiko Yamaguchi,
Asuka Watanabe
Abstract:
Fashion styles adopted every day are an important aspect of culture, and style trend analysis helps provide a deeper understanding of our societies and cultures. To analyze everyday fashion trends from the humanities perspective, we need a digital archive that includes images of what people wore in their daily lives over an extended period. In fashion research, building digital fashion image archives has attracted significant attention. However, the existing archives are not suitable for retrieving everyday fashion trends. In addition, to interpret how the trends emerge, we need non-fashion data sources relevant to why and how people choose fashion. In this study, we created a new fashion image archive called Chronicle Archive of Tokyo Street Fashion (CAT STREET) based on a review of the limitations in the existing digital fashion archives. CAT STREET includes images showing the clothing people wore in their daily lives during the period 1970--2017, which contain timestamps and street location annotations. We applied machine learning to CAT STREET and found two types of fashion trend patterns. Then, we demonstrated how magazine archives help us interpret how trend patterns emerge. These empirical analyses show our approach's potential to discover new perspectives to promote an understanding of our societies and cultures through fashion embedded in consumers' daily lives.
Submitted 10 September, 2021; v1 submitted 17 July, 2021;
originally announced July 2021.
-
Leaf-like Origami with Bistability for Self-Adaptive Grasping Motions
Authors:
Hiromi Yasuda,
Kyle Johnson,
Vicente Arroyos,
Koshiro Yamaguchi,
Jordan R. Raney,
Jinkyu Yang
Abstract:
The leaf-like origami structure was inspired by geometric patterns found in nature, exhibiting unique transitions between open and closed shapes. With a bistable energy landscape, leaf-like origami is able to replicate the autonomous grasping of objects observed in biological systems like the Venus flytrap. We show uniform grasping motions of the leaf-like origami, as well as various non-uniform grasping motions which arise from its multi-transformable nature. Grasping motions can be triggered with high tunability due to the structure's bistable energy landscape. We demonstrate the self-adaptive grasping motion by dropping a target object onto our paper prototype, which does not require an external power source to retain the capture of the object. We also explore the non-uniform grasping motions of the leaf-like structure by selectively controlling the creases, which reveals various unique grasping configurations that can be exploited for versatile, autonomous, and self-adaptive robotic operations.
Submitted 2 November, 2020;
originally announced November 2020.
-
CAT STREET: Chronicle Archive of Tokyo Street-fashion
Authors:
Satoshi Takahashi,
Keiko Yamaguchi,
Asuka Watanabe
Abstract:
The analysis of daily-life fashion trends can provide us a profound understanding of our societies and cultures. However, no appropriate digital archive exists that includes images illustrating what people wore in their daily lives over an extended period. In this study, we propose a new fashion image archive, Chronicle Archive of Tokyo Street-fashion (CAT STREET), to shed light on daily-life fashion trends. CAT STREET includes images showing what people wore in their daily lives during 1970--2017, and these images contain timestamps and street location annotations. This novel database combined with machine learning enables us to observe daily-life fashion trends over a long term and analyze them quantitatively. To evaluate the potential of our proposed approach with the novel database, we corroborated the rules-of-thumb of two fashion trend phenomena that have been observed and discussed qualitatively in previous studies. Through these empirical analyses, we verified that our approach to quantify fashion trends can help in exploring unsolved research questions. We also demonstrate CAT STREET's potential to find new standpoints to promote the understanding of societies and cultures through fashion embedded in consumers' daily lives.
Submitted 29 April, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Serif or Sans: Visual Font Analytics on Book Covers and Online Advertisements
Authors:
Yuto Shinahara,
Takuro Karamatsu,
Daisuke Harada,
Kota Yamaguchi,
Seiichi Uchida
Abstract:
In this paper, we conduct a large-scale study of font statistics in book covers and online advertisements. Through the statistical study, we try to understand how graphic designers relate fonts and content genres and identify the relationship between font styles, colors, and genres. We propose an automatic approach to extract font information from graphic designs by applying a sequence of character detection, style classification, and clustering techniques to the graphic designs. The extracted font information is accumulated together with genre information, such as romance or business, for further trend analysis. Through our unique empirical study, we show that the collected font statistics reveal interesting trends in terms of how typographic design represents the impression and the atmosphere of the content genres.
Submitted 29 June, 2019; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Convolution filter embedded quantum gate autoencoder
Authors:
Kodai Shiba,
Katsuyoshi Sakamoto,
Koichi Yamaguchi,
Dinesh Bahadur Malla,
Tomah Sogabe
Abstract:
The autoencoder is one of the machine learning algorithms used for feature extraction by dimension reduction of input data, denoising of images, and prior learning of neural networks. At the same time, autoencoders using quantum computers are also being developed. However, current quantum computers have a limited number of qubits, which makes it difficult to calculate big data. In this paper, as a solution to this problem, we propose a computation method that applies a convolution filter, which is one of the methods used in machine learning, to quantum computation. By applying this method to a quantum autoencoder, we succeeded in denoising image data of several hundred qubits or more using only a few qubits at an autoencoding accuracy of 98%, which demonstrates the effectiveness of this method. We have also verified the feature extraction function of the proposed autoencoder by dimensionality reduction. By projecting the MNIST data to two dimensions, we found that the proposed method showed superior classification accuracy to vanilla principal component analysis (PCA). We also tested the proposed method on IBM Q Melbourne; the actual machine failed to provide accurate results, implying the high error rate prevailing in current NISQ quantum computers.
Submitted 4 June, 2019;
originally announced June 2019.
-
A Maximum Edge-Weight Clique Extraction Algorithm Based on Branch-and-Bound
Authors:
Satoshi Shimizu,
Kazuaki Yamaguchi,
Sumio Masuda
Abstract:
The maximum edge-weight clique problem is to find a clique whose sum of edge-weight is the maximum for a given edge-weighted undirected graph. The problem is NP-hard and some branch-and-bound algorithms have been proposed. In this paper, we propose a new exact algorithm based on branch-and-bound. It assigns edge-weights to vertices and calculates upper bounds using vertex coloring. By some computational experiments, we confirmed our algorithm is faster than previous algorithms.
Submitted 24 October, 2018;
originally announced October 2018.
-
Recommending Outfits from Personal Closet
Authors:
Pongsate Tangseng,
Kota Yamaguchi,
Takayuki Okatani
Abstract:
We consider grading a fashion outfit for recommendation, where we assume that users have a closet of items and we aim at producing a score for an arbitrary combination of items in the closet. The challenge in outfit grading is that the input to the system is a bag of item pictures that are unordered and vary in size. We build a deep neural network-based system that can take variable-length items and predict a score. We collect a large number of outfits from a popular fashion sharing website, Polyvore, and evaluate the performance of our grading system. We compare our model with a random-choice baseline, both on the traditional classification evaluation and on people's judgment using a crowdsourcing platform. With over 84% in classification accuracy and 91% matching ratio to human annotators, our model can reliably grade the quality of an outfit. We also build an outfit recommender on top of our grader to demonstrate the practical application of our model for a personal closet assistant.
Submitted 26 April, 2018;
originally announced April 2018.
-
Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
Authors:
Tianlu Wang,
Kota Yamaguchi,
Vicente Ordonez
Abstract:
We propose an inference procedure for deep convolutional neural networks (CNNs) when partial evidence is available. Our method consists of a general feedback-based propagation approach (feedback-prop) that boosts the prediction accuracy for an arbitrary set of unknown target labels when the values for a non-overlapping arbitrary set of target labels are known. We show that existing models trained in a multi-label or multi-task setting can readily take advantage of feedback-prop without any retraining or fine-tuning. Our feedback-prop inference procedure is general, simple, reliable, and works on different challenging visual recognition tasks. We present two variants of feedback-prop based on layer-wise and residual iterative updates. We experiment using several multi-task models and show that feedback-prop is effective in all of them. Our results unveil a previously unreported but interesting dynamic property of deep CNNs. We also present an associated technical approach that takes advantage of this property for inference under partial evidence in general visual recognition tasks.
Submitted 29 March, 2018; v1 submitted 22 October, 2017;
originally announced October 2017.
-
End-to-end learning potentials for structured attribute prediction
Authors:
Kota Yamaguchi,
Takayuki Okatani,
Takayuki Umeda,
Kazuhiko Murasaki,
Kyoko Sudo
Abstract:
We present a structured inference approach in deep neural networks for multiple attribute prediction. In attribute prediction, a common approach is to learn independent classifiers on top of a good feature representation. However, such classifiers assume conditional independence on features and do not explicitly consider the dependency between attributes in the inference process. We propose to formulate attribute prediction in terms of marginal inference in the conditional random field. We model potential functions by deep neural networks and apply the sum-product algorithm to solve for the approximate marginal distribution in feed-forward networks. Our message passing layer implements sparse pairwise potentials by a softplus-linear function that is equivalent to a higher-order classifier, and learns all the model parameters by end-to-end back propagation. The experimental results using SUN attributes and CelebA datasets suggest that the structured inference improves the attribute prediction performance, and possibly uncovers the hidden relationship between attributes.
Submitted 6 August, 2017;
originally announced August 2017.
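As a rough illustration of the marginal inference step mentioned in the abstract above, the following Python sketch runs sum-product message passing on a small chain of discrete variables. It is a toy under stated assumptions: the potentials are hand-set numbers rather than outputs of a deep network, the graph is a simple chain (where sum-product is exact), and the name `chain_marginals` is hypothetical. It does not reproduce the paper's softplus-linear message passing layer.

```python
def chain_marginals(unary, pairwise):
    """Sum-product on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][s]       : potential of variable i taking state s
    pairwise[i][a][b] : potential of (x_i = a, x_{i+1} = b)
    Returns one normalized marginal distribution per variable.
    """
    n = len(unary)
    k = len(unary[0])

    # Forward messages: fwd[i][b] summarizes everything to the left of x_i.
    fwd = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(
                fwd[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b]
                for a in range(k)
            )

    # Backward messages: bwd[i][a] summarizes everything to the right of x_i.
    bwd = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(
                bwd[i + 1][b] * unary[i + 1][b] * pairwise[i][a][b]
                for b in range(k)
            )

    # Belief = incoming messages times the local potential, then normalize.
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        z = sum(belief)
        marginals.append([p / z for p in belief])
    return marginals


# Two binary attributes whose pairwise potential favors agreement.
toy_unary = [[1.0, 2.0], [1.0, 1.0]]
toy_pairwise = [[[2.0, 1.0], [1.0, 2.0]]]
toy_marginals = chain_marginals(toy_unary, toy_pairwise)
```

In the toy example the second attribute has a uniform unary potential, yet its marginal shifts toward state 1 (5/9 versus 4/9) because the first attribute prefers state 1 and the pairwise potential rewards agreement. This is the kind of dependency between attributes that independent classifiers ignore.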
-
Looking at Outfit to Parse Clothing
Authors:
Pongsate Tangseng,
Zhipeng Wu,
Kota Yamaguchi
Abstract:
This paper extends fully-convolutional neural networks (FCN) for the clothing parsing problem. Clothing parsing requires higher-level knowledge of clothing semantics and contextual cues to disambiguate fine-grained categories. We extend the FCN architecture with a side-branch network, which we refer to as the outfit encoder, to predict a consistent set of clothing labels and encourage combinatorial preference, and with a conditional random field (CRF) to explicitly consider coherent label assignment to the given image. The empirical results using the Fashionista and CFPD datasets show that our model achieves state-of-the-art performance in clothing parsing, without additional supervision during training. We also study the qualitative influence of annotation on the current clothing parsing benchmarks, using our Web-based tool for multi-scale pixel-wise annotation and a manual refinement effort on the Fashionista dataset. Finally, we show that the image representation of the outfit encoder is useful for a dress-up image retrieval application.
Submitted 3 March, 2017;
originally announced March 2017.
-
Automatic Attribute Discovery with Neural Activations
Authors:
Sirion Vittayakorn,
Takayuki Umeda,
Kazuhiko Murasaki,
Kyoko Sudo,
Takayuki Okatani,
Kota Yamaguchi
Abstract:
How can a machine learn to recognize visual attributes emerging from online communities without a definitive supervised dataset? This paper proposes an automatic approach to discover and analyze visual attributes from a noisy collection of image-text data on the Web. Our approach is based on the relationship between attributes and neural activations in the deep network. We characterize the visual property of an attribute word as a divergence within a weakly-annotated set of images. We show that the neural activations are useful for discovering and learning a classifier that agrees well with human perception from noisy real-world Web data. The empirical study suggests that the layered structure of deep neural networks also gives us insights into the perceptual depth of the given word. Finally, we demonstrate that we can utilize highly-activating neurons for finding semantically relevant regions.
Submitted 25 July, 2016;
originally announced July 2016.
-
Continuous Markov Random Fields for Robust Stereo Estimation
Authors:
Koichiro Yamaguchi,
Tamir Hazan,
David McAllester,
Raquel Urtasun
Abstract:
In this paper we present a novel slanted-plane MRF model which reasons jointly about occlusion boundaries as well as depth. We formulate the problem as one of inference in a hybrid MRF composed of both continuous (i.e., slanted 3D planes) and discrete (i.e., occlusion boundaries) random variables. This allows us to define potentials encoding the ownership of the pixels that compose the boundary between segments, as well as potentials encoding which junctions are physically possible. Our approach outperforms the state of the art on Middlebury high-resolution imagery as well as on the more challenging KITTI dataset, while being more efficient than existing slanted-plane MRF-based methods, taking on average 2 minutes to perform inference on high-resolution imagery.
Submitted 5 April, 2012;
originally announced April 2012.