Showing 1–42 of 42 results for author: Ushiku, Y

  1. arXiv:2403.11686

    cs.LG cond-mat.mtrl-sci physics.comp-ph

    Crystalformer: Infinitely Connected Attention for Periodic Structure Encoding

    Authors: Tatsunori Taniai, Ryo Igarashi, Yuta Suzuki, Naoya Chiba, Kotaro Saito, Yoshitaka Ushiku, Kanta Ono

    Abstract: Predicting physical properties of materials from their crystal structures is a fundamental problem in materials science. In peripheral areas such as the prediction of molecular properties, fully connected attention networks have been shown to be successful. However, unlike these finite atom arrangements, crystal structures are infinitely repeating, periodic arrangements of atoms, whose fully conne…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 13 main pages, 3 figures, 4 tables, 10 appendix pages. Published as a conference paper at ICLR 2024. For more information, see https://omron-sinicx.github.io/crystalformer/

  2. arXiv:2403.01802

    cs.CV

    TNF: Tri-branch Neural Fusion for Multimodal Medical Data Classification

    Authors: Tong Zheng, Shusaku Sone, Yoshitaka Ushiku, Yuki Oba, Jiaxin Ma

    Abstract: This paper presents a Tri-branch Neural Fusion (TNF) approach designed for classifying multimodal medical images and tabular data. It also introduces two solutions to address the challenge of label inconsistency in multimodal classification. Traditional methods in multi-modality medical data classification often rely on single-label approaches, typically merging features from two distinct input mo…

    Submitted 10 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2402.12170

    cs.CL cs.AI

    Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction

    Authors: Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku

    Abstract: Large language models require updates to remain up-to-date or adapt to new domains by fine-tuning them with new documents. One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt. However, LLMs suffer from a phenomenon called the perplexity curse; despite minimizing document perplexity during fine-tuning, LLMs struggle to extract informat…

    Submitted 23 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  4. arXiv:2312.04070

    cs.LG

    A Transformer Model for Symbolic Regression towards Scientific Discovery

    Authors: Florian Lalande, Yoshitomo Matsubara, Naoya Chiba, Tatsunori Taniai, Ryo Igarashi, Yoshitaka Ushiku

    Abstract: Symbolic Regression (SR) searches for mathematical expressions which best describe numerical datasets. This allows one to circumvent interpretation issues inherent to artificial neural networks, but SR algorithms are often computationally expensive. This work proposes a new Transformer model aiming at Symbolic Regression particularly focused on its application for Scientific Discovery. We propose thre…

    Submitted 13 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted for oral presentation at NeurIPS2023 AI4Science Workshop. OpenReview: https://openreview.net/forum?id=AIfqWNHKjo

  5. arXiv:2311.16444

    cs.CV cs.CL

    Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

    Authors: Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

    Abstract: We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view. While dense video captioning (predicting time segments and their captions) is primarily studied with exocentric videos (e.g., YouCook2), benchmarks with egocentric videos are restricted due to data scarcity. To overcome…

    Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  6. arXiv:2311.00967

    cs.RO cs.AI cs.CL

    Vision-Language Interpreter for Robot Task Planning

    Authors: Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By gener…

    Submitted 19 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: ICRA 2024

  7. arXiv:2310.12515

    cs.LG

    WeaveNet for Approximating Two-sided Matching Problems

    Authors: Shusaku Sone, Jiaxin Ma, Atsushi Hashimoto, Naoya Chiba, Yoshitaka Ushiku

    Abstract: Matching, a task to optimally assign limited resources under constraints, is a fundamental technology for society. The task potentially has various objectives, conditions, and constraints; however, efficient neural network architectures for matching are underexplored. This paper proposes a novel graph neural network (GNN), WeaveNet, designed for bipartite graphs. Since a bipartite graph…

    Submitted 19 October, 2023; originally announced October 2023.

  8. arXiv:2307.02862

    cs.CV

    A Critical Look at the Current Usage of Foundation Model for Dense Recognition Task

    Authors: Shiqi Yang, Atsushi Hashimoto, Yoshitaka Ushiku

    Abstract: In recent years, large models trained on huge amounts of cross-modality data, usually termed foundation models, have achieved conspicuous success in many fields, such as image recognition and generation. Though these models achieve great success in their original applications, it is still unclear whether they can be applied to other downstream tasks. In this pape…

    Submitted 1 August, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: This is a short report on the current usage of foundation models (mainly pretrained diffusion models) for downstream dense recognition tasks (e.g., open-vocabulary segmentation). We hope this short report can offer insight for future research.

  9. arXiv:2304.10333

    cs.CV

    Noisy Universal Domain Adaptation via Divergence Optimization for Visual Recognition

    Authors: Qing Yu, Atsushi Hashimoto, Yoshitaka Ushiku

    Abstract: To transfer the knowledge learned from a labeled source domain to an unlabeled target domain, many studies have worked on universal domain adaptation (UniDA), where there is no constraint on the label sets of the source domain and target domain. However, the existing UniDA methods rely on source samples with correct annotations. Due to the limited resources in the real world, it is difficult to ob…

    Submitted 20 April, 2023; originally announced April 2023.

  10. arXiv:2212.13120

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Neural Structure Fields with Application to Crystal Structure Autoencoders

    Authors: Naoya Chiba, Yuta Suzuki, Tatsunori Taniai, Ryo Igarashi, Yoshitaka Ushiku, Kotaro Saito, Kanta Ono

    Abstract: Representing crystal structures of materials to facilitate determining them via neural networks is crucial for enabling machine-learning applications involving crystal structure estimation. Among these applications, the inverse design of materials can contribute to exploring materials with desired properties without relying on luck or serendipity. We propose neural structure fields (NeSF) as an accu…

    Submitted 13 December, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: 17 pages, 7 figures, 4 tables. 15 pages Supplementary Information

    Journal ref: Communications Materials (2023)

  11. arXiv:2209.10134

    cs.MM cs.CL cs.CV

    Recipe Generation from Unsegmented Cooking Videos

    Authors: Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori

    Abstract: This paper tackles recipe generation from unsegmented cooking videos, a task that requires agents to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims at detecting events thoroughly and generating sentences for them. However, unlike DVC, in recipe generation, recipe story awareness is c…

    Submitted 18 February, 2024; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted at ACM TOMM; ACM Transactions on Multimedia Computing, Communications, and Applications

  12. arXiv:2209.05840

    cs.CL cs.AI

    Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

    Authors: Keisuke Shirai, Atsushi Hashimoto, Taichi Nishimura, Hirotaka Kameko, Shuhei Kurita, Yoshitaka Ushiku, Shinsuke Mori

    Abstract: We present a new multimodal dataset called Visual Recipe Flow, which enables us to learn each cooking action result in a recipe text. The dataset consists of object state changes and the workflow of the recipe text. The state change is represented as an image pair, while the workflow is represented as a recipe flow graph (r-FG). The image pairs are grounded in the r-FG, which provides the cross-mo…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: COLING 2022

  13. arXiv:2206.10540

    cs.LG cs.AI cs.NE cs.SC

    Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery

    Authors: Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, Yoshitaka Ushiku

    Abstract: This paper revisits datasets and evaluation criteria for Symbolic Regression (SR), specifically focused on its potential for scientific discovery. Focused on a set of formulas used in the existing datasets based on Feynman Lectures on Physics, we recreate 120 datasets to discuss the performance of symbolic regression for scientific discovery (SRSD). For each of the 120 SRSD datasets, we carefully…

    Submitted 5 March, 2024; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted at DMLR. Code and datasets are available at https://github.com/omron-sinicx/srsd-benchmark https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_easy https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_medium https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_hard and another three sets of SRSD datasets with dummy variables

  14. arXiv:2202.02149

    cs.CV

    3D Point Cloud Registration with Learning-based Matching Algorithm

    Authors: Rintaro Yanagi, Atsushi Hashimoto, Shusaku Sone, Naoya Chiba, Jiaxin Ma, Yoshitaka Ushiku

    Abstract: We present a novel differential matching algorithm for 3D point cloud registration. Instead of only optimizing the feature extractor for a matching algorithm, we propose a learning-based matching module optimized to the jointly-trained feature extractor. We focused on edge-wise feature-forwarding architectures, which are memory-consuming but can avoid the over-smoothing effect that GNNs suffer from. We…

    Submitted 4 December, 2023; v1 submitted 4 February, 2022; originally announced February 2022.

  15. Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation

    Authors: Takehiko Ohkawa, Takuma Yagi, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

    Abstract: Hand segmentation is a crucial task in first-person vision. Since first-person images exhibit strong bias in appearance among different environments, adapting a pre-trained segmentation model to a new domain is required in hand segmentation. Here, we focus on appearance gaps for hand regions and backgrounds separately. We propose (i) foreground-aware image stylization and (ii) consensus pseudo-lab…

    Submitted 27 March, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: Accepted to IEEE Access 2021

  16. arXiv:2104.13872

    cs.CL cs.CV

    Removing Word-Level Spurious Alignment between Images and Pseudo-Captions in Unsupervised Image Captioning

    Authors: Ukyo Honda, Yoshitaka Ushiku, Atsushi Hashimoto, Taro Watanabe, Yuji Matsumoto

    Abstract: Unsupervised image captioning is a challenging task that aims at generating captions without the supervision of image-sentence pairs, but only with images and sentences drawn from different sources and object labels detected from the images. In previous work, pseudo-captions, i.e., sentences that contain the detected object labels, were assigned to a given image. The focus of the previous work was…

    Submitted 1 June, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: EACL 2021 (11 pages, 3 figures; added references)

  17. arXiv:2104.00246

    cs.CV

    Divergence Optimization for Noisy Universal Domain Adaptation

    Authors: Qing Yu, Atsushi Hashimoto, Yoshitaka Ushiku

    Abstract: Universal domain adaptation (UniDA) has been proposed to transfer knowledge learned from a label-rich source domain to a label-scarce target domain without any constraints on the label sets. In practice, however, it is difficult to obtain a large amount of perfectly clean labeled data in a source domain with limited resources. Existing UniDA methods rely on source samples with correct annotations,…

    Submitted 1 April, 2021; originally announced April 2021.

    Comments: CVPR 2021

  18. arXiv:1911.09814

    cs.CV

    Crowd Density Forecasting by Modeling Patch-based Dynamics

    Authors: Hiroaki Minoura, Ryo Yonetani, Mai Nishimura, Yoshitaka Ushiku

    Abstract: Forecasting human activities observed in videos is a long-standing challenge in computer vision, which leads to various real-world applications such as mobile robots, autonomous driving, and assistive systems. In this work, we present a new visual forecasting task called crowd density forecasting. Given a video of a crowd captured by a surveillance camera, our goal is to predict how that crowd wil…

    Submitted 21 November, 2019; originally announced November 2019.

  19. arXiv:1905.09684

    cs.LG stat.ML

    Decentralized Learning of Generative Adversarial Networks from Non-iid Data

    Authors: Ryo Yonetani, Tomohiro Takahashi, Atsushi Hashimoto, Yoshitaka Ushiku

    Abstract: This work addresses a new problem that learns generative adversarial networks (GANs) from multiple data collections that are each i) owned separately by different clients and ii) drawn from a non-identical distribution that comprises different classes. Given such non-iid data as input, we aim to learn a distribution involving all the classes input data can belong to, while keeping the data decentr…

    Submitted 21 November, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

  20. arXiv:1903.06315

    cs.CV

    Pose Graph Optimization for Unsupervised Monocular Visual Odometry

    Authors: Yang Li, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Unsupervised Learning based monocular visual odometry (VO) has lately drawn significant attention for its potential in label-free learning ability and robustness to camera parameters and environmental variations. However, partially due to the lack of drift correction technique, these methods are still far less accurate than geometric approaches for large-scale odometry estimation. In this paper,…

    Submitted 14 March, 2019; originally announced March 2019.

    Comments: Accepted to ICRA'2019

  21. arXiv:1812.04798

    cs.CV

    Strong-Weak Distribution Alignment for Adaptive Object Detection

    Authors: Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

    Abstract: We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire…

    Submitted 5 April, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

    Comments: Accepted to CVPR2019, project page http://cs-people.bu.edu/keisaito/research/CVPR2019.html

  22. arXiv:1812.04351

    cs.CV

    Multichannel Semantic Segmentation with Unsupervised Domain Adaptation

    Authors: Kohei Watanabe, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Most contemporary robots have depth sensors, and research on semantic segmentation with RGBD images has shown that depth images boost the accuracy of segmentation. Since it is time-consuming to annotate images with semantic labels per pixel, it would be ideal if we could avoid this laborious work by utilizing an existing dataset or a synthetic dataset which we can generate on our own. Robot motion…

    Submitted 11 December, 2018; originally announced December 2018.

    Comments: Published in the AUTONUE Workshops of ECCV 2018

  23. arXiv:1812.01261

    cs.CV

    Conditional Video Generation Using Action-Appearance Captions

    Authors: Shohei Yamamoto, Antonio Tejero-de-Pablos, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: The field of automatic video generation has received a boost thanks to the recent Generative Adversarial Networks (GANs). However, most existing methods cannot control the contents of the generated video using a text caption, losing their usefulness to a large extent. This particularly affects human videos due to their great variety of actions and appearances. This paper presents Conditional Flow…

    Submitted 4 December, 2018; v1 submitted 4 December, 2018; originally announced December 2018.

  24. arXiv:1811.12104

    cs.CV

    Generating Easy-to-Understand Referring Expressions for Target Identifications

    Authors: Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. As a target becomes relatively less salient, identifying referred objects itself becomes more difficult. However, the existing studies regarded all sentences that refer to objects correctly as equally good, ignoring whether they are easily understood by human…

    Submitted 29 August, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

  25. arXiv:1811.11165

    cs.CV cs.LG stat.ML

    Label-Noise Robust Generative Adversarial Networks

    Authors: Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Generative adversarial networks (GANs) are a framework that learns a generative distribution through adversarial training. Recently, their class-conditional extensions (e.g., conditional GAN (cGAN) and auxiliary classifier GAN (AC-GAN)) have attracted much attention owing to their ability to learn the disentangled representations and to improve the training stability. However, their training requi…

    Submitted 2 May, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted to CVPR 2019 (Oral). Project page: https://takuhirok.github.io/rGAN/

  26. arXiv:1811.11163

    cs.CV cs.LG stat.ML

    Class-Distinct and Class-Mutual Image Generation with GANs

    Authors: Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Class-conditional extensions of generative adversarial networks (GANs), such as auxiliary classifier GAN (AC-GAN) and conditional GAN (cGAN), have garnered attention owing to their ability to decompose representations into class labels and other factors and to boost the training stability. However, a limitation is that they assume that each class is separable and ignore the relationship between cl…

    Submitted 24 July, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted to BMVC 2019 (Spotlight). Project page: https://takuhirok.github.io/CP-GAN/

  27. arXiv:1808.01821

    cs.CV

    Visual Question Generation for Class Acquisition of Unknown Objects

    Authors: Kohei Uehara, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Traditional image recognition methods only consider objects belonging to already learned classes. However, since training a recognition model with every object class in the world is unfeasible, a way of getting information on unknown objects (i.e., objects whose class has not been learned) is necessary. A way for an image recognition system to learn new classes could be asking a human about object…

    Submitted 6 August, 2018; originally announced August 2018.

  28. arXiv:1805.00460

    cs.CL cs.AI cs.CV cs.HC

    Customized Image Narrative Generation via Interactive Visual Question Generation and Answering

    Authors: Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Image description task has been invariably examined in a static manner with qualitative presumptions held to be universally applicable, regardless of the scope or target of the description. In practice, however, different viewers may pay attention to different aspects of the image, and yield different descriptions or interpretations under various contexts. Such diversity in perspectives is difficu…

    Submitted 27 April, 2018; originally announced May 2018.

    Comments: To Appear at CVPR 2018 as spotlight presentation

  29. arXiv:1804.10427

    cs.CV

    Open Set Domain Adaptation by Backpropagation

    Authors: Kuniaki Saito, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Numerous algorithms have been proposed for transferring knowledge from a label-rich domain (source) to a label-scarce domain (target). Almost all of them are proposed for a closed-set scenario, where the source and the target domain completely share the class of their samples. We call the shared class the "known class." However, in practice, when samples in target domain are not labele…

    Submitted 6 July, 2018; v1 submitted 27 April, 2018; originally announced April 2018.

    Comments: Accepted by ECCV2018

  30. arXiv:1804.02843

    cs.CV

    Viewpoint-aware Video Summarization

    Authors: Atsushi Kanehira, Luc Van Gool, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: This paper introduces a novel variant of video summarization, namely building a summary that depends on the particular aspect of a video the viewer focuses on. We refer to this as "viewpoint." To infer what the desired viewpoint may be, we assume that several other videos are available, especially groups of videos, e.g., as folders on a person's phone or laptop. The semantic si…

    Submitted 10 April, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

    Comments: to appear at CVPR 2018

  31. arXiv:1712.02560

    cs.CV

    Maximum Classifier Discrepancy for Unsupervised Domain Adaptation

    Authors: Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: In this work, we present a method for unsupervised domain adaptation. Many adversarial learning methods train domain classifier networks to distinguish the features as either a source or target and train a feature generator network to mimic the discriminator. Two problems exist with these methods. First, the domain classifier only tries to distinguish the features as a source or target and thus do…

    Submitted 3 April, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

    Comments: Accepted to CVPR2018 Oral, Code is available at https://github.com/mil-tokyo/MCD_DA

  32. arXiv:1711.10284

    cs.LG cs.CV stat.ML

    Between-class Learning for Image Classification

    Authors: Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: In this paper, we propose a novel learning method for image classification called Between-Class learning (BC learning). We generate between-class images by mixing two images belonging to different classes with a random ratio. We then input the mixed image to the model and train the model to output the mixing ratio. BC learning has the ability to impose constraints on the shape of the feature distr…

    Submitted 8 April, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

    Comments: 11 pages, 8 figures, published as a conference paper at CVPR 2018

  33. arXiv:1711.10282

    cs.LG cs.SD eess.AS stat.ML

    Learning from Between-class Examples for Deep Sound Recognition

    Authors: Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Deep learning methods have achieved high performance in sound recognition tasks. Deciding how to feed the training data is important for further performance improvement. We propose a novel learning method for deep sound recognition: Between-Class learning (BC learning). Our strategy is to learn a discriminative feature space by recognizing the between-class sounds as between-class sounds. We gener…

    Submitted 28 February, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

    Comments: 13 pages, 6 figures, published as a conference paper at ICLR 2018

  34. arXiv:1711.09618

    cs.CV

    Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

    Authors: Katsunori Ohnishi, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Learning to represent and generate videos from unlabeled data is a very challenging problem. To generate realistic videos, it is important not only to ensure that the appearance of each frame is real, but also to ensure the plausibility of a video motion and consistency of a video appearance in the time direction. The process of video generation should be divided according to these intrinsic diffi…

    Submitted 1 December, 2017; v1 submitted 27 November, 2017; originally announced November 2017.

    Comments: Our supplemental material is available on http://www.mi.t.u-tokyo.ac.jp/assets/publication/hierarchical_video_generation_sup/ Accepted to AAAI2018

  35. arXiv:1711.07566

    cs.CV cs.LG

    Neural 3D Mesh Renderer

    Authors: Hiroharu Kato, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents…

    Submitted 20 November, 2017; originally announced November 2017.

  36. arXiv:1711.01575

    cs.CV

    Adversarial Dropout Regularization

    Authors: Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

    Abstract: We present a method for transferring neural representations from label-rich source domains to unlabeled target domains. Recent adversarial methods proposed for this task learn to align features across domains by fooling a special domain critic network. However, a drawback of this approach is that the critic simply labels the generated features as in-domain or not, without considering the boundarie…

    Submitted 1 March, 2018; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: TBA on ICLR2018

  37. arXiv:1710.11549

    cs.SD cs.MM eess.AS

    Melody Generation for Pop Music via Word Representation of Musical Properties

    Authors: Andrew Shin, Leopold Crestel, Hiroharu Kato, Kuniaki Saito, Katsunori Ohnishi, Masataka Yamaguchi, Masahiro Nakawaki, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Automatic melody generation for pop music has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melody has turned out to be highly challenging due to a number of factors. Representation of multivariate property of notes has been one of the primary challenges. It is also difficult to remain in the permissible spectrum of musical variety, out…

    Submitted 31 October, 2017; originally announced October 2017.

    Comments: submitted to ICLR 2018

  38. arXiv:1704.07945

    cs.CV

    Spatio-temporal Person Retrieval via Natural Language Queries

    Authors: Masataka Yamaguchi, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: In this paper, we address the problem of spatio-temporal person retrieval from multiple videos using a natural language query, in which we output a tube (i.e., a sequence of bounding boxes) which encloses the person described by the query. For this problem, we introduce a novel dataset consisting of videos containing people annotated with bounding boxes for each second and with five natural langua…

    Submitted 22 August, 2017; v1 submitted 25 April, 2017; originally announced April 2017.

    Comments: Accepted to ICCV2017

  39. arXiv:1702.08400

    cs.CV cs.AI

    Asymmetric Tri-training for Unsupervised Domain Adaptation

    Authors: Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Deep-layered models trained on a large number of labeled samples boost the accuracy of many tasks. It is important to apply such models to different domains because collecting many labeled samples in various domains is expensive. In unsupervised domain adaptation, one needs to train a classifier that works well on a target domain when provided with labeled source samples and unlabeled target sampl…

    Submitted 13 May, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: TBA on ICML2017

  40. arXiv:1612.07976

    cs.LG stat.ML

    DeMIAN: Deep Modality Invariant Adversarial Network

    Authors: Kuniaki Saito, Yusuke Mukuta, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Obtaining common representations from different modalities is important in that they are interchangeable with each other in a classification problem. For example, we can train a classifier on image features in the common representations and apply it to the testing of the text features in the representations. Existing multi-modal representation learning methods mainly aim to extract rich informatio…

    Submitted 27 December, 2016; v1 submitted 23 December, 2016; originally announced December 2016.

  41. arXiv:1609.06657

    cs.CV cs.CL

    The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA)

    Authors: Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Visual Question Answering (VQA) task has showcased a new stage of interaction between language and vision, two of the most pivotal components of artificial intelligence. However, it has mostly focused on generating short and repetitive answers, mostly single words, which fall short of rich linguistic capabilities of humans. We introduce Full-Sentence Visual Question Answering (FSVQA) dataset, cons…

    Submitted 21 September, 2016; originally announced September 2016.

  42. arXiv:1606.06108

    cs.CV

    DualNet: Domain-Invariant Network for Visual Question Answering

    Authors: Kuniaki Saito, Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada

    Abstract: Visual question answering (VQA) task not only bridges the gap between images and language, but also requires that specific contents within the image are understood as indicated by linguistic context of the question, in order to generate accurate answers. Thus, it is critical to build an efficient embedding of images and texts. We implement DualNet, which fully takes advantage of discriminative…

    Submitted 4 May, 2017; v1 submitted 20 June, 2016; originally announced June 2016.

    Comments: Accepted as an oral paper by ICME 2017
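
Entry 7 (WeaveNet for Approximating Two-sided Matching Problems) is truncated before the WeaveNet architecture, so nothing below reproduces it. For context only, this sketch shows a classical exact algorithm for one well-known two-sided matching problem, stable marriage: Gale-Shapley deferred acceptance. It is a swapped-in reference algorithm, not the paper's method; the function name `gale_shapley` and the assumption of equal-sized sides with complete, strict preference lists are ours.

```python
from typing import Dict, List


def gale_shapley(proposer_prefs: Dict[str, List[str]],
                 receiver_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Deferred acceptance; returns a stable matching as {receiver: proposer}.

    Assumes both sides have the same size and every preference list is
    complete and strict (most preferred first).
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    free = list(proposer_prefs)                   # proposers without a partner
    match: Dict[str, str] = {}                    # receiver -> current proposer

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = match.get(r)
        if current is None:
            match[r] = p                 # r was free: accept tentatively
        elif rank[r][p] < rank[r][current]:
            match[r] = p                 # r prefers p: drop the old partner
            free.append(current)
        else:
            free.append(p)               # r rejects p; p proposes again later
    return match


if __name__ == "__main__":
    proposers = {"a": ["x", "y", "z"], "b": ["x", "z", "y"], "c": ["y", "x", "z"]}
    receivers = {"x": ["b", "a", "c"], "y": ["a", "c", "b"], "z": ["a", "b", "c"]}
    print(gale_shapley(proposers, receivers))
```

Each proposer makes offers in preference order and each receiver tentatively keeps its best offer so far, so the result is stable and is the best stable matching for the proposing side, regardless of the order in which free proposers are processed.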
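
Entry 31 (Maximum Classifier Discrepancy for Unsupervised Domain Adaptation) stops before its method. As we understand the paper, the "classifier discrepancy" is the disagreement between two task classifiers on the same target sample, measured as the mean absolute difference of their class-probability vectors; training alternates between updating the classifiers to maximize this discrepancy on target data (while still fitting labeled source data) and updating the feature generator to minimize it. The minimal sketch below assumes plain Python lists of logits and uses hypothetical helper names; it computes only the discrepancy term and omits the networks and optimizer steps.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits1: Sequence[float],
                           logits2: Sequence[float]) -> float:
    """Mean absolute difference between two classifiers' class probabilities.

    Assumed form: (1/K) * sum_k |p1_k - p2_k| for K classes. It ranges from 0
    (identical predictions) to 2/K (disjoint one-hot predictions).
    """
    if len(logits1) != len(logits2):
        raise ValueError("both classifiers must output the same number of classes")
    p1, p2 = softmax(logits1), softmax(logits2)
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)


def batch_discrepancy(batch1: Sequence[Sequence[float]],
                      batch2: Sequence[Sequence[float]]) -> float:
    """Average discrepancy over a batch of target samples (hypothetical helper)."""
    pairs = list(zip(batch1, batch2))
    return sum(classifier_discrepancy(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(classifier_discrepancy([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # agree
    print(classifier_discrepancy([5.0, 0.0, 0.0], [0.0, 5.0, 0.0]))    # disagree
```

In an alternating scheme like the one described above, the classifiers would be updated to increase `batch_discrepancy` on target batches and the feature generator to decrease it.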
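
Entries 32 and 33 describe BC learning concretely: mix two examples from different classes with a random ratio and train the model to output that ratio. The minimal sketch below assumes inputs are flat lists of floats, the ratio r is drawn uniformly from [0, 1), and the training loss is the KL divergence between the ratio-encoded soft label and the model's predicted distribution. It uses the simplest linear mix; the papers' exact mixing formulas (which differ between images and sounds) are not reproduced here, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def one_hot(label: int, num_classes: int) -> List[float]:
    """Return a one-hot vector of length num_classes."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def bc_mix(x1: Sequence[float], t1: int,
           x2: Sequence[float], t2: int,
           num_classes: int,
           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Mix two examples of different classes with a random ratio r.

    Assumption: plain linear mixing r * x1 + (1 - r) * x2. The target puts
    weight r on class t1 and 1 - r on class t2, i.e. it encodes the ratio.
    """
    if t1 == t2:
        raise ValueError("BC learning mixes examples from different classes")
    if len(x1) != len(x2):
        raise ValueError("inputs must have the same length")
    r = rng.random()  # assumed r ~ U[0, 1)
    mixed = [r * a + (1.0 - r) * b for a, b in zip(x1, x2)]
    target = [r * p + (1.0 - r) * q
              for p, q in zip(one_hot(t1, num_classes), one_hot(t2, num_classes))]
    return mixed, target


def kl_divergence(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(target || predicted); zero-probability target entries contribute 0."""
    return sum(t * math.log(t / max(p, eps))
               for t, p in zip(target, predicted) if t > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = bc_mix([1.0, 0.0, 0.0], 0, [0.0, 0.0, 1.0], 2, num_classes=3, rng=rng)
    print("mixed input:", x)
    print("ratio label:", y)
    print("loss for a uniform prediction:", kl_divergence(y, [1 / 3, 1 / 3, 1 / 3]))
```

Because the soft label sums to 1 and is zero on every class other than the two being mixed, a model that predicts it exactly reaches zero loss, which is what "train the model to output the mixing ratio" amounts to in this sketch.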