Showing 1–50 of 82 results for author: Sato, Y

  1. arXiv:2405.01090  [pdf, other]

    cs.CV

    Learning Object States from Actions via Large Language Models

    Authors: Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato

    Abstract: Temporally localizing the presence of object states in videos is crucial in understanding human activities beyond actions and objects. This task has suffered from a lack of training data due to object states' inherent ambiguity and variety. To avoid exhaustive annotation, learning from transcribed narrations in instructional videos would be intriguing. However, object states are less described in…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 19 pages of main content, 24 pages of supplementary material

  2. arXiv:2403.16428  [pdf, other]

    cs.CV

    Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

    Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

    Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the…

    Submitted 25 March, 2024; originally announced March 2024.

  3. arXiv:2403.04381  [pdf, other]

    cs.CV

    Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation

    Authors: Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato

    Abstract: The pursuit of accurate 3D hand pose estimation stands as a keystone for understanding human activity in the realm of egocentric vision. The majority of existing estimation methods still rely on single-view images as input, leading to potential limitations, e.g., limited field-of-view and ambiguity in depth. To address these problems, adding another camera to better capture the shape of hands is a…

    Submitted 9 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: This paper is accepted by CVPR2024. Code will be released at https://github.com/ut-vision/S2DHand

  4. arXiv:2402.17360  [pdf, other]

    cs.CV cs.AI cs.RO

    CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer

    Authors: Lian Fu, Ryoichi Ishikawa, Yoshihiro Sato, Takeshi Oishi

    Abstract: The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a point cloud using Transformer. CAPT uses an end-to-end transformer-based architecture for joint parameter and state estimation of articulated objects from a single point cloud. The proposed CAPT methods accurat…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA 2024

  5. arXiv:2402.00293  [pdf, other]

    cs.CV

    FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation

    Authors: Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, Yoichi Sato

    Abstract: In the development of science, accurate and reproducible documentation of the experimental process is crucial. Automatic recognition of the actions in experiments from videos would help experimenters by complementing the recording of experiments. Towards this goal, we propose FineBio, a new fine-grained video dataset of people performing biological experiments. The dataset consists of multi-view v…

    Submitted 31 January, 2024; originally announced February 2024.

  6. arXiv:2401.00159  [pdf, other]

    eess.IV cs.CV

    Automatic hip osteoarthritis grading with uncertainty estimation from computed tomography using digitally-reconstructed radiographs

    Authors: Masachika Masuda, Mazen Soufi, Yoshito Otake, Keisuke Uemura, Sotaro Kono, Kazuma Takashima, Hidetoshi Hamada, Yi Gu, Masaki Takao, Seiji Okada, Nobuhiko Sugano, Yoshinobu Sato

    Abstract: Progression of hip osteoarthritis (hip OA) leads to pain and disability, likely leading to surgical treatment such as hip arthroplasty at the terminal stage. The severity of hip OA is often classified using the Crowe and Kellgren-Lawrence (KL) classifications. However, as the classification is subjective, we aimed to develop an automated approach to classify the disease severity based on the two g…

    Submitted 30 December, 2023; originally announced January 2024.

  7. arXiv:2312.17670  [pdf, other]

    cs.CV cs.LG q-bio.QM q-bio.TO

    Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

    Authors: Kaiyuan Yang, Fabio Musio, Yihui Ma, Norman Juchler, Johannes C. Paetzold, Rami Al-Maskari, Luciano Höher, Hongwei Bran Li, Ibrahim Ethem Hamamci, Anjany Sekuboyina, Suprosanna Shit, Houjing Huang, Chinmay Prabhakar, Ezequiel de la Rosa, Diana Waldmannstetter, Florian Kofler, Fernando Navarro, Martin Menten, Ivan Ezhov, Daniel Rueckert, Iris Vos, Ynte Ruigrok, Birgitta Velthuis, Hugo Kuijf, Julien Hämmerli, et al. (59 additional authors not shown)

    Abstract: The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modaliti…

    Submitted 29 April, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: 24 pages, 11 figures, 9 tables. Summary Paper for the MICCAI TopCoW 2023 Challenge

  8. arXiv:2312.02544  [pdf, ps, other]

    physics.app-ph cs.LG

    Characterization of Locality in Spin States and Forced Moves for Optimizations

    Authors: Yoshiki Sato, Makiko Konoshima, Hirotaka Tamura, Jun Ohkubo

    Abstract: Ising formulations are widely utilized to solve combinatorial optimization problems, and a variety of quantum or semiconductor-based hardware has recently been made available. In combinatorial optimization problems, the existence of local minima in energy landscapes is problematic to use to seek the global minimum. We note that the aim of the optimization is not to obtain exact samplings from the…

    Submitted 14 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: 8 pages, 3 figures

    Journal ref: J. Phys. Soc. Jpn. 93, 044802 (2024)

  9. arXiv:2311.18259  [pdf, other]

    cs.CV cs.AI

    Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    Authors: Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, et al. (76 additional authors not shown)

    Abstract: We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from…

    Submitted 29 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: updated baseline results and dataset statistics to match the released v2 data; added table to appendix comparing stats of Ego-Exo4D alongside other datasets

  10. arXiv:2311.17366  [pdf, other]

    cs.CV

    Generative Hierarchical Temporal Transformer for Hand Action Recognition and Motion Prediction

    Authors: Yilin Wen, Hao Pan, Takehiko Ohkawa, Lei Yang, Jia Pan, Yoichi Sato, Taku Komura, Wenping Wang

    Abstract: We present a novel framework that concurrently tackles hand action recognition and 3D future hand motion prediction. While previous works focus on either recognition or prediction, we propose a generative Transformer VAE architecture to jointly capture both aspects, facilitating realistic motion prediction by leveraging the short-term hand motion and long-term action consistency observed across ti…

    Submitted 24 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  11. arXiv:2311.16444  [pdf, other]

    cs.CV cs.CL

    Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos

    Authors: Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

    Abstract: We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view. While dense video captioning (predicting time segments and their captions) is primarily studied with exocentric videos (e.g., YouCook2), benchmarks with egocentric videos are restricted due to data scarcity. To overcome…

    Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

  12. GREEMA: Proposal and Experimental Verification of Growing Robot by Eating Environmental MAterial for Landslide Disaster

    Authors: Yusuke Tsunoda, Yuya Sato, Koichi Osuka

    Abstract: In areas that are inaccessible to humans, such as the lunar surface and landslide sites, there is a need for multiple autonomous mobile robot systems that can replace human workers. In particular, at landslide sites such as river channel blockages, robots are required to remove water and sediment from the site as soon as possible. Conventionally, several construction machines have been deployed to…

    Submitted 2 November, 2023; originally announced November 2023.

    Journal ref: J. Robot. Mechatron., Vol.36 No.2, pp. 415-425, 2024

  13. arXiv:2310.19351  [pdf, other]

    cs.CV

    Seeking Flat Minima with Mean Teacher on Semi- and Weakly-Supervised Domain Generalization for Object Detection

    Authors: Ryosuke Furuta, Yoichi Sato

    Abstract: Object detectors do not work well when domains largely differ between training and testing data. To overcome this domain gap in object detection without requiring expensive annotations, we consider two problem settings: semi-supervised domain generalizable object detection (SS-DGOD) and weakly-supervised DGOD (WS-DGOD). In contrast to the conventional domain generalization for object detection tha…

    Submitted 15 March, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  14. arXiv:2310.19303  [pdf, other]

    cs.CY

    Extracting user needs with Chat-GPT for dialogue recommendation

    Authors: Yugen Sato, Taisei Nakajima, Tatsuki Kawamoto, Tomohiro Takagi

    Abstract: Large-scale language models (LLMs), such as ChatGPT, are becoming increasingly sophisticated and exhibit human-like capabilities, playing an essential role in assisting humans in a variety of everyday tasks. An important application of AI is interactive recommendation systems that respond to human inquiries and make recommendations tailored to the user. In most conventional interactive recommendat…

    Submitted 6 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

  15. arXiv:2310.10985  [pdf]

    cs.CE

    Computational synthesis of locomoting soft robots by topology optimization

    Authors: Hiroki Kobayashi, Farzad Gholami, S. Macrae Montgomery, Masato Tanaka, Liang Yue, Changyoung Yuhn, Yuki Sato, Atsushi Kawamoto, H. Jerry Qi, Tsuyoshi Nomura

    Abstract: Biological organisms have acquired sophisticated body shapes for walking or climbing through million-year evolutionary processes. In contrast, the components of locomoting soft robots, such as legs and arms, are designed in trial-and-error loops guided by a priori knowledge and experience, which leaves considerable room for improvement. Here, we present optimized soft robots that performed a speci…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 22 total pages (19 pages, 3 supplementary pages), 4 Figures, 4 Supplementary figures. 1 Supplementary table

  16. Image Cropping under Design Constraints

    Authors: Takumi Nishiyasu, Wataru Shimoda, Yoichi Sato

    Abstract: Image cropping is essential in image editing for obtaining a compositionally enhanced image. In display media, image cropping is a prospective technique for automatically creating media content. However, image cropping for media contents is often required to satisfy various constraints, such as an aspect ratio and blank regions for placing texts or objects. We call this problem image cropping unde…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: ACMMM Asia accepted

  17. arXiv:2310.05511  [pdf, other]

    cs.CV

    Proposal-based Temporal Action Localization with Point-level Supervision

    Authors: Yuan Yin, Yifei Huang, Ryosuke Furuta, Yoichi Sato

    Abstract: Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data. Without temporal annotations, most previous works adopt the multiple instance learning (MIL) framework, where the input video is segmented into non-overlapped short snippets, and actio…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: BMVC 2023

  18. An Accurate Graph Generative Model with Tunable Features

    Authors: Takahiro Yokoyama, Yoshiki Sato, Sho Tsugawa, Kohei Watabe

    Abstract: A graph is a very common and powerful data structure used for modeling communication and social networks. Models that generate graphs with arbitrary features are important basic technologies in repeated simulations of networks and prediction of topology changes. Although existing generative models for graphs are useful for providing graphs similar to real-world graphs, graph generation models with…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: This paper was presented at the 32nd International Conference on Computer Communications and Networks (ICCCN 2023) Poster Track

  19. arXiv:2308.16122  [pdf, other]

    cs.LG

    Spatial Graph Coarsening: Weather and Weekday Prediction with London's Bike-Sharing Service using GNN

    Authors: Yuta Sato, Pak Hei Lam, Shruti Gupta, Fareesah Hussain

    Abstract: This study introduced the use of Graph Neural Network (GNN) for predicting the weather and weekday of a day in London, from the dataset of Santander Cycles bike-sharing system as a graph classification task. The proposed GNN models newly introduced (i) a concatenation operator of graph features with trained node embeddings and (ii) a graph coarsening operator based on geographical contiguity, name…

    Submitted 30 August, 2023; originally announced August 2023.

  20. arXiv:2308.13441  [pdf, other]

    cs.CV

    Mesh-Wise Prediction of Demographic Composition from Satellite Images Using Multi-Head Convolutional Neural Network

    Authors: Yuta Sato

    Abstract: Population aging is one of the most serious problems in certain countries. In order to implement its countermeasures, understanding its rapid progress is of urgency with a granular resolution. However, a detailed and rigorous survey with high frequency is not feasible due to the constraints of financial and human resources. Nowadays, Deep Learning is prevalent for pattern recognition with signific…

    Submitted 25 August, 2023; originally announced August 2023.

  21. arXiv:2307.13986  [pdf, other]

    eess.IV cs.CV

    Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities

    Authors: Ganping Li, Yoshito Otake, Mazen Soufi, Masashi Taniguchi, Masahide Yagi, Noriaki Ichihashi, Keisuke Uemura, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato

    Abstract: Purpose: Manual annotations for training deep learning (DL) models in auto-segmentation are time-intensive. This study introduces a hybrid representation-enhanced sampling strategy that integrates both density and diversity criteria within an uncertainty-based Bayesian active learning (BAL) framework to reduce annotation efforts by selecting the most informative training samples. Methods: The expe…

    Submitted 20 December, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: 15 pages, 5 figures

  22. arXiv:2307.11513  [pdf, other]

    eess.IV cs.CV

    Bone mineral density estimation from a plain X-ray image by learning decomposition into projections of bone-segmented computed tomography

    Authors: Yi Gu, Yoshito Otake, Keisuke Uemura, Mazen Soufi, Masaki Takao, Hugues Talbot, Seiji Okada, Nobuhiko Sugano, Yoshinobu Sato

    Abstract: Osteoporosis is a prevalent bone disease that causes fractures in fragile bones, leading to a decline in daily living activities. Dual-energy X-ray absorptiometry (DXA) and quantitative computed tomography (QCT) are highly accurate for diagnosing osteoporosis; however, these modalities require special equipment and scan protocols. To frequently monitor bone health, low-cost, low-dose, and ubiquito…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 20 pages and 22 figures

  23. arXiv:2307.01467  [pdf]

    cs.CV

    Technical Report for Ego4D Long Term Action Anticipation Challenge 2023

    Authors: Tatsuya Ishibashi, Kosuke Ono, Noriyuki Kugo, Yuji Sato

    Abstract: In this report, we describe the technical details of our approach for the Ego4D Long-Term Action Anticipation Challenge 2023. The aim of this task is to predict a sequence of future actions that will take place at an arbitrary time or later, given an input video. To accomplish this task, we introduce three improvements to the baseline model, which consists of an encoder that generates clip-level f…

    Submitted 4 July, 2023; originally announced July 2023.

  24. arXiv:2305.19920  [pdf, other]

    cs.CV

    MSKdeX: Musculoskeletal (MSK) decomposition from an X-ray image for fine-grained estimation of lean muscle mass and muscle volume

    Authors: Yi Gu, Yoshito Otake, Keisuke Uemura, Masaki Takao, Mazen Soufi, Yuta Hiasa, Hugues Talbot, Seiji Okata, Nobuhiko Sugano, Yoshinobu Sato

    Abstract: Musculoskeletal diseases such as sarcopenia and osteoporosis are major obstacles to health during aging. Although dual-energy X-ray absorptiometry (DXA) and computed tomography (CT) can be used to evaluate musculoskeletal conditions, frequent monitoring is difficult due to the cost and accessibility (as well as high radiation exposure in the case of CT). We propose a method (named MSKdeX) to estim…

    Submitted 21 July, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: MICCAI 2023 early acceptance (12 pages and 6 figures)

  25. arXiv:2305.07152  [pdf, other]

    cs.CV

    Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge

    Authors: Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Max Berniker, Ziheng Wang, Rogerio Nespolo, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, Winnie Pang, et al. (46 additional authors not shown)

    Abstract: The ability to automatically detect and track surgical instruments in endoscopic videos can enable transformational interventions. Assessing surgical performance and efficiency, identifying skilled tool use and choreography, and planning operational and logistical aspects of OR resources are just a few of the applications that could benefit. Unfortunately, obtaining the annotations needed to train…

    Submitted 31 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  26. arXiv:2303.05937  [pdf, other]

    cs.CV

    Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

    Authors: Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei Huang, Yoichi Sato, Yan Lu

    Abstract: The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs. Yet, its fixed structure limits the performance, especially for surfaces imaged at oblique angles. We introduce the Structural MPI (S-MPI), where the plane structure approximates 3D scenes concisely. Conveying RGBA contexts with geometrica…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR2023

  27. arXiv:2302.03292  [pdf, other]

    cs.CV

    Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

    Authors: Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

    Abstract: Object affordance is an important concept in hand-object interaction, providing information on action possibilities based on human motor capacity and objects' physical property thus benefiting tasks such as action anticipation and robot imitation learning. However, the definition of affordance in existing datasets often: 1) mix up affordance with object functionality; 2) confuse affordance with go…

    Submitted 9 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: WACV 2023. Refined version of Workshop article arXiv:2206.05424

  28. 4D topology optimization: Integrated optimization of the structure and self-actuation of soft bodies for dynamic motions

    Authors: Changyoung Yuhn, Yuki Sato, Hiroki Kobayashi, Atsushi Kawamoto, Tsuyoshi Nomura

    Abstract: Topology optimization is a powerful tool utilized in various fields for structural design. However, its application has primarily been restricted to static or passively moving objects, mainly focusing on hard materials with limited deformations and contact capabilities. Designing soft and actively moving objects, such as soft robots equipped with actuators, poses challenges due to simulating dynam…

    Submitted 29 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 36 pages, 27 figures; for supplementary video, see https://youtu.be/sPY2jcAsNYs

    Journal ref: Comput. Methods Appl. Mech. Engrg. 414 (2023) 116187

  29. arXiv:2211.11492  [pdf, other]

    cs.CV

    ClipCrop: Conditioned Cropping Driven by Vision-Language Model

    Authors: Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato

    Abstract: Image cropping has progressed tremendously under the data-driven paradigm. However, current approaches do not account for the intentions of the user, which is an issue especially when the composition of the input image is complex. Moreover, labeling of cropping data is costly and hence the amount of data is limited, leading to poor generalization performance of current algorithms in the wild. In t…

    Submitted 21 November, 2022; originally announced November 2022.

  30. arXiv:2209.01755  [pdf, other]

    cs.CE

    Free material optimization of thermal conductivity tensors with asymmetric components

    Authors: Yuki Sato, Teppei Deguchi, Tsuyoshi Nomura, Atsushi Kawamoto

    Abstract: Free Material Optimization (FMO), a branch of topology optimization, in which the design variables are the full constitutive tensors, can provide the most general form of the design problems. Considering the microstructure composed of isotropic materials, the constitutive tensors are yet positive definite and symmetric. On the other hand, it has been reported that the symmetry of this constitutive…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: 12 pages, 9 figures

    MSC Class: 65K15(Primary) 65N30(Secondary) ACM Class: J.2; J.6; G.1.6

  31. arXiv:2208.02611  [pdf, other]

    cs.CV

    Surgical Skill Assessment via Video Semantic Aggregation

    Authors: Zhenqiang Li, Lin Gu, Weimin Wang, Ryosuke Nakamura, Yoichi Sato

    Abstract: Automated video-based assessment of surgical skills is a promising task in assisting young surgical trainees, especially in poor-resource areas. Existing works often resort to a CNN-LSTM joint framework that models long-term relationships by LSTMs on spatially pooled short-term CNN features. However, this practice would inevitably neglect the difference among semantic concepts such as tools, tissu…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: To appear in MICCAI 2022

  32. arXiv:2207.11467  [pdf, other]

    cs.CV cs.AI

    CompNVS: Novel View Synthesis with Scene Completion

    Authors: Zuoyue Li, Tianxing Fan, Zhenqiang Li, Zhaopeng Cui, Yoichi Sato, Marc Pollefeys, Martin R. Oswald

    Abstract: We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage. While generative neural approaches have demonstrated spectacular results on 2D images, they have not yet achieved similar photorealistic results in combination with scene completion where a spatial 3D scene understanding is essential. To this end, we propose a generative pipeline pe…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  33. arXiv:2207.05515  [pdf, other]

    cs.CV

    Compound Prototype Matching for Few-shot Action Recognition

    Authors: Yifei Huang, Lijin Yang, Yoichi Sato

    Abstract: Few-shot action recognition aims to recognize novel action classes using only a small number of labeled training samples. In this work, we propose a novel approach that first summarizes each video into compound prototypes consisting of a group of global prototypes and a group of focused prototypes, and then compares video similarity based on the prototypes. Each global prototype is encouraged to s…

    Submitted 14 October, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: ECCV 2022

  34. arXiv:2207.03210  [pdf, other]

    eess.IV cs.CV

    BMD-GAN: Bone mineral density estimation using x-ray image decomposition into projections of bone-segmented quantitative computed tomography using hierarchical learning

    Authors: Yi Gu, Yoshito Otake, Keisuke Uemura, Mazen Soufi, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato

    Abstract: We propose a method for estimating the bone mineral density (BMD) from a plain x-ray image. Dual-energy X-ray absorptiometry (DXA) and quantitative computed tomography (QCT) provide high accuracy in diagnosing osteoporosis; however, these modalities require special equipment and scan protocols. Measuring BMD from an x-ray image provides an opportunistic screening, which is potentially useful for e…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: MICCAI 2022 Provisional Acceptance

  35. arXiv:2206.05424  [pdf, other]

    cs.CV

    Precise Affordance Annotation for Egocentric Action Video Datasets

    Authors: Zecheng Yu, Yifei Huang, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

    Abstract: Object affordance is an important concept in human-object interaction, providing information on action possibilities based on human motor capacity and objects' physical property thus benefiting tasks such as action anticipation and robot imitation learning. However, existing datasets often: 1) mix up affordance with object functionality; 2) confuse affordance with goal-related action; and 3) ignor…

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: Technical report for CVPR 2022 EPIC-Ego4D Workshop

  36. arXiv:2206.05319  [pdf, other]

    cs.CV

    Object Instance Identification in Dynamic Environments

    Authors: Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato

    Abstract: We study the problem of identifying object instances in a dynamic environment where people interact with the objects. In such an environment, objects' appearance changes dynamically by interaction with other entities, occlusion by hands, background change, etc. This leads to a larger intra-instance variation of appearance than in static environments. To discover the challenges in this setting, we…

    Submitted 10 June, 2022; originally announced June 2022.

    Comments: Joint 1st Ego4D and 10th EPIC Workshop (EPIC@CVPR2022) Extended Abstract

  37. arXiv:2206.02257  [pdf, other]

    cs.CV

    Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey

    Authors: Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato

    Abstract: In this survey, we present a systematic review of 3D hand pose estimation from the perspective of efficient annotation and learning. 3D hand pose estimation has been an important research area owing to its potential to enable various applications, such as video understanding, AR/VR, and robotics. However, the performance of models is tied to the quality and quantity of annotated 3D hand poses. Und…

    Submitted 26 April, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

  38. Topology optimization of locomoting soft bodies using material point method

    Authors: Yuki Sato, Hiroki Kobayashi, Changyoung Yuhn, Atsushi Kawamoto, Tsuyoshi Nomura, Noboru Kikuchi

    Abstract: Topology optimization methods have widely been used in various industries, owing to their potential for providing promising design candidates for mechanical devices. However, their applications are usually limited to the objects which do not move significantly due to the difficulty in computationally efficient handling of the contact and interactions among multiple structures or with boundaries by…

    Submitted 28 February, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: 9 pages, 7 figures

    Journal ref: Structural and Multidisciplinary Optimization volume 66, Article number: 50 (2023)

  39. arXiv:2203.08344  [pdf, other]

    cs.CV cs.LG

    Domain Adaptive Hand Keypoint and Pixel Localization in the Wild

    Authors: Takehiko Ohkawa, Yu-Jhe Li, Qichen Fu, Ryosuke Furuta, Kris M. Kitani, Yoichi Sato

    Abstract: We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors). In the real world, it is important that the model trained for both tasks works under various imaging conditions. However, their variation covered by existing labeled…

    Submitted 14 July, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022

  40. arXiv:2202.13941  [pdf, other]

    cs.CV cs.AI cs.LG

    Background Mixup Data Augmentation for Hand and Object-in-Contact Detection

    Authors: Koya Tango, Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato

    Abstract: Detecting the positions of human hands and objects-in-contact (hand-object detection) in each video frame is vital for understanding human activities from videos. For training an object detector, a method called Mixup, which overlays two training images to mitigate data bias, has been empirically shown to be effective for data augmentation. However, in hand-object detection, mixing two hand-manipu…

    Submitted 28 February, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: 5 pages, 4 figures

  41. GraphTune: A Learning-based Graph Generative Model with Tunable Structural Features

    Authors: Kohei Watabe, Shohei Nakazawa, Yoshiki Sato, Sho Tsugawa, Kenji Nakagawa

    Abstract: Generative models for graphs have been actively studied for decades, and they have a wide range of applications. Recently, learning-based graph generation that reproduces real-world graphs has been attracting the attention of many researchers. Although several generative models that utilize modern machine learning technologies have been proposed, conditional generation of general graphs has been l…

    Submitted 5 April, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: The paper was published in IEEE Transactions on Network Science and Engineering (2023). An earlier and short version of this paper was presented at the 41st IEEE International Conference on Distributed Computing Systems (ICDCS 2021) Poster Track

  42. arXiv:2112.01038  [pdf, other]

    cs.CV

    Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

    Authors: Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

    Abstract: First-person action recognition is a challenging task in video understanding. Because of strong ego-motion and a limited field of view, many backgrounds or noisy frames in a first-person video can distract an action recognition model during its learning process. To encode more discriminative features, the model needs to have the ability to focus on the most relevant part of the video for action re…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: BMVC 2021

  43. arXiv:2112.01034  [pdf, other]

    cs.CV eess.IV

    Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data

    Authors: Yifei Huang, Xiaoxiao Li, Lijin Yang, Lin Gu, Yingying Zhu, Hirofumi Seo, Qiuming Meng, Tatsuya Harada, Yoichi Sato

    Abstract: The human gaze is a cost-efficient physiological data that reveals human underlying attentional patterns. The selective attention mechanism helps the cognition system focus on task-relevant visual clues by ignoring the presence of distractors. Thanks to this ability, human beings can efficiently learn from a very limited number of training samples. Inspired by this mechanism, we aim to leverage ga…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: BMVC 2021

  44. arXiv:2110.11407  [pdf, other]

    cs.CV cs.LG eess.SY

    Video-Data Pipelines for Machine Learning Applications

    Authors: Sohini Roychowdhury, James Y. Sato

    Abstract: Data pipelines are an essential component for end-to-end solutions that take machine learning algorithms to production. Engineering data pipelines for video-sequences poses several challenges including isolation of key-frames from video sequences that are high quality and represent significant variations in the scene. Manual isolation of such quality key-frames can take hours of sifting through ho…

    Submitted 15 October, 2021; originally announced October 2021.

    Comments: 10 pages, 6 Figures, 5 Tables, conference

  45. arXiv:2110.10174  [pdf, other]

    cs.CV

    Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

    Authors: Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato

    Abstract: Every hand-object interaction begins with contact. Despite predicting the contact state between hands and objects is useful in understanding hand-object interactions, prior methods on hand-object analysis have assumed that the interacting hands and objects are known, and were not studied in detail. In this study, we introduce a video-based method for predicting contact between a hand and an object…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: BMVC 2021

  46. arXiv:2110.07058  [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  47. Spatio-Temporal Perturbations for Video Attribution

    Authors: Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, Yoichi Sato

    Abstract: The attribution method provides a direction for interpreting opaque neural networks in a visual way by identifying and visualizing the input regions/pixels that dominate the output of a network. Regarding the attribution method for visually explaining video understanding networks, it is challenging because of the unique spatiotemporal dependencies existing in video inputs and the special 3D convol…

    Submitted 1 September, 2021; originally announced September 2021.

    Journal ref: IEEE Transactions on Circuits and Systems for Video Technology 2021

  48. Foreground-Aware Stylization and Consensus Pseudo-Labeling for Domain Adaptation of First-Person Hand Segmentation

    Authors: Takehiko Ohkawa, Takuma Yagi, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato

    Abstract: Hand segmentation is a crucial task in first-person vision. Since first-person images exhibit strong bias in appearance among different environments, adapting a pre-trained segmentation model to a new domain is required in hand segmentation. Here, we focus on appearance gaps for hand regions and backgrounds separately. We propose (i) foreground-aware image stylization and (ii) consensus pseudo-lab…

    Submitted 27 March, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: Accepted to IEEE Access 2021

  49. arXiv:2106.10026  [pdf, other]

    cs.CV

    EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report

    Authors: Lijin Yang, Yifei Huang, Yusuke Sugano, Yoichi Sato

    Abstract: In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition. Leveraging multiple modalities has been proved to benefit the Unsupervised Domain Adaptation (UDA) task. In this work, we present Multi-Modal Mutual Enhancement Module (M3EM), a deep module for jointly considering information from multip…

    Submitted 30 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  50. arXiv:2105.10131  [pdf, other]

    cs.CV cs.AI

    Visual representation of negation: Real world data analysis on comic image design

    Authors: Yuri Sato, Koji Mineshima, Kazuhiro Ueda

    Abstract: There has been a widely held view that visual representations (e.g., photographs and illustrations) do not depict negation, for example, one that can be expressed by a sentence "the train is not coming". This view is empirically challenged by analyzing the real-world visual representations of comic (manga) illustrations. In the experiment using image captioning tasks, we gave people comic illustra…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: To appear in Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021)