-
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Authors:
Takayuki Nishimura,
Katsuyuki Kuyo,
Motonari Kambara,
Komei Sugiura
Abstract:
We consider the task of generating segmentation masks for the target object from an object manipulation instruction, which allows users to give open vocabulary instructions to domestic service robots. Conventional segmentation generation approaches often fail to account for objects outside the camera's field of view and cases in which the order of vertices differs but still represents the same polygon, which leads to erroneous mask generation. In this study, we propose a novel method that generates segmentation masks from open vocabulary instructions. We implement a novel loss function using optimal transport to prevent significant loss where the order of vertices differs but still represents the same polygon. To evaluate our approach, we constructed a new dataset based on the REVERIE dataset and Matterport3D dataset. The results demonstrated the effectiveness of the proposed method compared with existing mask generation methods. Remarkably, our best model achieved a +16.32% improvement on the dataset compared with a representative polygon-based method.
Submitted 1 July, 2024;
originally announced July 2024.
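The vertex-order issue described in this abstract can be illustrated with a small sketch. It is not the authors' loss function: it is a generic entropic optimal transport (Sinkhorn) cost between two vertex sets, compared against a naive index-matched loss. The function names, the `eps` and `iters` values, and the assumption that coordinates are normalized to [0, 1] are all illustrative choices.

```python
import math


def pairwise_sq_dist(pred, target):
    """Squared Euclidean distance between every predicted and target vertex."""
    return [[(px - tx) ** 2 + (py - ty) ** 2 for tx, ty in target]
            for px, py in pred]


def index_matched_loss(pred, target):
    """Order-sensitive baseline: compares vertex i only with vertex i."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, target)) / len(pred)


def sinkhorn_loss(pred, target, eps=0.05, iters=200):
    """Entropic OT cost with uniform mass on the vertices of each polygon.

    Assumes coordinates in [0, 1]; much larger costs or a much smaller eps
    can make exp() underflow in this plain-Python version.
    """
    n, m = len(pred), len(target)
    cost = pairwise_sq_dist(pred, target)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan entry is u[i] * kernel[i][j] * v[j]; weight it by the cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = square[1:] + square[:1]  # same polygon, different starting vertex

naive = index_matched_loss(square, shifted)
ot = sinkhorn_loss(square, shifted)
```

In this toy case the index-matched loss is large even though both lists describe the same square, while the transport-based cost stays close to zero because it matches vertices by position rather than by list index.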
-
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine
Authors:
Kanta Kaneda,
Shunya Nagashima,
Ryosuke Korekata,
Motonari Kambara,
Komei Sugiura
Abstract:
Domestic service robots offer a solution to the increasing demand for daily care and support. A human-in-the-loop approach that combines automation and operator intervention is considered to be a realistic approach to their use in society. Therefore, we focus on the task of retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting, which we define as the learning-to-rank physical objects (LTRPO) task. For example, given the instruction "Please go to the dining room which has a round table. Pick up the bottle on it," the model is required to output a ranked list of target objects that the operator/user can select. In this paper, we propose MultiRankIt, which is a novel approach for the LTRPO task. MultiRankIt introduces the Crossmodal Noun Phrase Encoder to model the relationship between phrases that contain referring expressions and the target bounding box, and the Crossmodal Region Feature Encoder to model the relationship between the target object and multiple images of its surrounding contextual environment. Additionally, we built a new dataset for the LTRPO task that consists of instructions with complex referring expressions accompanied by real indoor environmental images that feature various target objects. We validated our model on the dataset and it outperformed the baseline method in terms of the mean reciprocal rank and recall@k. Furthermore, we conducted physical experiments in a setting where a domestic service robot retrieved everyday objects in a standardized domestic environment, based on users' instructions in a human-in-the-loop setting. The experimental results demonstrate that the success rate for object retrieval reached 80%. Our code is available at https://github.com/keio-smilab23/MultiRankIt.
Submitted 25 December, 2023;
originally announced December 2023.
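The two ranking metrics named in this abstract, mean reciprocal rank and recall@k, can be sketched as follows. These are the standard textbook definitions; the paper's exact evaluation protocol may differ, and the function names and the toy object identifiers are hypothetical.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy data: each query has a ranked candidate list and a set of targets.
queries = [
    (["cup", "bottle", "plate"], {"bottle"}),  # first hit at rank 2
    (["towel", "sink"], {"towel"}),            # first hit at rank 1
]
mrr = mean_reciprocal_rank(queries)
r_at_2 = recall_at_k(["cup", "bottle", "plate"], {"bottle", "plate"}, k=2)
```

Mean reciprocal rank rewards placing the first correct object near the top of the list, which fits a setting where an operator picks from a short ranked list; recall@k instead measures how many of the correct objects appear within the first k candidates.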
-
DialMAT: Dialogue-Enabled Transformer with Moment-Based Adversarial Training
Authors:
Kanta Kaneda,
Ryosuke Korekata,
Yuiga Wada,
Shunya Nagashima,
Motonari Kambara,
Yui Iioka,
Haruka Matsuo,
Yuto Imai,
Takayuki Nishimura,
Komei Sugiura
Abstract:
This paper focuses on the DialFRED task, which is the task of embodied instruction following in a setting where an agent can actively ask questions about the task. To address this task, we propose DialMAT. DialMAT introduces Moment-based Adversarial Training, which incorporates adversarial perturbations into the latent space of language, image, and action. Additionally, it introduces a crossmodal parallel feature extraction mechanism that applies foundation models to both language and image. We evaluated our model using a dataset constructed from the DialFRED dataset and demonstrated superior performance compared to the baseline method in terms of success rate and path weighted success rate. The model secured the top position in the DialFRED Challenge, which took place at the CVPR 2023 Embodied AI workshop.
Submitted 12 November, 2023;
originally announced November 2023.
-
Fully Automated Task Management for Generation, Execution, and Evaluation: A Framework for Fetch-and-Carry Tasks with Natural Language Instructions in Continuous Space
Authors:
Motonari Kambara,
Komei Sugiura
Abstract:
This paper aims to develop a framework that enables a robot to execute tasks based on visual information, in response to natural language instructions for Fetch-and-Carry with Object Grounding (FCOG) tasks. Although there have been many frameworks, they usually rely on manually given instruction sentences. Therefore, evaluations have only been conducted with fixed tasks. Furthermore, many multimodal language understanding models for the benchmarks only consider discrete actions. To address the limitations, we propose a framework for the full automation of the generation, execution, and evaluation of FCOG tasks. In addition, we introduce an approach to solving the FCOG tasks by dividing them into four distinct subtasks.
Submitted 7 November, 2023;
originally announced November 2023.
-
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks
Authors:
Ryosuke Korekata,
Motonari Kambara,
Yu Yoshida,
Shintaro Ishikawa,
Yosuke Kawasaki,
Masaki Takahashi,
Komei Sugiura
Abstract:
This paper describes a domestic service robot (DSR) that fetches everyday objects and carries them to specified destinations according to free-form natural language instructions. Given an instruction such as "Move the bottle on the left side of the plate to the empty chair," the DSR is expected to identify the bottle and the chair from multiple candidates in the environment and carry the target object to the destination. Most of the existing multimodal language understanding methods are impractical in terms of computational complexity because they require inferences for all combinations of target object candidates and destination candidates. We propose Switching Head-Tail Funnel UNITER, which solves the task by predicting the target object and the destination individually using a single model. Our method is validated on a newly-built dataset consisting of object manipulation instructions and semi photo-realistic images captured in a standard Embodied AI simulator. The results show that our method outperforms the baseline method in terms of language comprehension accuracy. Furthermore, we conduct physical experiments in which a DSR delivers standardized everyday objects in a standardized domestic environment as requested by instructions with referring expressions. The experimental results show that the object grasping and placing actions are achieved with success rates of more than 90%.
Submitted 14 July, 2023;
originally announced July 2023.
-
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Authors:
Motonari Kambara,
Komei Sugiura
Abstract:
Domestic service robots that support daily tasks are a promising solution for elderly or disabled people. It is crucial for domestic service robots to explain the collision risk before they perform actions. In this paper, our aim is to generate a caption about a future event. We propose the Relational Future Captioning Model (RFCM), a crossmodal language generation model for the future captioning task. The RFCM has the Relational Self-Attention Encoder to extract the relationships between events more effectively than the conventional self-attention in transformers. We conducted comparison experiments, and the results show the RFCM outperforms a baseline method on two datasets.
Submitted 19 July, 2022;
originally announced July 2022.
-
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Authors:
Motonari Kambara,
Komei Sugiura
Abstract:
There have been many studies in robotics to improve the communication skills of domestic service robots. Most studies, however, have not fully benefited from recent advances in deep neural networks because the training datasets are not large enough. In this paper, our aim is to augment the datasets based on a crossmodal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses the Transformer to integrate the visual features and geometry features of objects in the image. The CRT can handle the objects because of the Case Relation Block. We conducted comparison experiments and a human evaluation. The experimental results show the CRT outperforms baseline methods.
Submitted 1 July, 2021;
originally announced July 2021.
-
Neutron Irradiation of MgB2 Bulk Superconductors
Authors:
M. Eisterer,
M. Zehetmayer,
S. Toenies,
H. W. Weber,
M. Kambara,
N. Hari Babu,
D. A. Cardwell,
L. R. Greenwood
Abstract:
Sintered samples of MgB2 were irradiated in a fission reactor. Defects in the bulk microstructure are produced during this process mainly by the 10B(n,α)7Li reaction while collisions of fast neutrons with the lattice atoms induce much less damage. Self-shielding effects turn out to be very important and lead to a highly inhomogeneous defect distribution in the irradiated samples. The resulting disorder enhances the normal state resistivity and the upper critical field. The irreversibility line shifts to higher fields at low temperatures and the measured critical current densities increase following irradiation.
Submitted 7 December, 2001;
originally announced December 2001.
-
Growth of Strongly Biaxially Aligned MgB2 Thin Films on Sapphire by Post-annealing of Amorphous Precursors
Authors:
A. Berenov,
Z. Lockman,
X. Qi,
Y. Bugoslavsky,
L. F. Cohen,
M. -H. Jo,
N. A. Stelmashenko,
V. N. Tsaneva,
M. Kambara,
N. Hari Babu,
D. A. Cardwell,
M. G. Blamire,
J. L. MacManus-Driscoll
Abstract:
MgB2 thin films were cold-grown on sapphire substrates by pulsed laser deposition (PLD), followed by post-annealing in mixed, reducing gas, Mg-rich, Zr gettered, environments. The films had Tcs in the range 29 K to 34 K, Jcs (20K, H=0) in the range 30 kA/cm2 to 300 kA/cm2, and irreversibility fields at 20 K of 4 T to 6.2 T. An inverse correlation was found between Tc and irreversibility field. The films had grain sizes of 0.1-1 micron and a strong biaxial alignment was observed in the 950C annealed film.
Submitted 14 June, 2001;
originally announced June 2001.
-
Evidence for high inter-granular current flow in single-phase polycrystalline MgB2 superconductor
Authors:
K. Kawano,
J. S. Abell,
M. Kambara,
N. Hari Babu,
D. A. Cardwell
Abstract:
The distribution of magnetic field in single-phase polycrystalline bulk MgB2 has been measured using a Magneto-Optical (MO) technique for an external magnetic field applied perpendicular to the sample surface. The MO studies indicate that an inter-granular current network is readily established in this material and the current is not limited by weak-linked grain boundaries. The grain boundaries are observed to resist preferential magnetic field penetration, with the inter-grain mechanism dominating the current flow in the sample at temperatures up to 30K. The results provide clear evidence that the intra-granular current flow is isotropic. A critical current density of ~10^4 Acm-2 was estimated at 30K in a field of 150mT from the MO measurements. These results provide further evidence of the considerable potential for MgB2 for engineering applications.
Submitted 6 April, 2001;
originally announced April 2001.
-
Penetration Depth Measurements in MgB_2: Evidence for Unconventional Superconductivity
Authors:
C. Panagopoulos,
B. D. Rainford,
T. Xiang,
C. A. Scott,
M. Kambara,
I. H. Inoue
Abstract:
We have measured the magnetic penetration depth of the recently discovered binary superconductor MgB_2 using muon spin rotation and low field ac-susceptibility. From the damping of the muon precession signal we find the penetration depth at zero temperature is about 85nm. The low temperature penetration depth shows a quadratic temperature dependence, indicating the presence of nodes in the superconducting energy gap.
Submitted 2 March, 2001;
originally announced March 2001.