GePSAn: Generative Procedure Step Anticipation in Cooking Videos

Authors: Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

Abstract: We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations in natural settings. This problem has been largely overlooked in previous work. To address this challenge, we frame future step prediction as modelling the distribution of all possible candidates for the next step. Specifically, we design a generative model that takes a series of video clips as input and generates multiple plausible and diverse candidates (in natural language) for the next step. Following previous work, we side-step the video annotation scarcity by pretraining our model on a large text-based corpus of procedural activities and then transferring the model to the video domain. Our experiments, in both the textual and video domains, show that our model captures diversity in the next step prediction and generates multiple plausible future predictions. Moreover, our model establishes new state-of-the-art results on YouCookII, where it outperforms existing baselines on next step anticipation. Finally, we also show that our model can successfully transfer from text to the video domain zero-shot, i.e., without fine-tuning or adaptation, and produces good-quality future step predictions from video.

Submitted 12 October, 2023; originally announced October 2023.

Comments: published at ICCV 2023

arXiv:2301.13503

Identical Bands Around the Isobaric Rare Earth Even-Even Nuclei with the Mass Number A = 164

Authors: M. A. Abdelsalam, H. A. Ghanim, M. Kotb, A. M. Khalaf

Abstract: Eight pairs of normally deformed rare-earth nuclei around the isobaric nuclei with A = 164 that have identical values of F-spin have been studied. These pairs of identical bands cover 16 mass units and are classified. We suggest a theoretical collective rotational formula containing three parameters (CRF3), an extended version of the Bohr-Mottelson model, to calculate the ground-state positive-parity excitation energies. Also, the sd-version of the interacting boson model (IBM) has been used to describe the nuclear shapes by using the intrinsic coherent state. The optimized model parameters for each nucleus are adjusted by using a simulation search program to minimize the root mean square deviation between the theoretical and experimental excitation energies. The best adopted model parameters of the CRF3 are used to calculate the rotational frequencies and the kinematic and dynamic moments of inertia, and the evolution of these moments with increasing ħω is systematically analyzed. A smooth gradual increase in both moments of inertia was seen. The calculated results agree excellently with the experimental ones, which gives strong support to the suggested CRF3. The adopted IBM parameters are used to calculate the potential energy surfaces, which describe the nuclear deformation. The correlation quantities that identify the identical bands (IB) are extracted, and the bands exhibit identical excitation energies and energy ratios in their ground-state rotational bands.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2210.14862

Visual Semantic Parsing: From Images to Abstract Meaning Representation

Authors: Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

Abstract: The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs or frames. These formalisms remain limited in the nature of entities and relations they can capture. In this paper, we propose to leverage a widely used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR), to address these shortcomings. Compared to scene graphs, which largely emphasize spatial relationships, our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual input. Moreover, they allow us to generate meta-AMR graphs to unify information contained in multiple image descriptions under one representation. Through extensive experimentation and analysis, we demonstrate that we can re-purpose an existing text-to-AMR parser to parse images into AMRs. Our findings point to important future research directions for improved scene understanding.

Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: published in CoNLL 2022

Showing 1–3 of 3 results for author: Abdelsalam, M A