-
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Authors:
Ayça Takmaz,
Elisabetta Fedele,
Robert W. Sumner,
Marc Pollefeys,
Federico Tombari,
Francis Engelmann
Abstract:
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches for 3D instance segmentation can typically only recognize object categories from a pre-defined closed set of classes that are annotated in the training datasets. This results in important limitations for real-world applications where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods cannot separate multiple object instances. In this work, we address this limitation, and propose OpenMask3D, which is a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. Experiments and ablation studies on ScanNet200 and Replica show that OpenMask3D outperforms other open-vocabulary methods, especially on the long-tail distribution. Qualitative experiments further showcase OpenMask3D's ability to segment object properties based on free-form queries describing geometry, affordances, and materials.
Submitted 29 October, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Reference-less Analysis of Context Specificity in Translation with Personalised Language Models
Authors:
Sebastian Vincent,
Alice Dowek,
Rowanne Sumner,
Charlotte Blundell,
Emily Preston,
Chris Bayliss,
Chris Oakley,
Carolina Scarton
Abstract:
Sensitising language models (LMs) to external context helps them to more effectively capture the speaking patterns of individuals with specific characteristics or in particular environments. This work investigates to what extent rich character and film annotations can be leveraged to personalise LMs in a scalable manner. We then explore the use of such models in evaluating context specificity in machine translation. We build LMs which leverage rich contextual information to reduce perplexity by up to 6.5% compared to a non-contextual model, and generalise well to a scenario with no speaker-specific data, relying on combinations of demographic characteristics expressed via metadata. Our findings are consistent across two corpora, one of which (Cornell-rich) is also a contribution of this paper. We then use our personalised LMs to measure the co-occurrence of extra-textual context and translation hypotheses in a machine translation setting. Our results suggest that the degree to which professional translations in our domain are context-specific can be preserved to a better extent by a contextual machine translation model than a non-contextual model, which is also reflected in the contextual model's superior reference-based scores.
Submitted 5 March, 2024; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Animating Sand, Mud, and Snow
Authors:
Robert W. Sumner,
James F. O'Brien,
Jessica K. Hodgins
Abstract:
Computer animations often lack the subtle environmental changes that should occur due to the actions of the characters. Squealing car tires usually leave no skid marks, airplanes rarely leave jet trails in the sky, and most runners leave no footprints. In this paper, we describe a simulation model of ground surfaces that can be deformed by the impact of rigid body models of animated characters. To demonstrate the algorithms, we show footprints made by a runner in sand, mud, and snow as well as bicycle tire tracks, a bicycle crash, and a falling runner. The shapes of the footprints in the three surfaces are quite different, but the effects were controlled through only five essentially independent parameters. To assess the realism of the resulting motion, we compare the simulated footprints to human footprints in sand.
Submitted 21 February, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
3D Segmentation of Humans in Point Clouds with Synthetic Data
Authors:
Ayça Takmaz,
Jonas Schult,
Irem Kaftan,
Mertcan Akçay,
Bastian Leibe,
Robert Sumner,
Francis Engelmann,
Siyu Tang
Abstract:
Segmenting humans in 3D indoor scenes has become increasingly important with the rise of human-centered robotics and AR/VR applications. To this end, we propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation. Few works have attempted to directly segment humans in cluttered 3D scenes, which is largely due to the lack of annotated training data of humans interacting with 3D scenes. We address this challenge and propose a framework for generating training data of synthetic humans interacting with real 3D scenes. Furthermore, we propose a novel transformer-based model, Human3D, which is the first end-to-end model for segmenting multiple human instances and their body-parts in a unified manner. The key advantage of our synthetic data generation framework is its ability to generate diverse and realistic human-scene interactions, with highly accurate ground truth. Our experiments show that pre-training on synthetic data improves performance on a wide variety of 3D human segmentation tasks. Finally, we demonstrate that Human3D outperforms even task-specific state-of-the-art 3D segmentation methods.
Submitted 18 August, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
Experiments as Code: A Concept for Reproducible, Auditable, Debuggable, Reusable, & Scalable Experiments
Authors:
Leonel Aguilar,
Michal Gath-Morad,
Jascha Grübel,
Jasper Ermatinger,
Hantao Zhao,
Stefan Wehrli,
Robert W. Sumner,
Ce Zhang,
Dirk Helbing,
Christoph Hölscher
Abstract:
A common concern in experimental research is the auditability and reproducibility of experiments. Experiments are usually designed, provisioned, managed, and analyzed by diverse teams of specialists (e.g., researchers, technicians and engineers) and may require many resources (e.g. cloud infrastructure, specialized equipment). Even though researchers strive to document experiments accurately, this process is often lacking, making it hard to reproduce them. Moreover, when it is necessary to create a similar experiment, very often we end up "reinventing the wheel" as it is easier to start from scratch than trying to reuse existing work, thus losing valuable embedded best practices and previous experiences. In behavioral studies this has contributed to the reproducibility crisis. To tackle this challenge, we propose the "Experiments as Code" paradigm, where the whole experiment is not only documented but additionally the automation code to provision, deploy, manage, and analyze it is provided. To this end we define the Experiments as Code concept, provide a taxonomy for the components of a practical implementation, and provide a proof of concept with a simple desktop VR experiment that showcases the benefits of its "as code" representation, i.e., reproducibility, auditability, debuggability, reusability, and scalability.
Submitted 24 February, 2022;
originally announced February 2022.
-
The Hitchhiker's Guide to Fused Twins: A Review of Access to Digital Twins in situ in Smart Cities
Authors:
Jascha Grübel,
Tyler Thrash,
Leonel Aguilar,
Michal Gath-Morad,
Julia Chatain,
Robert W. Sumner,
Christoph Hölscher,
Victor R. Schinazi
Abstract:
Smart Cities already surround us, and yet they are still incomprehensibly far from directly impacting everyday life. While current Smart Cities are often inaccessible, the experience of everyday citizens may be enhanced with a combination of the emerging technologies Digital Twins (DTs) and Situated Analytics. DTs represent their Physical Twin (PT) in the real world via models, simulations, (remotely) sensed data, context awareness, and interactions. However, interaction requires appropriate interfaces to address the complexity of the city. Ultimately, leveraging the potential of Smart Cities requires going beyond assembling the DT to be comprehensive and accessible. Situated Analytics allows for the anchoring of city information in its spatial context. We advance the concept of embedding the DT into the PT through Situated Analytics to form Fused Twins (FTs). This fusion allows access to data in the location where it is generated, in an embodied context that can make the data more understandable. Prototypes of FTs are rapidly emerging from different domains, but Smart Cities represent the context with the most potential for FTs in the future. This paper reviews DTs, Situated Analytics, and Smart Cities as the foundations of FTs. Regarding DTs, we define five components (Physical, Data, Analytical, Virtual, and Connection environments) that we relate to several cognates (i.e., similar but different terms) from existing literature. Regarding Situated Analytics, we review the effects of user embodiment on cognition and cognitive load. Finally, we classify existing partial examples of FTs from the literature, address their construction from Augmented Reality, Geographic Information Systems, Building/City Information Models, and DTs, and provide an overview of future directions.
Submitted 8 June, 2022; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Emergence of integrated institutions in a large population of self-governing communities
Authors:
Seth Frey,
Robert W Sumner
Abstract:
Most aspects of our lives are governed by large, highly developed institutions that integrate several governance tasks under one authority structure. But theorists differ as to the mechanisms that drive the development of such concentrated governance systems from rudimentary beginnings. Is the emergence of integrated governance schemes a symptom of consolidation of authority by small status groups? Or does integration occur because a complex institution has more potential responses to a complex environment? Here we examine the emergence of complex governance regimes in 5,000 sovereign, resource-constrained, self-governing online communities, ranging in scale from one to thousands of users. Each community begins with no community members and no governance infrastructure. As communities grow, they are subject to selection pressures that keep better managed servers better populated. We identify predictors of community success and test the hypothesis that governance complexity can enhance community fitness. We find that what predicts success depends on size: changes in complexity predict increased success with larger population servers. Specifically, governance rules in a large successful community are more numerous and broader in scope. They also tend to rely more on rules that concentrate power in administrators, and on rules that manage bad behavior and limited server resources. Overall, this work is consistent with theories that formal integrated governance systems emerge to organize collective responses to interdependent resource management problems, especially as factors such as population size exacerbate those problems.
Submitted 11 July, 2019; v1 submitted 26 April, 2018;
originally announced April 2018.
-
Towards a living earth simulator
Authors:
M. Paolucci,
D. Kossman,
R. Conte,
P. Lukowicz,
P. Argyrakis,
A. Blandford,
G. Bonelli,
S. Anderson,
S. de Freitas,
B. Edmonds,
N. Gilbert,
M. Gross,
J. Kohlhammer,
P. Koumoutsakos,
A. Krause,
B. -O. Linnér,
P. Slusallek,
O. Sorkine,
R. W. Sumner,
D. Helbing
Abstract:
The Living Earth Simulator (LES) is one of the core components of the FuturICT architecture. It will work as a federation of methods, tools, techniques and facilities supporting all of the FuturICT simulation-related activities to allow and encourage interactive exploration and understanding of societal issues. Society-relevant problems will be targeted by leaning on approaches based on complex systems theories and data science in tight interaction with the other components of FuturICT. The LES will evaluate and provide answers to real-world questions by taking into account multiple scenarios. It will build on present approaches such as agent-based simulation and modeling, multiscale modelling, statistical inference, and data mining, moving beyond disciplinary borders to achieve a new perspective on complex social systems.
Submitted 6 April, 2013;
originally announced April 2013.
-
Topological Grammars for Data Approximation
Authors:
A. N. Gorban,
N. R. Sumner,
A. Y. Zinovyev
Abstract:
A method of topological grammars is proposed for multidimensional data approximation. For data with complex topology we define a principal cubic complex of low dimension and given complexity that gives the best approximation for the dataset. This complex is a generalization of linear and non-linear principal manifolds and includes them as particular cases. The problem of optimal principal complex construction is transformed into a series of minimization problems for quadratic functionals. These quadratic functionals have a physically transparent interpretation in terms of elastic energy. For the energy computation, the whole complex is represented as a system of nodes and springs. Topologically, the principal complex is a product of one-dimensional continuums (represented by graphs), and the grammars describe how these continuums transform during the process of optimal complex construction. This factorization of the whole process into one-dimensional transformations using minimization of quadratic energy functionals allows us to construct efficient algorithms.
Submitted 28 July, 2006; v1 submitted 22 March, 2006;
originally announced March 2006.