-
Explainable Interfaces for Rapid Gaze-Based Interactions in Mixed Reality
Authors:
Mengjie Yu,
Dustin Harris,
Ian Jones,
Ting Zhang,
Yue Liu,
Naveen Sendhilnathan,
Narine Kokhlikyan,
Fulton Wang,
Co Tran,
Jordan L. Livingston,
Krista E. Taylor,
Zhenhong Hu,
Mary A. Hood,
Hrvoje Benko,
Tanya R. Jonker
Abstract:
Gaze-based interactions offer a potential way for users to naturally engage with mixed reality (XR) interfaces. Black-box machine learning models have enabled higher accuracy for gaze-based interactions. However, due to the black-box nature of these models, users might not be able to understand and effectively adapt their gaze behavior to achieve high-quality interaction. We posit that explainable AI (XAI) techniques can facilitate understanding of and interaction with gaze-based, model-driven systems in XR. To study this, we built a real-time, multi-level XAI interface for gaze-based interaction using a deep learning model, and evaluated it during a visual search task in XR. A between-subjects study revealed that participants who interacted with XAI made more accurate selections than those who did not use the XAI system (i.e., an F1 score increase of 10.8%). Additionally, participants who used the XAI system adapted their gaze behavior over time to make more effective selections. These findings suggest that XAI can potentially be used to help users collaborate more effectively with model-driven interactions in XR.
Submitted 21 April, 2024;
originally announced April 2024.
-
Temporal-Spatial Processing of Event Camera Data via Delay-Loop Reservoir Neural Network
Authors:
Richard Lau,
Anthony Tylan-Tyler,
Lihan Yao,
Rey de Castro Roberto,
Robert Taylor,
Isaiah Jones
Abstract:
This paper describes a temporal-spatial model for video processing, with special application to event camera videos. We propose to study a conjecture motivated by our previous study of video processing with a delay-loop reservoir (DLR) neural network, which we call the Temporal-Spatial Conjecture (TSC). The TSC postulates that significant information content is carried in the temporal representation of a video signal and that machine learning algorithms would benefit from separate optimization of the spatial and temporal components for intelligent processing. To verify or refute the TSC, we propose a Visual Markov Model (VMM) which decomposes the video into spatial and temporal components and estimates the mutual information (MI) of these components. Since computing video mutual information is complex and time-consuming, we use a Mutual Information Neural Network to estimate bounds on the mutual information. Our results show that the temporal component carries significant MI compared to that of the spatial component. This finding has often been overlooked in the neural network literature. In this paper, we exploit this finding to guide our design of a delay-loop reservoir neural network for event camera classification, which results in an 18% improvement in classification accuracy.
Submitted 12 February, 2024;
originally announced March 2024.
-
Algebraic Dynamical Systems in Machine Learning
Authors:
Iolo Jones,
Jerry Swan,
Jeffrey Giansiracusa
Abstract:
We introduce an algebraic analogue of dynamical systems, based on term rewriting. We show that a recursive function applied to the output of an iterated rewriting system defines a formal class of models into which all the main architectures for dynamic machine learning models (including recurrent neural networks, graph neural networks, and diffusion models) can be embedded. Considered in category theory, we also show that these algebraic models are a natural language for describing the compositionality of dynamic models. Furthermore, we propose that these models provide a template for the generalisation of the above dynamic models to learning problems on structured or non-numerical data, including 'hybrid symbolic-numeric' models.
Submitted 6 November, 2023;
originally announced November 2023.
-
An Optimization Case Study for solving a Transport Robot Scheduling Problem on Quantum-Hybrid and Quantum-Inspired Hardware
Authors:
Dominik Leib,
Tobias Seidel,
Sven Jäger,
Raoul Heese,
Caitlin Isobel Jones,
Abhishek Awasthi,
Astrid Niederle,
Michael Bortz
Abstract:
We present a comprehensive case study comparing the performance of D-Wave's quantum-classical hybrid framework, Fujitsu's quantum-inspired digital annealer, and Gurobi's state-of-the-art classical solver in solving a transport robot scheduling problem. This problem originates from an industrially relevant real-world scenario. We provide three different models for our problem following different design philosophies. In our benchmark, we focus on the solution quality and end-to-end runtime of different model and solver combinations. We find promising results for the digital annealer and some opportunities for the hybrid quantum annealer in direct comparison with Gurobi. Our study provides insights into the workflow for solving an application-oriented optimization problem with different strategies, and can be useful for evaluating the strengths and weaknesses of different approaches.
Submitted 24 October, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
-
3DGen: Triplane Latent Diffusion for Textured Mesh Generation
Authors:
Anchit Gupta,
Wenhan Xiong,
Yixin Nie,
Ian Jones,
Barlas Oğuz
Abstract:
Latent diffusion models for image generation have crossed a quality threshold which enabled them to achieve mass adoption. Recently, a series of works has made advances towards replicating this success in the 3D domain, introducing techniques such as point cloud VAEs, triplane representations, neural implicit surfaces, and differentiable-rendering-based training. We take another step in this direction, combining these developments in a two-step pipeline consisting of 1) a triplane VAE which can learn latent representations of textured meshes and 2) a conditional diffusion model which generates the triplane features. For the first time, this architecture allows conditional and unconditional generation of high-quality textured or untextured 3D meshes across multiple diverse categories in a few seconds on a single GPU. It substantially outperforms previous work on image-conditioned and unconditional generation, in mesh quality as well as texture generation. Furthermore, we demonstrate the scalability of our model to large datasets for increased quality and diversity. We will release our code and trained models.
Submitted 27 March, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
CLIP-Layout: Style-Consistent Indoor Scene Synthesis with Semantic Furniture Embedding
Authors:
Jingyu Liu,
Wenhan Xiong,
Ian Jones,
Yixin Nie,
Anchit Gupta,
Barlas Oğuz
Abstract:
Indoor scene synthesis involves automatically picking and placing furniture appropriately on a floor plan, so that the scene looks realistic and is functionally plausible. Such scenes can serve as homes for immersive 3D experiences, or be used to train embodied agents. Existing methods for this task rely on labeled categories of furniture, e.g. bed, chair or table, to generate contextually relevant combinations of furniture. Whether heuristic or learned, these methods ignore instance-level visual attributes of objects, and as a result may produce visually less coherent scenes. In this paper, we introduce an auto-regressive scene model which can output instance-level predictions, using general purpose image embedding based on CLIP. This allows us to learn visual correspondences such as matching color and style, and produce more functionally plausible and aesthetically pleasing scenes. Evaluated on the 3D-FRONT dataset, our model achieves SOTA results in scene synthesis and improves auto-completion metrics by over 50%. Moreover, our embedding-based approach enables zero-shot text-guided scene synthesis and editing, which easily generalizes to furniture not seen during training.
Submitted 2 June, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
BERT for Long Documents: A Case Study of Automated ICD Coding
Authors:
Arash Afkanpour,
Shabir Adeel,
Hansenclever Bassani,
Arkady Epshteyn,
Hongbo Fan,
Isaac Jones,
Mahan Malihi,
Adrian Nauth,
Raj Sinha,
Sanjana Woonna,
Shiva Zamani,
Elli Kanal,
Mikhail Fomitchev,
Donny Cheung
Abstract:
Transformer models have achieved great success across many NLP problems. However, previous studies in automated ICD coding concluded that these models fail to outperform some of the earlier solutions such as CNN-based models. In this paper we challenge this conclusion. We present a simple and scalable method to process long text with the existing transformer models such as BERT. We show that this method significantly improves the previous results reported for transformer models in ICD coding, and is able to outperform one of the prominent CNN-based methods.
Submitted 4 November, 2022;
originally announced November 2022.
-
Spatial Games of Fake News
Authors:
Matthew I Jones,
Scott D. Pauls,
Feng Fu
Abstract:
To curb the spread of fake news on social media platforms, recent studies have considered an online crowdsourcing fact-checking approach as one possible intervention method to reduce misinformation. However, it remains unclear under what conditions crowdsourcing fact-checking efforts deter the spread of misinformation. To address this issue, we model such distributed fact-checking as 'peer policing' that will reduce the perceived payoff to share or disseminate false information (fake news) and also reward the spread of trustworthy information (real news). By simulating our model on synthetic square lattices and small-world networks, we show that the presence of social network structure enables fake news spreaders to self-organize into echo chambers, thereby boosting the efficacy of fake news and thus its resistance to fact-checking efforts. Additionally, to study our model in a more realistic setting, we utilize a Twitter network dataset and study the effectiveness of deliberately choosing specific individuals to be fact-checkers. We find that targeted fact-checking efforts can be highly effective, seeing the same level of success with as little as a fifth of the number of fact-checkers, but this depends on the structure of the network in question. In the limit of weak selection, we obtain closed-form analytical conditions for the critical threshold of crowdsourced fact-checking in terms of the payoff values in our fact-checker/fake news game. Our work has practical implications for developing model-based mitigation strategies for controlling the spread of misinformation that interferes with political discourse.
Submitted 8 June, 2022;
originally announced June 2022.
-
On functions computed on trees
Authors:
Roozbeh Farhoodi,
Khashayar Filom,
Ilenna Simone Jones,
Konrad Paul Kording
Abstract:
Any function can be constructed using a hierarchy of simpler functions through compositions. Such a hierarchy can be characterized by a binary rooted tree. Each node of this tree is associated with a function which takes as inputs two numbers from its children and produces one output. Since thinking about functions in terms of computation graphs is becoming popular, we may want to know which functions can be implemented on a given tree. Here, we describe a set of necessary constraints in the form of a system of non-linear partial differential equations that must be satisfied. Moreover, we prove that these conditions are sufficient in the contexts of both analytic and bit-valued functions. In the latter case, we explicitly enumerate discrete functions and observe that there are relatively few. Our point of view allows us to compare different neural network architectures with regard to their function spaces. Our work connects the structure of computation graphs with the functions they can implement and has potential applications to neuroscience and computer science.
Submitted 22 October, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
Heroes and Zeroes: Predicting the Impact of New Video Games on Twitch.tv
Authors:
Isaac Jones,
Huan Liu
Abstract:
Video games and the playing thereof have been a fixture of American culture since their introduction in the arcades of the 1980s. However, it was not until the recent proliferation of broadband connections robust and fast enough to handle live video streaming that players of video games have transitioned from a content consumer role to a content producer role. Simultaneously, the rise of social media has revealed how interpersonal connections drive user engagement and interest. In this work, we discuss the recent proliferation of video game streaming, particularly on Twitch.tv, analyze trends and patterns in video game viewing, and develop predictive models for determining if a new game will have substantial impact on the streaming ecosystem.
Submitted 18 July, 2017;
originally announced July 2017.