GraphEval2000: Benchmarking and Improving Large Language Models on Graph Datasets

Authors: Qiming Wu, Zichen Chen, Will Corcoran, Misha Sra, Ambuj K. Singh

Abstract: Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph data structure problems along with 2000 test cases. Additionally, we introduce an evaluation framework based on GraphEval2000, designed to assess the graph reasoning abilities of LLMs through coding challenges. Our dataset categorizes test cases into four primary and four sub-categories, ensuring a comprehensive evaluation. We evaluate eight popular LLMs on GraphEval2000, revealing that LLMs exhibit a better understanding of directed graphs compared to undirected ones. While private LLMs consistently outperform open-source models, the performance gap is narrowing. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on GraphEval2000. Results show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems, with an increase of 11.11%, 33.37%, and 33.37%, respectively.

Submitted 23 June, 2024; originally announced June 2024.

Comments: Submitted to NeurIPS 2024 Datasets and Benchmarks track, under review

MSC Class: H.2.8; I.2.6; I.2.7

arXiv:2406.15928 [pdf, other]

doi 10.3389/frvir.2023.1252551

EntangleVR++: Evaluating the Potential of using Entanglement in an Interactive VR Scene Creation System

Authors: Mengyu Chen, Marko Peljhan, Misha Sra

Abstract: Interactive digital stories provide a sense of flexibility and freedom to players by allowing them to make choices at key junctions. These choices advance the narrative and determine, to some degree, how the story evolves for that player. As shown in prior work, the ability to control or participate in the construction of the narrative can give the player a high level of agency that results in a stronger sense of immersion in the narrative experience. To support the design of this type of interactive storytelling, our system, EntangleVR++, borrows the idea of entanglement from quantum computing. Our use of entanglement allows creators and storytellers control over which sequences of story events take place in correlation with each other, initiated by the choices a player makes. In this work, we evaluated how well our idea of entanglement enables creators to easily and quickly design interactive VR narratives. We asked 16 participants to use our system and, based on user interviews, analyses of screen recordings, and questionnaire feedback, we extracted four themes. From these themes and the study overall, we derived four authoring strategies for tool designers interested in the design of future visual interfaces for interactively creating virtual scenes that include relational objects and multiple outcomes driven by player interactions.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Preprint for Frontiers in Virtual Reality, December 2023

ACM Class: H.5.1

Journal ref: Front. Virtual Real. 4:1252551 (2023)

arXiv:2406.15889 [pdf, other]

doi 10.1109/VR58804.2024.00051

ConnectVR: A Trigger-Action Interface for Creating Agent-based Interactive VR Stories

Authors: Mengyu Chen, Marko Peljhan, Misha Sra

Abstract: The demand for interactive narratives is growing with the increasing popularity of VR and video gaming. This presents an opportunity to create interactive storytelling experiences that allow players to engage with a narrative from a first-person perspective, both immersively in VR and in 3D on a computer. However, for artists and storytellers without programming experience, authoring such experiences is a particularly complex task as it involves coding a series of story events (character animation, movements, time control, dialogues, etc.) to be connected and triggered by a variety of player behaviors. In this work, we present ConnectVR, a trigger-action interface to enable non-technical creators to design agent-based narrative experiences. Our no-code authoring method specifically focuses on the design of narratives driven by a series of cause-effect relationships triggered by the player's actions. We asked 15 participants to use ConnectVR in a preliminary workshop study as well as two artists to extensively use our system to create VR narrative projects in a three-week in-depth study. Our findings shed light on the creative opportunities facilitated by ConnectVR's trigger-action approach, particularly its capability to establish chained behavioral effects between virtual characters and objects. The results of both studies underscore the positive feedback from participants regarding our system's capacity to not only support creativity but also to simplify the creation of interactive narrative experiences. Results indicate compatibility with non-technical narrative creators' workflows, showcasing its potential to enhance the overall creative process in the realm of VR narrative design.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Preprint for 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR)

ACM Class: H.5.1

Journal ref: in 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR), Orlando, FL, USA, 2024 pp. 286-297

arXiv:2406.14373 [pdf, other]

Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory

Authors: Gordon Dai, Weijia Zhang, Jinghan Li, Siqi Yang, Chidera Onochie Ibe, Srihas Rao, Arthur Caetano, Misha Sra

Abstract: The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in a sandbox survival environment. We conduct an evaluation of the agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security. Our experiments unveil an alignment: Initially, agents engage in unrestrained conflict, mirroring Hobbes's depiction of the state of nature. However, as the simulation progresses, social contracts emerge, leading to the authorization of an absolute sovereign and the establishment of a peaceful commonwealth founded on mutual cooperation. This congruence between our LLM agent society's evolutionary trajectory and Hobbes's theoretical account indicates LLMs' capability to model intricate social dynamics and potentially replicate forces that shape human societies. By enabling such insights into group behavior and emergent societal phenomena, LLM-driven multi-agent simulations, while unable to simulate all the nuances of human behavior, may hold potential for advancing our understanding of social structures, group dynamics, and complex human systems.

Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2405.17827 [pdf, other]

doi 10.1145/3643834.3661594

DanceGen: Supporting Choreography Ideation and Prototyping with Generative AI

Authors: Yimeng Liu, Misha Sra

Abstract: Choreography creation requires high proficiency in artistic and technical skills. Choreographers typically go through four stages to create a dance piece: preparation, studio, performance, and reflection. This process is often individualized, complicated, and challenging due to multiple constraints at each stage. To assist choreographers, most prior work has focused on designing digital tools to support the last three stages of the choreography process, with the preparation stage being the least explored. To address this research gap, we introduce an AI-based approach to assist the preparation stage by supporting ideation, creating choreographic prototypes, and documenting creative attempts and outcomes. We address the limitations of existing AI-based motion generation methods for ideation by allowing generated sequences to be edited and modified in an interactive web interface. This capability is motivated by insights from a formative study we conducted with seven choreographers. We evaluated our system's functionality, benefits, and limitations with six expert choreographers. Results highlight the usability of our system, with users reporting increased efficiency, expanded creative possibilities, and an enhanced iterative process. We also identified areas for improvement, such as the relationship between user intent and AI outcome, intuitive and flexible user interaction design, and integration with existing physical choreography prototyping workflows. By reflecting on the evaluation results, we present three insights that aim to inform the development of future AI systems that can empower choreographers.

Submitted 28 May, 2024; originally announced May 2024.

Comments: ACM Conference on Designing Interactive Systems (DIS '24)

arXiv:2404.11120 [pdf, other]

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Authors: Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen

Abstract: Despite many attempts to leverage pre-trained text-to-image models (T2I) like Stable Diffusion (SD) for controllable image editing, producing good, predictable results remains a challenge. Previous approaches have focused on either fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text prompts, and/or learning features for each input image in an attempt to coax the image generator to produce the desired result. However, these approaches all have shortcomings and fail to produce good results in a predictable and controllable manner. To address this problem, we present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing, something previously unexplored in the literature. With this simple change, we are able to generate results that both better align with the original images and reflect the desired result. Furthermore, we propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization when compared to prior approaches, which operate in the pixel domain. Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth that encode new concepts and incorporate them into the edited results. We present a host of image-editing capabilities enabled by our approach. Our code is publicly available at https://github.com/SherryXTChen/TiNO-Edit.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2402.13123 [pdf, other]

doi 10.1145/3640544.3645227

Exploring AI-assisted Ideation and Prototyping for Choreography

Authors: Yimeng Liu, Misha Sra

Abstract: Choreography creation is a multimodal endeavor, demanding cognitive abilities to develop creative ideas and technical expertise to convert choreographic ideas into physical dance movements. Previous endeavors have sought to reduce the complexities in the choreography creation process in both dimensions. Among them, non-AI-based systems have focused on reinforcing cognitive activities by helping analyze and understand dance movements and augmenting physical capabilities by enhancing body expressivity. On the other hand, AI-based methods have helped the creation of novel choreographic materials with generative AI algorithms. The choreography creation process is constrained by time and requires a rich set of resources to stimulate novel ideas, but the need for iterative prototyping and reduced physical dependence have not been adequately addressed by prior research. Recognizing these challenges and the research gap, we present an innovative AI-based choreography-support system. Our goal is to facilitate rapid ideation by utilizing a generative AI model that can produce diverse and novel dance sequences. The system is designed to support iterative digital dance prototyping through an interactive web-based user interface that enables the editing and modification of generated motion. We evaluated our system by inviting six choreographers to analyze its limitations and benefits and present the evaluation results along with potential directions for future work.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2311.08614 [pdf, other]

XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making

Authors: Zichen Chen, Jianda Chen, Mitali Gaidhani, Ambuj Singh, Misha Sra

Abstract: Large Language Models (LLMs) have recently made impressive strides in natural language understanding tasks. Despite their remarkable performance, understanding their decision-making process remains a big challenge. In this paper, we look into bringing some transparency to this process by introducing a new explanation dataset for question answering (QA) tasks that integrates knowledge graphs (KGs) in a novel way. Our dataset includes 12,102 question-answer-explanation (QAE) triples. Each explanation in the dataset links the LLM's reasoning to entities and relations in the KGs. The explanation component includes a why-choose explanation, a why-not-choose explanation, and a set of reason-elements that underlie the LLM's decision. We leverage KGs and graph attention networks (GAT) to find the reason-elements and transform them into why-choose and why-not-choose explanations that are comprehensible to humans. Through quantitative and qualitative evaluations, we demonstrate the potential of our dataset to improve the in-context learning of LLMs, and enhance their interpretability and explainability. Our work contributes to the field of explainable AI by enabling a deeper understanding of LLMs' decision-making process to make them more transparent and thereby potentially more reliable to researchers and practitioners alike. Our dataset is available at: https://github.com/chen-zichen/XplainLLM_dataset.git

Submitted 14 November, 2023; originally announced November 2023.

Comments: 17 pages, 6 figures, 7 tables. Our dataset is available at: https://github.com/chen-zichen/XplainLLM_dataset.git

arXiv:2303.16537 [pdf, other]

LMExplainer: a Knowledge-Enhanced Explainer for Language Models

Authors: Zichen Chen, Ambuj K Singh, Misha Sra

Abstract: Large language models (LLMs) such as GPT-4 are very powerful and can process different kinds of natural language processing (NLP) tasks. However, it can be difficult to interpret the results due to the multi-layer nonlinear model structure and millions of parameters. A lack of clarity and understanding of how the language models (LMs) work can make them unreliable, difficult to trust, and potentially dangerous for use in real-world scenarios. Most recent works exploit attention weights to provide explanations for LM predictions. However, pure attention-based explanations are unable to support the growing complexity of LMs, and cannot reason about their decision-making processes. We propose LMExplainer, a knowledge-enhanced explainer for LMs that can provide human-understandable explanations. We use a knowledge graph (KG) and a graph attention neural network to extract the key decision signals of the LM. We further explore whether interpretation can also help the AI understand the task better. Our experimental results show that LMExplainer outperforms existing LM+KG methods on CommonsenseQA and OpenBookQA. We compare the explanation results with generated explanation methods and human-annotated results. The comparison shows our method can provide more comprehensive and clearer explanations. LMExplainer demonstrates the potential to enhance model performance and furnish explanations for the LM reasoning process in natural language.

Submitted 3 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: 12 pages, 1 figure, 7 tables, and 3 case studies

arXiv:2303.06277 [pdf, other]

SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction

Authors: Avinash Ajit Nargund, Misha Sra

Abstract: 3D human motion prediction is a research area of high significance and a challenge in computer vision. It is useful for the design of many applications including robotics and autonomous driving. Traditionally, autoregressive models have been used to predict human motion. However, these models have high computation needs and error accumulation that make it difficult to use them for realtime applications. In this paper, we present a non-autoregressive model for human motion prediction. We focus on learning spatio-temporal representations non-autoregressively for generation of plausible future motions. We propose a novel architecture that leverages the recently proposed Transformers. Human motion involves complex spatio-temporal dynamics with joints affecting the position and rotation of each other even though they are not connected directly. The proposed model extracts these dynamics using both convolutions and the self-attention mechanism. Using specialized spatial and temporal self-attention to augment the features extracted through convolution allows our model to generate spatio-temporally coherent predictions in parallel independent of the activity. Our contributions are threefold: (i) we frame human motion prediction as a sequence-to-sequence problem and propose a non-autoregressive Transformer to forecast a sequence of poses in parallel; (ii) our method is activity agnostic; (iii) we show that despite its simplicity, our approach is able to make accurate predictions, achieving better or comparable results compared to the state-of-the-art on two public datasets, with far fewer parameters and much faster inference.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2210.16785 [pdf, other]

doi 10.1109/ISMAR55827.2022.00070

CardsVR: A Two-Person VR Experience with Passive Haptic Feedback from a Deck of Playing Cards

Authors: Andrew Huard, Mengyu Chen, Misha Sra

Abstract: Presence in virtual reality (VR) is meaningful for remotely connecting with others and facilitating social interactions despite great distance while providing a sense of "being there." This work presents CardsVR, a two-person VR experience that allows remote participants to play a game of cards together. An entire deck of tracked cards is used to recreate the sense of playing cards in person. Prior work in VR commonly provides passive haptic feedback either through a single object or through static objects in the environment. CardsVR is novel in providing passive haptic feedback through multiple cards that are individually tracked and represented in the virtual environment. Participants interact with the physical cards by picking them up, holding them, playing them, or moving them on the physical table. Our participant study (N=23) shows that passive haptic feedback provides significant improvement in three standard measures of presence: Possibility to Act, Realism, and Haptics.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2207.04508 [pdf]

Adaptive Virtual Neuroarchitecture

Authors: Abhinandan Jain, Pattie Maes, Misha Sra

Abstract: Our surrounding environment impacts our cognitive-emotional processes on a daily basis and shapes our physical, psychological and social wellbeing. Although the effects of the built environment on our psycho-physiological processes are well studied, virtual environment design with a potentially similar impact on the user has received limited attention. Based on the influence of space design on a user and combining that with the dynamic affordances of virtual spaces, we present the idea of adaptive virtual neuroarchitecture (AVN), where virtual environments respond to the user and the user's real world context while simultaneously influencing them both in realtime. To show how AVN has been explored in current research, we present a sampling of recent work that demonstrates reciprocal relationships using physical affordances (space, objects), the user's state (physiological, cognitive, emotional), and the virtual world used in the design of novel virtual reality experiences. We believe AVN has the potential to help us learn how to design spaces and environments that can enhance the wellbeing of their inhabitants.

Submitted 10 July, 2022; originally announced July 2022.

arXiv:2110.02950 [pdf, other]

Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer

Authors: Wenda Xu, Michael Saxon, Misha Sra, William Yang Wang

Abstract: Expert-layman text style transfer technologies have the potential to improve communication between members of scientific communities and the general public. High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand. This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online. At present, two bottlenecks interfere with the goal of building high-quality medical expert-layman style transfer systems: a dearth of pretrained medical-domain language models spanning both expert and layman terminologies and a lack of parallel corpora for training the transfer task itself. To mitigate the first issue, we propose a novel language model (LM) pretraining task, Knowledge Base Assimilation, to synthesize pretraining data from the edges of a graph of expert- and layman-style medical terminology terms into an LM during self-supervised learning. To mitigate the second issue, we build a large-scale parallel corpus in the medical expert-layman domain using a margin-based criterion. Our experiments show that fine-tuning transformer-based models, pretrained on Knowledge Base Assimilation and other well-established pretraining tasks, on our new parallel corpus leads to considerable improvement on expert-layman transfer benchmarks, with an average relative improvement of 106% in our human evaluation metric, the Overall Success Rate (OSR). We release our code and parallel corpus for future research.

Submitted 18 December, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: 12 pages, 8 tables, 3 figures. AAAI 2022 Conference Paper

arXiv:2109.01186 [pdf, other]

doi 10.1145/3458709.3458946

Exploratory Design of a Hands-free Video Game Controller for a Quadriplegic Individual

Authors: Atieh Taheri, Ziv Weissman, Misha Sra

Abstract: From colored pixels to hyper-realistic 3D landscapes of virtual reality, video games have evolved immensely over the last few decades. However, video game input still requires two-handed dexterous finger manipulations for simultaneous joystick and trigger or mouse and keyboard presses. In this work, we explore the design of a hands-free game control method using realtime facial expression recognition for individuals with neurological and neuromuscular diseases who are unable to use traditional game controllers. Similar to other Assistive Technologies (AT), our facial input technique is also designed and tested in collaboration with a graduate student who has Spinal Muscular Atrophy. Our preliminary evaluation shows the potential of facial expression recognition for augmenting the lives of quadriplegic individuals by enabling them to accomplish things like walking, running, flying or other adventures that may not be so attainable otherwise.

Submitted 2 September, 2021; originally announced September 2021.

Comments: Published in: Augmented Humans Conference 2021

arXiv:2108.12661 [pdf, other]

SceneAR: Scene-based Micro Narratives for Sharing and Remixing in Augmented Reality

Authors: Mengyu Chen, Andrés Monroy-Hernández, Misha Sra

Abstract: Short-form digital storytelling has become a popular medium for millions of people to express themselves. Traditionally, this medium uses primarily 2D media such as text (e.g., memes), images (e.g., Instagram), gifs (e.g., Giphy), and videos (e.g., TikTok, Snapchat). To expand the modalities from 2D to 3D media, we present SceneAR, a smartphone application for creating sequential scene-based micro narratives in augmented reality (AR). What sets SceneAR apart from prior work is the ability to share the scene-based stories as AR content. No longer limited to sharing images or videos, these narratives can now be experienced in people's own physical environments. Additionally, SceneAR affords users the ability to remix AR, empowering them to build upon others' creations collectively. We asked 18 people to use SceneAR in a 3-day study. Based on user interviews, analysis of screen recordings, and the stories they created, we extracted three themes. From those themes and the study overall, we derived six strategies for designers interested in supporting short-form AR narratives.

Submitted 28 August, 2021; originally announced August 2021.

Comments: To be published in 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)

arXiv:2107.02965 [pdf, other]

Telelife: The Future of Remote Living

Authors: Jason Orlosky, Misha Sra, Kenan Bektaş, Huaishu Peng, Jeeeun Kim, Nataliya Kos'myna, Tobias Hollerer, Anthony Steed, Kiyoshi Kiyokawa, Kaan Akşit

Abstract: In recent years, everyday activities such as work and socialization have steadily shifted to more remote and virtual settings. With the COVID-19 pandemic, the switch from physical to virtual has been accelerated, which has substantially affected various aspects of our lives, including business, education, commerce, healthcare, and personal life. This rapid and large-scale switch from in-person to remote interactions has revealed that our current technologies lack functionality and are limited in their ability to recreate interpersonal interactions. To help address these limitations in the future, we introduce "Telelife," a vision for the near future that depicts the potential means to improve remote living better aligned with how we interact, live and work in the physical world. Telelife encompasses novel synergies of technologies and concepts such as digital twins, virtual prototyping, and attention and context-aware user interfaces with innovative hardware that can support ultrarealistic graphics, user state detection, and more. These ideas will guide the transformation of our daily lives and routines soon, targeting the year 2035. In addition, we identify opportunities across high-impact applications in domains related to this vision of Telelife. Along with a recent survey of relevant fields such as human-computer interaction, pervasive computing, and virtual reality, the directions outlined in this paper will guide future research on remote living.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.14014 [pdf, other]

Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Authors: Pulkit Tandon, Shubham Chandak, Pat Pataranutaporn, Yimeng Liu, Anesu M. Mapuranga, Pattie Maes, Tsachy Weissman, Misha Sra

Abstract: Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and lip syncing models. Our generative pipeline achieves two to three orders of magnitude reduction in the bitrate as compared to the standard audio-video codecs (encoders-decoders), while maintaining equivalent Quality-of-Experience based on a subjective evaluation by users (n = 242) in an online study. The Txt2Vid framework opens up the potential for creating novel applications such as enabling audio-video communication during poor internet connectivity, or in remote terrains with limited bandwidth. The code for this work is available at https://github.com/tpulkit/txt2vid.git.

Submitted 2 April, 2022; v1 submitted 26 June, 2021; originally announced June 2021.

Comments: 11 pages, 8 figures, 2 tables. Addition of statistical analysis of results. Reorganization and rewriting of text to make it clearer

arXiv:1512.02922 [pdf, other]

MetaSpace II: Object and full-body tracking for interaction and navigation in social VR

Authors: Misha Sra, Chris Schmandt

Abstract: MetaSpace II (MS2) is a social Virtual Reality (VR) system where multiple users can not only see and hear but also interact with each other, grasp and manipulate objects, walk around in space, and get tactile feedback. MS2 allows walking in physical space by tracking each user's skeleton in real-time and allows users to feel by employing passive haptics, i.e., when users touch or manipulate an object in the virtual world, they simultaneously also touch or manipulate a corresponding object in the physical world. To enable these elements in VR, MS2 creates a correspondence in spatial layout and object placement by building the virtual world on top of a 3D scan of the real world. Through the association between the real and virtual world, users are able to walk freely while wearing a head-mounted device, avoid obstacles like walls and furniture, and interact with people and objects. Most current virtual reality (VR) environments are designed for a single user experience where interactions with virtual objects are mediated by hand-held input devices or hand gestures. Additionally, users are only shown a representation of their hands in VR floating in front of the camera as seen from a first person perspective. We believe that representing each user as a full-body avatar that is controlled by natural movements of the person in the real world (see Figure 1d) can greatly enhance believability and a user's sense of immersion in VR.

Submitted 9 December, 2015; originally announced December 2015.

Comments: 10 pages, 9 figures. Video: http://living.media.mit.edu/projects/metaspace-ii/

ACM Class: H.5.1

arXiv:1512.02921 [pdf, other]

Design Strategies for Playful Technologies to Support Light-intensity Physical Activity in the Workplace

Authors: Misha Sra, Chris Schmandt

Abstract: Moderate to vigorous intensity physical activity has an established preventative role in obesity, cardiovascular disease, and diabetes. However, recent evidence suggests that sitting time affects health negatively independent of whether adults meet prescribed physical activity guidelines. Since many of us spend long hours daily sitting in front of a host of electronic screens, this is cause for concern. In this paper, we describe a set of three prototype digital games created for encouraging light-intensity physical activity during short breaks at work. The design of these kinds of games is a complex process that must consider motivation strategies, interaction methodology, usability and ludic aspects. We present design guidelines for technologies that encourage physical activity in the workplace that we derived from a user evaluation using the prototypes. Although the design guidelines can be seen as general principles, we conclude that they have to be considered differently for different workplace cultures and workspaces. Our study was conducted with users who have some experience playing casual games on their mobile devices and were able and willing to increase their physical activity.

Submitted 9 December, 2015; originally announced December 2015.

Comments: 11 pages, 5 figures. Video: http://living.media.mit.edu/projects/see-saw/
