
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation

Authors: Joseph Cho, Fachrina Dewi Puspitasari, Sheng Zheng, Jingyao Zheng, Lik-Hang Lee, Tae-Ho Kim, Choong Seon Hong, Chaoning Zhang

Abstract: The evolution of video generation from text, from animating MNIST digits to simulating the physical world with Sora, has progressed at breakneck speed over the past seven years. Although often seen as a superficial extension of its predecessor, text-to-image generation, text-to-video generation is built on carefully engineered constituents. Here, we systematically discuss these elements, including but not limited to core building blocks (vision, language, and temporal) and supporting features, from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases that primarily study video synthesis conditioned on text. Upon close examination of these manuscripts, we observe that text-to-video generation involves more intricate technologies than a plain extension of text-to-image generation. Our additional review of the shortcomings of Sora-generated videos points to the need for more in-depth studies in various enabling aspects of video generation, such as datasets, evaluation metrics, efficient architectures, and human-controlled generation. Finally, we conclude that the study of text-to-video generation may still be in its infancy, requiring contributions from the cross-disciplinary research community to advance it as the first step toward realizing artificial general intelligence (AGI).

Submitted 7 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: First complete survey on Text-to-Video Generation, 44 pages, 20 figures

arXiv:2306.13667

Ghost Booking as a New Philanthropy Channel: A Case Study on Ukraine-Russia Conflict

Authors: Fachrina Dewi Puspitasari, Gareth Tyson, Ehsan-Ul Haq, Pan Hui, Lik-Hang Lee

Abstract: The term ghost booking has recently emerged to describe a new way of conducting humanitarian acts during the 2022 conflict between Russia and Ukraine. The phenomenon describes events in which netizens donate to Ukrainian citizens through no-show bookings on the Airbnb platform. Notably, social fundraising, which used to be organized on donation-based crowdfunding platforms, has shifted to a sharing-economy platform and thus gained more visibility. Although the purpose of the donations is clear, the donors' motivation in selecting a property to book remains unknown. Thus, our study aims to explore peer-to-peer donation behavior on a platform originally intended for economic exchanges, and to identify which platform attributes effectively drive donation behavior. We collect over 200K guest reviews from 16K Airbnb property listings in Ukraine using two collection methods (screen scraping and HTML parsing). Then, we distinguish ghost bookings among the guest reviews. Our analysis uncovers the relationship between ghost booking behavior and platform attributes, and pinpoints several attributes that influence ghost booking. Our findings highlight that donors are inclined toward credible properties that explicitly feature humanitarian needs, i.e., hosts in penury.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at ACM Hypertext 2023

ACM Class: J.4

arXiv:2306.06211

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering

Authors: Chaoning Zhang, Fachrina Dewi Puspitasari, Sheng Zheng, Chenghao Li, Yu Qiao, Taegoo Kang, Xinru Shan, Chenshuang Zhang, Caiyan Qin, Francois Rameau, Lik-Hang Lee, Sung-Ho Bae, Choong Seon Hong

Abstract: The Segment Anything Model (SAM), developed by Meta AI Research, has recently attracted significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is capable of segmenting any object in a given image. In the original SAM work, the authors turned to zero-shot transfer tasks (such as edge detection) to evaluate the performance of SAM. Recently, numerous works have investigated the performance of SAM in recognizing and segmenting objects across various scenarios. Moreover, numerous projects have emerged to show the versatility of SAM as a foundation model by combining it with other models, such as Grounding DINO, Stable Diffusion, and ChatGPT. With relevant papers and projects increasing exponentially, it is challenging for readers to keep up with the development of SAM. To this end, this work conducts the first comprehensive survey on SAM. This is an ongoing project, and we intend to update the manuscript regularly. Readers are therefore welcome to contact us about new works related to SAM so that we can include them in our next version.

Submitted 3 July, 2023; v1 submitted 12 May, 2023; originally announced June 2023.

Comments: First survey on Segment Anything Model (SAM), work in progress

arXiv:2210.09628

Review of Persuasive User Interface as Strategy for Technology Addiction in Virtual Environments

Authors: Fachrina Dewi Puspitasari, Lik-Hang Lee

Abstract: In the era of virtuality, increasingly ubiquitous technology bears the challenge of excessive user dependency, also known as user addiction. Augmented reality (AR) and virtual reality (VR) have become increasingly integrated into daily life. Although discussions about the drawbacks of these technologies are abundant, explorations of solutions remain rare. Thus, using the PRISMA methodology, this paper reviews the literature on technology addiction and persuasive technology. After describing the key research trends, the paper summarizes nine persuasive elements of user interfaces (UIs) that AR and VR developers could add to their apps to make them less addictive. Furthermore, this review encourages more research into persuasive strategies for controlling user dependency in virtual-physical blended cyberspace.

Submitted 18 October, 2022; originally announced October 2022.

Showing 1–4 of 4 results for author: Puspitasari, F D