-
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Authors:
Joseph Cho,
Fachrina Dewi Puspitasari,
Sheng Zheng,
Jingyao Zheng,
Lik-Hang Lee,
Tae-Ho Kim,
Choong Seon Hong,
Chaoning Zhang
Abstract:
The evolution of video generation from text, from animating MNIST numbers to simulating the physical world with Sora, has progressed at a breakneck speed over the past seven years. While often seen as a superficial expansion of the predecessor text-to-image generation model, text-to-video generation models are developed upon carefully engineered constituents. Here, we systematically discuss these elements, consisting of but not limited to core building blocks (vision, language, and temporal) and supporting features, from the perspective of their contributions to achieving a world model. We employ the PRISMA framework to curate 97 impactful research articles from renowned scientific databases primarily studying video synthesis using text conditions. Upon close examination of these manuscripts, we observe that text-to-video generation involves more intricate technologies beyond the plain extension of text-to-image generation. Our additional review of the shortcomings of Sora-generated videos points to the need for more in-depth studies in various enabling aspects of video generation such as datasets, evaluation metrics, efficient architectures, and human-controlled generation. Finally, we conclude that the study of text-to-video generation may still be in its infancy, requiring contributions from the cross-disciplinary research community towards its advancement as the first step to realize artificial general intelligence (AGI).
Submitted 7 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Ghost Booking as a New Philanthropy Channel: A Case Study on Ukraine-Russia Conflict
Authors:
Fachrina Dewi Puspitasari,
Gareth Tyson,
Ehsan-Ul Haq,
Pan Hui,
Lik-Hang Lee
Abstract:
The term ghost booking has recently emerged as a new way to conduct humanitarian acts during the conflict between Russia and Ukraine in 2022. The phenomenon describes the events where netizens donate to Ukrainian citizens through no-show bookings on the Airbnb platform. Impressively, the social fundraising act that used to be organized on donation-based crowdfunding platforms has shifted to a sharing economy platform market and thus gained more visibility. Although the donation purpose is clear, the motivation of donors in selecting a property to book remains concealed. Thus, our study aims to explore peer-to-peer donation behavior on a platform that was originally intended for economic exchanges, and further identifies which platform attributes effectively drive donation behaviors. We collect over 200K guest reviews from 16K Airbnb property listings in Ukraine by employing two collection methods (screen scraping and HTML parsing). Then, we distinguish ghost bookings among guest reviews. Our analysis uncovers the relationship between ghost booking behavior and the platform attributes, and pinpoints several attributes that influence ghost booking. Our findings highlight that donors are inclined toward credible properties explicitly featured with humanitarian needs, i.e., the hosts in penury.
Submitted 15 June, 2023;
originally announced June 2023.
-
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
Authors:
Chaoning Zhang,
Fachrina Dewi Puspitasari,
Sheng Zheng,
Chenghao Li,
Yu Qiao,
Taegoo Kang,
Xinru Shan,
Chenshuang Zhang,
Caiyan Qin,
Francois Rameau,
Lik-Hang Lee,
Sung-Ho Bae,
Choong Seon Hong
Abstract:
Segment anything model (SAM) developed by Meta AI Research has recently attracted significant attention. Trained on a large segmentation dataset of over 1 billion masks, SAM is capable of segmenting any object in a given image. In the original SAM work, the authors turned to zero-shot transfer tasks (like edge detection) for evaluating the performance of SAM. Recently, numerous works have attempted to investigate the performance of SAM in various scenarios to recognize and segment objects. Moreover, numerous projects have emerged to show the versatility of SAM as a foundation model by combining it with other models, like Grounding DINO, Stable Diffusion, ChatGPT, etc. With the relevant papers and projects increasing exponentially, it is challenging for readers to catch up with the development of SAM. To this end, this work conducts the first comprehensive survey on SAM. This is an ongoing project and we intend to update the manuscript on a regular basis. Therefore, readers are welcome to contact us if they complete new works related to SAM so that we can include them in our next version.
Submitted 3 July, 2023; v1 submitted 12 May, 2023;
originally announced June 2023.
-
Review of Persuasive User Interface as Strategy for Technology Addiction in Virtual Environments
Authors:
Fachrina Dewi Puspitasari,
Lik-Hang Lee
Abstract:
In the era of virtuality, increasingly ubiquitous technology brings the challenge of excessive user dependency, also known as user addiction. Augmented reality (AR) and virtual reality (VR) have become increasingly integrated into daily life. Although discussions about the drawbacks of these technologies are abundant, exploration of solutions is still rare. Thus, using the PRISMA methodology, this paper reviewed the literature on technology addiction and persuasive technology. After describing the key research trends, the paper summed up nine persuasive elements of user interfaces (UIs) that AR and VR developers could add to their apps to make them less addictive. Furthermore, this review paper encourages more research into a persuasive strategy for controlling user dependency in virtual-physical blended cyberspace.
Submitted 18 October, 2022;
originally announced October 2022.