Showing 1–13 of 13 results for author: Ku, M

  1. arXiv:2406.15252  [pdf, other]

    cs.CV cs.AI

    VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

    Authors: Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen

    Abstract: Recent years have witnessed great advances in video generation. However, the development of automatic video metrics is lagging significantly behind. None of the existing metrics is able to provide reliable scores for generated videos. The main barrier is the lack of a large-scale human-annotated dataset. In this paper, we release VideoFeedback, the first large-scale dataset containing human-prov…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2406.04485  [pdf, other]

    cs.AI cs.CV

    GenAI Arena: An Open Evaluation Platform for Generative Models

    Authors: Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen

    Abstract: Generative AI has made remarkable strides to revolutionize fields such as image and video generation. These advancements are driven by innovative algorithms, architecture, and data. However, the rapid proliferation of generative models has highlighted a critical gap: the absence of trustworthy evaluation metrics. Current automatic assessments such as FID, CLIP, FVD, etc. often fail to capture the n…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 9 pages, 7 figures

  3. arXiv:2406.01574  [pdf, other]

    cs.CL

    MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

    Authors: Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

    Abstract: In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in…

    Submitted 23 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2405.01483  [pdf, other]

    cs.CV cs.AI cs.CL

    MANTIS: Interleaved Multi-Image Instruction Tuning

    Authors: Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, Qian Liu, Wenhu Chen

    Abstract: Large multimodal models (LMMs) have shown great results in single-image vision language tasks. However, their ability to solve multi-image visual language tasks is yet to be improved. Existing LMMs such as OpenFlamingo, Emu2, and Idefics gain their multi-image ability through pre-training on hundreds of millions of noisy interleaved image-text data from the web, which is neither efficient nor effec…

    Submitted 23 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 9 pages, 3 figures, 8 tables

  5. arXiv:2403.14468  [pdf, other]

    cs.CV cs.AI cs.MM

    AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

    Authors: Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen

    Abstract: In the dynamic field of digital content creation using generative models, state-of-the-art video editing models still do not offer the level of quality and control that users desire. Previous works on video editing either extended from image-based generative models in a zero-shot manner or necessitated extensive fine-tuning, which can hinder the production of fluid video edits. Furthermore, these…

    Submitted 10 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: preprint

  6. arXiv:2312.14867  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

    Authors: Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

    Abstract: In the rapidly advancing field of conditional image generation research, challenges such as limited explainability lie in effectively evaluating the performance and capabilities of various models. This paper introduces VIEScore, a Visual Instruction-guided Explainable metric for evaluating any conditional image generation task. VIEScore leverages general knowledge from Multimodal Large Language M…

    Submitted 3 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted to ACL2024 main

  7. arXiv:2310.01596  [pdf, other]

    cs.CV cs.GR cs.MM

    ImagenHub: Standardizing the evaluation of conditional image generation models

    Authors: Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

    Abstract: Recently, a myriad of conditional image generation and editing models have been developed to serve different downstream tasks, including text-to-image generation, text-guided image editing, subject-driven image generation, control-guided image generation, etc. However, we observe huge inconsistencies in experimental conditions: datasets, inference, and evaluation metrics - render fair comparisons…

    Submitted 10 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR2024 Camera Ready

  8. arXiv:2306.12624  [pdf, other]

    cs.CV

    DreamEdit: Subject-driven Image Editing

    Authors: Tianle Li, Max Ku, Cong Wei, Wenhu Chen

    Abstract: Subject-driven image generation aims at generating images containing customized subjects, which has recently drawn enormous attention from the research community. However, previous works cannot precisely control the background and position of the target subject. In this work, we aspire to fill the void and propose two novel subject-driven sub-tasks, i.e., Subject Replacement and Subject Additi…

    Submitted 16 August, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  9. arXiv:2305.17855  [pdf, other]

    cs.CL

    Vec2Gloss: definition modeling leveraging contextualized vectors with Wordnet gloss

    Authors: Yu-Hsiang Tseng, Mao-Chang Ku, Wei-Ling Chen, Yu-Lin Chang, Shu-Kai Hsieh

    Abstract: Contextualized embeddings are proven to be powerful tools in multiple NLP tasks. Nonetheless, challenges regarding their interpretability and capability to represent lexical semantics still remain. In this paper, we propose that the task of definition modeling, which aims to generate the human-readable definition of the word, provides a route to evaluate or understand the high dimensional semantic…

    Submitted 28 May, 2023; originally announced May 2023.

  10. arXiv:2305.12524  [pdf, other]

    cs.CL cs.AI

    TheoremQA: A Theorem-driven Question Answering dataset

    Authors: Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

    Abstract: Recent LLMs like GPT-4 and PaLM-2 have made tremendous progress in solving fundamental math problems like GSM8K by achieving over 90% accuracy. However, their capabilities to solve more challenging math problems which require domain-specific knowledge (i.e., theorems) have yet to be investigated. In this paper, we introduce TheoremQA, the first theorem-driven question-answering dataset designed…

    Submitted 5 December, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted to Main Conference of EMNLP 2023

  11. arXiv:2207.10371  [pdf, other]

    cs.NI cs.IT

    UAV Trajectory, User Association and Power Control for Multi-UAV Enabled Energy Harvesting Communications: Offline Design and Online Reinforcement Learning

    Authors: Chien-Wei Fu, Meng-Lin Ku, Yu-Jia Chen, Tony Q. S. Quek

    Abstract: In this paper, we consider multiple solar-powered wireless nodes which utilize the harvested solar energy to transmit collected data to multiple unmanned aerial vehicles (UAVs) in the uplink. In this context, we jointly design UAV flight trajectories, UAV-node communication associations, and uplink power control to effectively utilize the harvested energy and manage co-channel interference within…

    Submitted 21 July, 2022; originally announced July 2022.

  12. arXiv:1405.7254  [pdf, ps, other]

    cs.NI cs.IT

    Data-Driven Stochastic Models and Policies for Energy Harvesting Sensor Communications

    Authors: Meng-Lin Ku, Yan Chen, K. J. Ray Liu

    Abstract: Energy harvesting from the surroundings is a promising solution to perpetually power up wireless sensor communications. This paper presents a data-driven approach of finding optimal transmission policies for a solar-powered sensor node that attempts to maximize net bit rates by adapting its transmission parameters, power levels and modulation types, to the changes of channel fading and battery rec…

    Submitted 28 May, 2014; originally announced May 2014.

  13. arXiv:1308.5661  [pdf]

    cs.CV cs.CR

    Detection of copy-move forgery in digital images based on DCT

    Authors: Nathalie Diane Wandji, Sun Xingming, Moise Fah Kue

    Abstract: With rapid advances in digital information processing systems, and more specifically in digital image processing software, there is a widespread development of advanced tools and techniques for digital image forgery. One of the most commonly used techniques is copy-move forgery, which proceeds by copying a part of an image and pasting it into the same image, in order to maliciously hide an obje…

    Submitted 26 August, 2013; originally announced August 2013.

    Comments: Published in IJCSI (International Journal of Computer Science Issues), Volume 10, Issue 2, No 1, March 2013

    Journal ref: ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784, International Journal of Computer Science Issues (IJCSI), Volume 10, Issue 2, No 1, March 2013