Showing 1–50 of 302 results for author: Bao, J

Searching in archive cs.
  1. arXiv:2405.18710  [pdf, other]

    cs.LG cs.AI

    To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability

    Authors: Joonhyung Lee, Jeongin Bae, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee

    Abstract: The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision has become the de facto standard for LLM training, with hardware support included in recent accelerators. This trend has gone even further in the latest proces…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2405.13954  [pdf, other]

    cs.LG cs.AI cs.CL

    What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

    Authors: Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, Eric Xing

    Abstract: Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast trai…

    Submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2405.12186  [pdf, other]

    cs.LG

    Training Data Attribution via Approximate Unrolled Differentiation

    Authors: Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse

    Abstract: Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. B…

    Submitted 21 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  4. arXiv:2405.04782  [pdf, other]

    cs.CV

    Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection

    Authors: Zhaoxiang Zhang, Hanqiu Deng, Jinan Bao, Xingyu Li

    Abstract: Image Anomaly Detection has been a challenging task in Computer Vision field. The advent of Vision-Language models, particularly the rise of CLIP-based frameworks, has opened new avenues for zero-shot anomaly detection. Recent studies have explored the use of CLIP by aligning images with normal and prompt descriptions. However, the exclusive dependence on textual guidance often falls short, highli…

    Submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2405.00642  [pdf, other]

    stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

    From Empirical Observations to Universality: Dynamics of Deep Learning with Inputs Built on Gaussian mixture

    Authors: Jaeyong Bae, Hawoong Jeong

    Abstract: This study broadens the scope of theoretical frameworks in deep learning by delving into the dynamics of neural networks with inputs that demonstrate the structural characteristics to Gaussian Mixture (GM). We analyzed how the dynamics of neural networks under GM-structured inputs diverge from the predictions of conventional theories based on simple Gaussian structures. A revelation of our work is…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 19 pages, 9 figures

  6. arXiv:2404.19381  [pdf, other]

    cs.AR

    Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

    Authors: Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

    Abstract: To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overc…

    Submitted 30 April, 2024; originally announced April 2024.

  7. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  8. arXiv:2404.11929  [pdf, other]

    eess.IV cs.AI cs.CV

    A Symmetric Regressor for MRI-Based Assessment of Striatal Dopamine Transporter Uptake in Parkinson's Disease

    Authors: Walid Abdullah Al, Il Dong Yun, Yun Jung Bae

    Abstract: Dopamine transporter (DAT) imaging is commonly used for monitoring Parkinson's disease (PD), where striatal DAT uptake amount is computed to assess PD severity. However, DAT imaging has a high cost and the risk of radiance exposure and is not available in general clinics. Recently, MRI patch of the nigral region has been proposed as a safer and easier alternative. This paper proposes a symmetric r…

    Submitted 18 April, 2024; originally announced April 2024.

  9. arXiv:2404.03613  [pdf, other]

    cs.CV

    Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting

    Authors: Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh

    Abstract: As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames. However, previous works fail to accurately reconstruct dynamic scenes, especially 1) static parts moving along nearby dynamic parts, and 2) some dynamic areas are blurry. We attribute the failure to the wrong design of the deformation field,…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Preprint

  10. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  11. arXiv:2403.19889  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Towards a Robust Retrieval-Based Summarization System

    Authors: Shengjie Liu, Jing Wu, Jingyuan Bao, Wenyi Wang, Naira Hovakimyan, Christopher G Healey

    Abstract: This paper describes an investigation of the robustness of large language models (LLMs) for retrieval augmented generation (RAG)-based summarization tasks. While LLMs provide summarization capabilities, their performance in complex, real-world scenarios remains under-explored. Our first contribution is LogicSumm, an innovative evaluation framework incorporating realistic scenarios to assess LLM ro…

    Submitted 28 March, 2024; originally announced March 2024.

  12. arXiv:2403.16167  [pdf, other]

    cs.CV cs.CL

    Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

    Authors: Minchan Kim, Minyeong Kim, Junik Bae, Suhwan Choi, Sungkyung Kim, Buru Chang

    Abstract: Hallucinations in vision-language models pose a significant challenge to their reliability, particularly in the generation of long captions. Current methods fall short of accurately identifying and mitigating these hallucinations. To address this issue, we introduce ESREAL, a novel unsupervised learning framework designed to suppress the generation of hallucinations through accurate localization a…

    Submitted 5 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  13. arXiv:2403.07300  [pdf, other]

    cs.LG cs.CL

    CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning

    Authors: Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia

    Abstract: Deep learning (e.g., Transformer) has been widely and successfully used in multivariate time series forecasting (MTSF). Unlike existing methods that focus on training models from a single modal of time series input, large language models (LLMs) based MTSF methods with cross-modal text and time series input have recently shown great superiority, especially with limited temporal data. However, curre…

    Submitted 23 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  14. arXiv:2403.03218  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer , et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  15. arXiv:2403.01999  [pdf, other]

    cs.CL

    LLM-Oriented Retrieval Tuner

    Authors: Si Sun, Hanqing Zhang, Zhiyuan Liu, Jie Bao, Dawei Song

    Abstract: Dense Retrieval (DR) is now considered as a promising tool to enhance the memorization capacity of Large Language Models (LLM) such as GPT3 and GPT-4 by incorporating external memories. However, due to the paradigm discrepancy between text generation of LLM and DR, it is still an open challenge to integrate the retrieval and generation tasks in a shared LLM. In this paper, we propose an efficient…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 16 pages, 8 figures, 5 tables

  16. arXiv:2403.01086  [pdf, other]

    cs.RO

    phloSAR: a Portable, High-Flow Pressure Supply and Regulator Enabling Untethered Operation of Large Pneumatic Soft Robots

    Authors: Maxwell Ahlquist, Rianna Jitosho, Jiawen Bao, Allison M. Okamura

    Abstract: Pneumatic actuation benefits soft robotics by facilitating compliance, enabling large volume change, and concentrating actuator weight away from the end-effector. However, portability is compromised when pneumatic actuators are tethered to cumbersome air and power supplies. While there are existing options for portable pneumatic systems, they are limited in dynamic capabilities, constraining their…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 2024 IEEE International Conference on Soft Robotics

  17. arXiv:2402.18708  [pdf, other]

    cs.LO cs.PL

    Bluebell: An Alliance of Relational Lifting and Independence For Probabilistic Reasoning

    Authors: Jialu Bao, Emanuele D'Osualdo, Azadeh Farzan

    Abstract: We present Bluebell, a program logic for reasoning about probabilistic programs where unary and relational styles of reasoning come together to create new reasoning tools. Unary-style reasoning is very expressive and is powered by foundational mechanisms to reason about probabilistic behaviour like independence and conditioning. The relational style of reasoning, on the other hand, naturally shine…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 23 pages + 53 pages of appendix

  18. arXiv:2402.18096  [pdf, other]

    cs.LG cs.AI

    No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization

    Authors: June Yong Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, Se Jung Kwon, Dongsoo Lee

    Abstract: Key-Value (KV) Caching has become an essential technique for accelerating the inference speed and throughput of generative Large Language Models (LLMs). However, the memory footprint of the KV cache poses a critical bottleneck in LLM deployment as the cache size grows with batch size and sequence length, often surpassing even the size of the model itself. Although recent methods were proposed to s…

    Submitted 28 February, 2024; originally announced February 2024.

  19. arXiv:2402.15131  [pdf, other]

    cs.CL cs.AI

    Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models

    Authors: Guanming Xiong, Junwei Bao, Wen Zhao

    Abstract: This study explores the realm of knowledge-base question answering (KBQA). KBQA is considered a challenging task, particularly in parsing intricate questions into executable logical forms. Traditional semantic parsing (SP)-based methods require extensive data annotations, which result in significant costs. Recently, the advent of few-shot in-context learning, powered by large language models (LLMs…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Codes will be released upon acceptance

    ACM Class: I.2.7

  20. arXiv:2402.03496  [pdf, other]

    cs.LG math.OC

    Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

    Authors: Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani

    Abstract: Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental differen…

    Submitted 3 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: updated Sec. 3 & 4

  21. arXiv:2401.13922  [pdf, other]

    cs.IT

    Simplified Successive Cancellation List Decoding of PAC Codes

    Authors: Hamid Saber, Homayoon Hatami, Jung Hyun Bae

    Abstract: Polar codes are the first class of structured channel codes that achieve the symmetric capacity of binary channels with efficient encoding and decoding. In 2019, Arikan proposed a new polar coding scheme referred to as polarization-adjusted convolutional (PAC) codes. In contrast to polar codes, PAC codes precode the information word using a convolutional code prior to polar encoding. This results…

    Submitted 26 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: 7 pages, 3 figures

  22. arXiv:2401.05842  [pdf, ps, other]

    cs.LO

    A Categorical Approach to DIBI Models

    Authors: Tao Gu, Jialu Bao, Justin Hsu, Alexandra Silva, Fabio Zanasi

    Abstract: The logic of Dependence and Independence Bunched Implications (DIBI) is a logic to reason about conditional independence (CI); for instance, DIBI formulas can characterise CI in probability distributions and relational databases, using the probabilistic and relational DIBI models, respectively. Despite the similarity of the probabilistic and relational models, a uniform, more abstract account rema…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: 33 pages

  23. arXiv:2401.05290  [pdf, other]

    cs.RO cs.HC

    Analysis and Perspectives on the ANA Avatar XPRIZE Competition

    Authors: Kris Hauser, Eleanor Watson, Joonbum Bae, Josh Bankston, Sven Behnke, Bill Borgia, Manuel G. Catalano, Stefano Dafarra, Jan B. F. van Erp, Thomas Ferris, Jeremy Fishel, Guy Hoffman, Serena Ivaldi, Fumio Kanehiro, Abderrahmane Kheddar, Gaelle Lannuzel, Jacqueline Ford Morie, Patrick Naughton, Steve NGuyen, Paul Oh, Taskin Padir, Jim Pippine, Jaeheung Park, Daniele Pucci, Jean Vaz , et al. (3 additional authors not shown)

    Abstract: The ANA Avatar XPRIZE was a four-year competition to develop a robotic "avatar" system to allow a human operator to sense, communicate, and act in a remote environment as though physically present. The competition featured a unique requirement that judges would operate the avatars after less than one hour of training on the human-machine interfaces, and avatar systems were judged on both objective…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 26 pages, preprint of article appearing in International Journal of Social Robotics

  24. arXiv:2312.17510  [pdf, other]

    cs.SE

    Testing Database Engines via Query Plan Guidance

    Authors: Jinsheng Ba, Manuel Rigger

    Abstract: Database systems are widely used to store and query data. Test oracles have been proposed to find logic bugs in such systems, that is, bugs that cause the database system to compute an incorrect result. To realize a fully automated testing approach, such test oracles are paired with a test case generation technique; a test case refers to a database state and a query on which the test oracle can be…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: ACM SIGSOFT Distinguished Paper Award in The 45th International Conference on Software Engineering (ICSE 2023)

  25. arXiv:2312.11459  [pdf, other]

    cs.CV

    VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

    Authors: Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo

    Abstract: This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed to efficiently acquire feature volumes from multi-view images. The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net. This research further addresses the challenges of inaccur…

    Submitted 28 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  26. arXiv:2312.08985  [pdf, other]

    cs.CV

    OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

    Authors: Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, Lan Xu

    Abstract: We have recently seen tremendous progress in realistic text-to-motion generation. Yet, the existing methods often fail or produce implausible motions with unseen text inputs, which limits the applications. In this paper, we present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-fi…

    Submitted 19 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: accepted by CVPR 2024

  27. arXiv:2312.04528  [pdf, other]

    cs.LG cs.AI

    Using Large Language Models for Hyperparameter Optimization

    Authors: Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba

    Abstract: This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO). Empirical evaluations demonstrate that in settings with constrained search budgets, LLMs can perform comparably or better than traditional HPO methods like random search and Bayesian optimization on standard benchmarks. Furthermore, we propose to treat the code specifying…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 29 pages

  28. arXiv:2312.02728  [pdf, other]

    cs.NI

    Overview of RIS-Enabled Secure Transmission in 6G Wireless Networks

    Authors: JungSook Bae, Waqas Khalid, Anseok Lee, Heesoo Lee, Song Noh, Heejung Yu

    Abstract: As sixth-generation (6G) wireless communication networks evolve, privacy concerns are expected due to the transmission of vast amounts of security-sensitive private information. In this context, a reconfigurable intelligent surface (RIS) emerges as a promising technology capable of enhancing transmission efficiency and strengthening information security. This study demonstrates how RISs can play a…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted for Digital Communications and Networks (DCN)

  29. arXiv:2312.02520  [pdf, other]

    cs.CV

    Towards More Unified In-context Visual Understanding

    Authors: Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

    Abstract: The rapid advancement of large language models (LLMs) has accelerated the emergence of in-context learning (ICL) as a cutting-edge approach in the natural language processing domain. Recently, ICL has been employed in visual understanding tasks, such as semantic segmentation and image captioning, yielding promising results. However, existing visual ICL framework can not enable producing content ac…

    Submitted 16 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  30. arXiv:2311.18834  [pdf, other]

    cs.CV

    ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

    Authors: Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames,…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 24 pages, 21 figures. Project page at https://warranweng.github.io/art.v

  31. arXiv:2311.18829  [pdf, other]

    cs.CV

    MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

    Authors: Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

    Abstract: We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image&text-to-video generation. This strategy offers two signific…

    Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Project page: https://wangyanhui666.github.io/MicroCinema.github.io/

  32. arXiv:2311.16401  [pdf, ps, other]

    quant-ph cs.DS

    On the quantum time complexity of divide and conquer

    Authors: Jonathan Allcock, Jinge Bao, Aleksandrs Belovs, Troy Lee, Miklos Santha

    Abstract: We initiate a systematic study of the time complexity of quantum divide and conquer algorithms for classical problems. We establish generic conditions under which search and minimization problems with classical divide and conquer algorithms are amenable to quantum speedup and apply these theorems to an array of problems involving strings, integers, and geometric objects. They include LONGEST DISTI…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 48 pages, accepted to QIP 2024

  33. arXiv:2311.15615  [pdf, other]

    cs.CV

    Technical Report for Argoverse Challenges on Unified Sensor-based Detection, Tracking, and Forecasting

    Authors: Zhepeng Wang, Feng Chen, Kanokphan Lertniphonphan, Siwei Chen, Jinyao Bao, Pengfei Zheng, Jinbao Zhang, Kaer Huang, Tao Zhang

    Abstract: This report presents our Le3DE2E solution for unified sensor-based detection, tracking, and forecasting in Argoverse Challenges at CVPR 2023 Workshop on Autonomous Driving (WAD). We propose a unified network that incorporates three tasks, including detection, tracking, and forecasting. This solution adopts a strong Bird's Eye View (BEV) encoder with spatial and temporal fusion and generates unifie…

    Submitted 27 November, 2023; originally announced November 2023.

  34. arXiv:2311.04496  [pdf, other]

    cs.CV

    PersonMAE: Person Re-Identification Pre-Training with Masked AutoEncoders

    Authors: Hezhen Hu, Xiaoyi Dong, Jianmin Bao, Dongdong Chen, Lu Yuan, Dong Chen, Houqiang Li

    Abstract: Pre-training is playing an increasingly important role in learning generic feature representation for Person Re-identification (ReID). We argue that a high-quality ReID representation should have three properties, namely, multi-level awareness, occlusion robustness, and cross-region invariance. To this end, we propose a simple yet effective pre-training framework, namely PersonMAE, which involves…

    Submitted 8 November, 2023; originally announced November 2023.

  35. arXiv:2310.19264  [pdf, other]

    cs.MM cs.SD eess.AS

    Sound of Story: Multi-modal Storytelling with Audio

    Authors: Jaeyeon Bae, Seokhoon Jeong, Seokun Kang, Namgi Han, Jae-Yon Lee, Hyounghun Kim, Taehwan Kim

    Abstract: Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023, project: https://github.com/Sosdatasets/SoS_Dataset/

  36. arXiv:2310.13356   

    cs.CV

    Sync-NeRF: Generalizing Dynamic NeRFs to Unsynchronized Videos

    Authors: Seoha Kim, Jeongmin Bae, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh

    Abstract: Recent advancements in 4D scene reconstruction using neural radiance fields (NeRF) have demonstrated the ability to represent dynamic scenes from multi-view videos. However, they fail to reconstruct the dynamic scenes and struggle to fit even the training views in unsynchronized settings. It happens because they employ a single latent embedding for a frame while the multi-view images at the same f…

    Submitted 21 May, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: I need to revise the text (it takes more than a month)

  37. arXiv:2310.06786  [pdf, other]

    cs.AI cs.CL cs.LG

    OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

    Authors: Keiran Paster, Marco Dos Santos, Zhangir Azerbayev, Jimmy Ba

    Abstract: There is growing evidence that pretraining on high quality, carefully thought-out tokens such as code or mathematics plays an important role in improving the reasoning abilities of large language models. For example, Minerva, a PaLM model finetuned on billions of tokens of mathematical documents from arXiv and the web, reported dramatically improved performance on problems that require quantitativ…

    Submitted 10 October, 2023; originally announced October 2023.

  38. arXiv:2310.05366  [pdf, other]

    cs.CV

    Rotation Matters: Generalized Monocular 3D Object Detection for Various Camera Systems

    Authors: SungHo Moon, JinWoo Bae, SungHoon Im

    Abstract: Research on monocular 3D object detection is being actively studied, and as a result, performance has been steadily improving. However, 3D object detection performance is significantly reduced when applied to a camera system different from the system used to capture the training datasets. For example, a 3D detector trained on datasets from a passenger car mostly fails to regress accurate 3D boundi…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to CVPRw 2023

  39. arXiv:2310.04846  [pdf, other]

    cs.RO

    Soft finger rotational stability for precision grasps

    Authors: Hun Jang, Valentyn Petrichenko, Joonbum Bae, Kevin Haninger

    Abstract: Soft robotic fingers can safely grasp fragile or variable form objects, but their force capacity is limited, especially with less contact area: precision grasps and when objects are smaller or not spherical. Current research is improving force capacity through mechanical design by increasing contact area or stiffness, typically without models which explain soft finger force limitations. To address…

    Submitted 24 March, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: Submitted IROS24

  40. arXiv:2310.00839  [pdf, other]

    cs.LG stat.CO stat.ML

    Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models

    Authors: Jichao Bao, Hongkyu Yoon, Jonghyun Lee

    Abstract: Estimating spatially distributed properties such as hydraulic conductivity (K) from available sparse measurements is a great challenge in subsurface characterization. However, the use of inverse modeling is limited for ill-posed, high-dimensional applications due to computational costs and poor prediction accuracy with sparse datasets. In this paper, we combine Wasserstein Generative Adversarial N…

    Submitted 9 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

  41. arXiv:2309.16496  [pdf, other]

    cs.CV

    CCEdit: Creative and Controllable Video Editing via Diffusion Models

    Authors: Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

    Abstract: In this paper, we present CCEdit, a versatile generative video editing framework based on diffusion models. Our approach employs a novel trident network structure that separates structure and appearance control, ensuring precise and creative editing capabilities. Utilizing the foundational ControlNet architecture, we maintain the structural integrity of the video during editing. The incorporation…

    Submitted 6 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  42. arXiv:2309.15817  [pdf, other]

    cs.AI cs.CL cs.LG

    Identifying the Risks of LM Agents with an LM-Emulated Sandbox

    Authors: Yangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori Hashimoto

    Abstract: Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, setting up the environment for each test scenario manually, and finding risky cas…

    Submitted 17 May, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

  43. arXiv:2309.11319  [pdf, other]

    cs.LG

    WFTNet: Exploiting Global and Local Periodicity in Long-term Time Series Forecasting

    Authors: Peiyuan Liu, Beiliang Wu, Naiqi Li, Tao Dai, Fengmao Lei, Jigang Bao, Yong Jiang, Shu-Tao Xia

    Abstract: Recent CNN and Transformer-based models tried to utilize frequency and periodicity information for long-term time series forecasting. However, most existing work is based on Fourier transform, which cannot capture fine-grained and local frequency structure. In this paper, we propose a Wavelet-Fourier Transform Network (WFTNet) for long-term time series forecasting. WFTNet utilizes both Fourier and…

    Submitted 4 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  44. arXiv:2309.03895  [pdf, other]

    cs.CV

    InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

    Authors: Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

    Abstract: We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Unlike existing approaches that integrate prior knowledge and pre-define the output space (e.g., categories and coordinates) for each vision task, we cast diverse vision tasks into a human-intuitive image-manipulating process whose output space is a flexible and interactive pi…

    Submitted 7 September, 2023; originally announced September 2023.

  45. arXiv:2308.15939  [pdf, other]

    cs.CV

    Bootstrap Fine-Grained Vision-Language Alignment for Unified Zero-Shot Anomaly Localization

    Authors: Hanqiu Deng, Zhaoxiang Zhang, Jinan Bao, Xingyu Li

    Abstract: Contrastive Language-Image Pre-training (CLIP) models have shown promising performance on zero-shot visual recognition tasks by learning visual representations under natural language supervision. Recent studies attempt the use of CLIP to tackle zero-shot anomaly detection by matching images with normal and abnormal state prompts. However, since CLIP focuses on building correspondence between paire…

    Submitted 26 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

  46. arXiv:2308.11257  [pdf, other]

    cs.CL

    HopPG: Self-Iterative Program Generation for Multi-Hop Question Answering over Heterogeneous Knowledge

    Authors: Yingyao Wang, Yongwei Zhou, Chaoqun Duan, Junwei Bao, Tiejun Zhao

    Abstract: The semantic parsing-based method is an important research branch for knowledge-based question answering. It usually generates executable programs lean upon the question and then conduct them to reason answers over a knowledge base. Benefit from this inherent mechanism, it has advantages in the performance and the interpretability. However, traditional semantic parsing methods usually generate a c…

    Submitted 10 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  47. arXiv:2308.08539  [pdf, other]

    quant-ph cs.CC cs.ET

    Constant-depth circuits for Uniformly Controlled Gates and Boolean functions with application to quantum memory circuits

    Authors: Jonathan Allcock, Jinge Bao, João F. Doriguello, Alessandro Luongo, Miklos Santha

    Abstract: We explore the power of the unbounded Fan-Out gate and the Global Tunable gates generated by Ising-type Hamiltonians in constructing constant-depth quantum circuits, with particular attention to quantum memory devices. We propose two types of constant-depth constructions for implementing Uniformly Controlled Gates. These gates include the Fan-In gates defined by…

    Submitted 14 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 52 pages, 11 figures. v2: corrected typos, added one figure and references

  48. arXiv:2308.03296  [pdf, other]

    cs.LG cs.CL stat.ML

    Studying Large Language Model Generalization with Influence Functions

    Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

    Abstract: When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set?…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 119 pages, 47 figures, 22 tables

  49. arXiv:2307.14628  [pdf, other]

    cs.LG stat.ME

    Rapid and Scalable Bayesian AB Testing

    Authors: Srivas Chennu, Andrew Maher, Christian Pangerl, Subash Prabanantham, Jae Hyeon Bae, Jamie Martin, Bud Goswami

    Abstract: AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical p…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: The 10th IEEE International Conference On Data Science And Advanced Analytics

  50. arXiv:2307.08317  [pdf, other]

    cs.CV

    AltFreezing for More General Video Face Forgery Detection

    Authors: Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Houqiang Li

    Abstract: Existing face forgery detection models try to discriminate fake images by detecting only spatial artifacts (e.g., generative artifacts, blending) or mainly temporal artifacts (e.g., flickering, discontinuity). They may experience significant performance degradation when facing out-domain artifacts. In this paper, we propose to capture both spatial and temporal artifacts in one model for face forge…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted by CVPR 2023 Highlight, code and models are available at https://github.com/ZhendongWang6/AltFreezing