
Showing 1–10 of 10 results for author: Kondratyuk, D

Searching in archive cs.
  1. arXiv:2405.13195  [pdf, other]

    cs.CV cs.AI

    CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

    Authors: Andrew Marmon, Grant Schindler, José Lezama, Dan Kondratyuk, Bryan Seybold, Irfan Essa

    Abstract: We extend multimodal transformers to include 3D camera motion as a conditioning signal for the task of video generation. Generative video models are becoming increasingly powerful, thus focusing research efforts on methods of controlling the output of such models. We propose to add virtual 3D camera controls to generative video methods by conditioning generated video on an encoding of three-dimens…

    Submitted 21 May, 2024; originally announced May 2024.

  2. arXiv:2312.14125  [pdf, other]

    cs.CV cs.AI

    VideoPoet: A Large Language Model for Zero-Shot Video Generation

    Authors: Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, et al. (6 additional authors not shown)

    Abstract: We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and tas…

    Submitted 4 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: To appear at ICML 2024; Project page: http://sites.research.google/videopoet/

  3. arXiv:2305.06324  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

    Authors: Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam

    Abstract: We present Integrated Multimodal Perception (IMP), a simple and scalable multimodal multi-task training and modeling approach. IMP integrates multimodal inputs including image, video, text, and audio into a single Transformer encoder with minimal modality-specific components. IMP makes use of a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient…

    Submitted 11 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

  4. arXiv:2112.07074  [pdf, other]

    cs.CV cs.LG

    Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

    Authors: Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

    Abstract: In this paper, we explore the possibility of building a unified foundation model that can be adapted to both vision-only and text-only tasks. Starting from BERT and ViT, we design a unified transformer consisting of modality-specific tokenizers, a shared transformer encoder, and task-specific output heads. To efficiently pre-train the proposed model jointly on unpaired images and text, we propose…

    Submitted 13 December, 2021; originally announced December 2021.

    Comments: preliminary work

  5. arXiv:2103.11511  [pdf, other]

    cs.CV cs.AI cs.LG

    MoViNets: Mobile Video Networks for Efficient Video Recognition

    Authors: Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong

    Abstract: We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference. 3D convolutional neural networks (CNNs) are accurate at video recognition but require large computation and memory budgets and do not support online inference, making them difficult to work on mobile devices. We propose a three-step appr…

    Submitted 18 April, 2021; v1 submitted 21 March, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021

  6. arXiv:2012.01988  [pdf, other]

    cs.CV

    Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models

    Authors: Xiaofang Wang, Dan Kondratyuk, Eric Christiansen, Kris M. Kitani, Yair Alon, Elad Eban

    Abstract: Committee-based models (ensembles or cascades) construct models by combining existing pre-trained ones. While ensembles and cascades are well-known techniques that were proposed before deep learning, they are not considered a core building block of deep model architectures and are rarely compared to in recent literature on developing efficient models. In this work, we go back to basics and conduct…

    Submitted 17 February, 2022; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: ICLR 2022

  7. arXiv:2005.00570  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    When Ensembling Smaller Models is More Efficient than Single Large Models

    Authors: Dan Kondratyuk, Mingxing Tan, Matthew Brown, Boqing Gong

    Abstract: Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is commonly reserved for the largest models, as it is commonly held that increasing the model size provides a more substantial reduction in error than ensembling smaller models. However, we show results…

    Submitted 1 May, 2020; originally announced May 2020.

  8. arXiv:1904.02099  [pdf, other]

    cs.CL cs.LG

    75 Languages, 1 Model: Parsing Universal Dependencies Universally

    Authors: Dan Kondratyuk, Milan Straka

    Abstract: We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages. By leveraging a multilingual BERT self-attention model pretrained on 104 languages, we found that fine-tuning it on all datasets concatenated together with s…

    Submitted 25 August, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted for publication at EMNLP 2019. 17 pages, 6 figures

  9. arXiv:1808.03703  [pdf, other]

    cs.CL cs.LG cs.NE

    LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

    Authors: Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

    Abstract: We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. We demonstrate that both tasks benefit from sharing the encoding part of the network, predicting tag subcategories, and using the tagger output as an input to the lemmatizer. We evaluate our mo…

    Submitted 27 August, 2018; v1 submitted 10 August, 2018; originally announced August 2018.

    Comments: 8 pages, 3 figures. Submitted to EMNLP 2018

  10. arXiv:1802.00230  [pdf]

    cs.DB cs.CR

    Integrity Coded Databases: An Evaluation of Performance, Efficiency, and Practicality

    Authors: Dan Kondratyuk, Jake Rodden, Elmer Duran

    Abstract: In recent years, cloud database storage has become an inexpensive and convenient option for businesses and individuals to store information. While its positive aspects make the cloud extremely attractive for data storage, it is a relatively new area of service, making it vulnerable to cyber-attacks and security breaches. Storing data in a foreign location also requires the owner to relinquish cont…

    Submitted 1 February, 2018; originally announced February 2018.

    Comments: 11 pages, 7 figures. Research Experience for Undergraduates in Software Security, Boise State University, July 2015
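Entries 6 and 7 concern committee-based models. An ensemble trains several models and aggregates their predictions; a cascade (as the technique is generally used) runs models in order, typically cheapest first, and stops early once one prediction is confident enough. The following is a minimal, hypothetical sketch of both ideas. It assumes toy "models" that return class logits, averaging of softmax probabilities as the aggregation rule, and a hand-picked confidence threshold. It is not the method or code from either paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable mapping an input to a list of class logits.
# These are hypothetical stand-ins, not the networks studied in the papers.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> List[float]:
    """Ensemble: run every model and average their class probabilities."""
    probs = [softmax(model(x)) for model in models]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def cascade_predict(
    models: Sequence[Model], x: Sequence[float], threshold: float
) -> Tuple[List[float], int]:
    """Cascade: query models in order and stop at the first confident one.

    The last model always answers. Returns (probabilities, models_run).
    The confidence rule (max probability >= threshold) is an assumption.
    """
    if not models:
        raise ValueError("cascade needs at least one model")
    for i, model in enumerate(models):
        p = softmax(model(x))
        if max(p) >= threshold or i == len(models) - 1:
            return p, i + 1
    raise AssertionError("unreachable")


# Hypothetical cheap and expensive models over two classes.
small: Model = lambda x: [2.0 * x[0], 0.0]
large: Model = lambda x: [0.0, 3.0]

# Confident small model: the cascade stops after one model.
_, used_easy = cascade_predict([small, large], [2.0], threshold=0.9)
# Unconfident small model: the cascade falls through to the large one.
_, used_hard = cascade_predict([small, large], [0.5], threshold=0.9)
```

The cascade's appeal is that easy inputs pay only for the cheap model, while the ensemble always pays for every member.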