Showing 1–14 of 14 results for author: Hudson, D A

  1. arXiv:2406.09292  [pdf, other]

    cs.CV cs.AI cs.LG

    Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

    Authors: Ziyi Wu, Yulia Rubanova, Rishabh Kabra, Drew A. Hudson, Igor Gilitschenski, Yusuf Aytar, Sjoerd van Steenkiste, Kelsey R. Allen, Thomas Kipf

    Abstract: We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are train…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Additional details and video results are available at https://neural-assets-paper.github.io/

  2. arXiv:2404.10179  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio…

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2311.17901  [pdf, other]

    cs.CV cs.AI cs.LG

    SODA: Bottleneck Diffusion Models for Representation Learning

    Authors: Drew A. Hudson, Daniel Zoran, Mateusz Malinowski, Andrew K. Lampinen, Andrew Jaegle, James L. McClelland, Loic Matthey, Felix Hill, Alexander Lerchner

    Abstract: We introduce SODA, a self-supervised diffusion model, designed for representation learning. The model incorporates an image encoder, which distills a source view into a compact representation, that, in turn, guides the generation of related novel views. We show that by imposing a tight bottleneck between the encoder and a denoising decoder, and leveraging novel view synthesis as a self-supervised…

    Submitted 29 November, 2023; originally announced November 2023.

  5. arXiv:2211.09110  [pdf, other]

    cs.CL cs.AI cs.LG

    Holistic Evaluation of Language Models

    Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, et al. (25 additional authors not shown)

    Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest fo…

    Submitted 1 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2023

  6. arXiv:2111.08960  [pdf, other]

    cs.CV cs.AI cs.LG

    Compositional Transformers for Scene Generation

    Authors: Drew A. Hudson, C. Lawrence Zitnick

    Abstract: We introduce the GANformer2 model, an iterative object-oriented transformer, explored for the task of generative modeling. The network incorporates strong and explicit structural priors, to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast and lightweight planning phase, where we draft a high-level scene layout…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Published as a conference paper at NeurIPS 2021

  7. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  8. arXiv:2106.03428  [pdf, other]

    cs.LG

    Automation for Interpretable Machine Learning Through a Comparison of Loss Functions to Regularisers

    Authors: A. I. Parkes, J. Camilleri, D. A. Hudson, A. J. Sobey

    Abstract: To increase the ubiquity of machine learning it needs to be automated. Automation is cost-effective as it allows experts to spend less time tuning the approach, which leads to shorter development times. However, while this automation produces highly accurate architectures, they can be uninterpretable, acting as 'black-boxes' which produce low conventional errors but fail to model the underlying in…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 11 pages, 5 figures, under review

  9. arXiv:2105.01567  [pdf, other]

    cs.LG

    Towards Error Measures which Influence a Learners Inductive Bias to the Ground Truth

    Authors: A. I. Parkes, A. J. Sobey, D. A. Hudson

    Abstract: Artificial intelligence is applied in a range of sectors, and is relied upon for decisions requiring a high level of trust. For regression methods, trust is increased if they approximate the true input-output relationships and perform accurately outside the bounds of the training data. But often performance off-test-set is poor, especially when data is sparse. This is because the conditional avera…

    Submitted 4 May, 2021; originally announced May 2021.

  10. arXiv:2103.01209  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Generative Adversarial Transformers

    Authors: Drew A. Hudson, C. Lawrence Zitnick

    Abstract: We introduce the GANformer, a novel and efficient type of transformer, and explore it for the task of visual generative modeling. The network employs a bipartite structure that enables long-range interactions across the image, while maintaining computation of linear efficiency, that can readily scale to high-resolution synthesis. It iteratively propagates information from a set of latent variables…

    Submitted 29 March, 2022; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: Published as a conference paper at ICML 2021

  11. arXiv:2010.16249  [pdf, other]

    cs.CL cs.LG

    SLM: Learning a Discourse Language Representation with Sentence Unshuffling

    Authors: Haejun Lee, Drew A. Hudson, Kangwook Lee, Christopher D. Manning

    Abstract: We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation in a fully self-supervised manner. Recent pre-training methods in NLP focus on learning either bottom or top-level language representations: contextualized word representations derived from language model objectives at one extreme and a whole sequence representation learned…

    Submitted 30 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  12. arXiv:1907.03950  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Learning by Abstraction: The Neural State Machine

    Authors: Drew A. Hudson, Christopher D. Manning

    Abstract: We introduce the Neural State Machine, seeking to bridge the gap between the neural and symbolic views of AI and integrate their complementary strengths for the task of visual reasoning. Given an image, we first predict a probabilistic graph that represents its underlying semantics and serves as a structured world model. Then, we perform sequential reasoning over the graph, iteratively traversing…

    Submitted 25 November, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: Published as a conference paper at NeurIPS 2019 (spotlight)

  13. arXiv:1902.09506  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

    Authors: Drew A. Hudson, Christopher D. Manning

    Abstract: We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene graph structures to create 22M diverse reasoning questions, all come with functional programs that represent their semantics. We use the programs to gain tight c…

    Submitted 10 May, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: Published as a conference paper at CVPR 2019 (oral)

  14. arXiv:1803.03067  [pdf, other]

    cs.AI

    Compositional Attention Networks for Machine Reasoning

    Authors: Drew A. Hudson, Christopher D. Manning

    Abstract: We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility. The model approaches problems by decomposing them into a series of attention-based reasoning steps, each performed by a novel…

    Submitted 24 April, 2018; v1 submitted 8 March, 2018; originally announced March 2018.

    Comments: Published as a conference paper at ICLR 2018