Showing 1–11 of 11 results for author: Fifty, C

  1. arXiv:2310.10971

    cs.LG cs.CV

    Context-Aware Meta-Learning

    Authors: Christopher Fifty, Dennis Duan, Ronald G. Junkins, Ehsan Amid, Jure Leskovec, Christopher Re, Sebastian Thrun

    Abstract: Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. However, visual models trained to detect new objects during inference have been unable to replicate this ability, and instead either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that…

    Submitted 25 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  2. arXiv:2310.08863

    cs.LG

    In-Context Learning for Few-Shot Molecular Property Prediction

    Authors: Christopher Fifty, Jure Leskovec, Sebastian Thrun

    Abstract: In-context learning has become an important approach for few-shot learning in Large Language Models because of its ability to rapidly adapt to new tasks without fine-tuning model parameters. However, it is restricted to applications in natural language and inapplicable to other domains. In this paper, we adapt the concepts underpinning in-context learning to develop a new algorithm for few-shot mo…

    Submitted 13 October, 2023; originally announced October 2023.

  3. arXiv:2302.02055

    cs.LG

    Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction

    Authors: Christopher Fifty, Joseph M. Paggi, Ehsan Amid, Jure Leskovec, Ron Dror

    Abstract: Few-shot learning is a promising approach to molecular property prediction as supervised data is often very limited. However, many important molecular properties depend on complex molecular characteristics -- such as the various 3D geometries a molecule may adopt or the types of chemical interactions it can form -- that are not explicitly encoded in the feature space and must be approximated from…

    Submitted 6 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  4. arXiv:2209.07080

    cs.LG

    Layerwise Bregman Representation Learning with Applications to Knowledge Distillation

    Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

    Abstract: In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and normalizing the principal directions with respect to the geometry of the local convex function around the…

    Submitted 15 September, 2022; originally announced September 2022.

  5. arXiv:2207.06366

    cs.CL cs.LG

    N-Grammer: Augmenting Transformers with latent n-grams

    Authors: Aurko Roy, Rohan Anil, Guangda Lai, Benjamin Lee, Jeffrey Zhao, Shuyuan Zhang, Shibo Wang, Ye Zhang, Shen Wu, Rigel Swavely, Tao Yu, Phuong Dao, Christopher Fifty, Zhifeng Chen, Yonghui Wu

    Abstract: Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there is significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, thus necessitating more research in identifying more efficient variants. In this work, we prop…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 8 pages, 2 figures

  6. arXiv:2202.00145

    cs.LG

    Step-size Adaptation Using Exponentiated Gradient Updates

    Authors: Ehsan Amid, Rohan Anil, Christopher Fifty, Manfred K. Warmuth

    Abstract: Optimizers like Adam and AdaGrad have been very successful in training large-scale neural networks. Yet, the performance of these methods is heavily dependent on a carefully tuned learning rate schedule. We show that in many large-scale applications, augmenting a given optimizer with an adaptive tuning method of the step-size greatly improves the performance. More precisely, we maintain a global s…

    Submitted 31 January, 2022; originally announced February 2022.

  7. arXiv:2112.07175

    cs.CV

    Co-training Transformer with Videos and Images Improves Action Recognition

    Authors: Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha

    Abstract: In learning action recognition, models are typically pre-trained on object recognition with images, such as ImageNet, and later fine-tuned on target action recognition with videos. This approach has achieved good empirical performance especially with recent transformer-based video architectures. While recently many works aim to design more advanced transformer architectures for action recognition,…

    Submitted 14 December, 2021; originally announced December 2021.

  8. arXiv:2109.04617

    cs.LG cs.AI cs.CV

    Efficiently Identifying Task Groupings for Multi-Task Learning

    Authors: Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

    Abstract: Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naively training all tasks together in one model often degrades performance, and exhaustively searching through combinations of task groupings can be prohibitively expensive. As a result, efficiently identifying the tasks that would benefit from training together remains…

    Submitted 25 October, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: In NeurIPS 2021 (spotlight). Code is available at https://github.com/google-research/google-research/tree/master/tag

  9. arXiv:2010.15413

    cs.LG cs.AI cs.CV cs.RO

    Measuring and Harnessing Transference in Multi-Task Learning

    Authors: Christopher Fifty, Ehsan Amid, Zhe Zhao, Tianhe Yu, Rohan Anil, Chelsea Finn

    Abstract: Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naive formulations often degrade performance and in particular, identifying the tasks that would benefit from co-training remains a challenging design question. In this paper, we analyze the dynamics of information transfer, or transference, across tasks throughout traini…

    Submitted 10 September, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  10. arXiv:2008.05808

    cs.LG stat.ML

    Small Towers Make Big Differences

    Authors: Yuyan Wang, Zhe Zhao, Bo Dai, Christopher Fifty, Dong Lin, Lichan Hong, Ed H. Chi

    Abstract: Multi-task learning aims at solving multiple machine learning tasks at the same time. A good solution to a multi-task learning problem should be generalizable in addition to being Pareto optimal. In this paper, we provide some insights on understanding the trade-off between Pareto efficiency and generalization as a result of parameterization in multi-task deep learning models. As a multi-objective…

    Submitted 13 August, 2020; originally announced August 2020.

  11. arXiv:1902.07153

    cs.LG stat.ML

    Simplifying Graph Convolutional Networks

    Authors: Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr., Christopher Fifty, Tao Yu, Kilian Q. Weinberger

    Abstract: Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity through successively removing nonl…

    Submitted 20 June, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

    Comments: In ICML 2019. Code available at https://github.com/Tiiiger/SGC