Showing 1–2 of 2 results for author: Jose, A M

Searching in archive cs.
  1. arXiv:2309.17425  [pdf, other]

    cs.AI cs.LG

    Data Filtering Networks

    Authors: Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander Toshev, Vaishaal Shankar

    Abstract: Large training sets have become a cornerstone of machine learning and are the foundation for recent advances in language modeling and multimodal learning. While data curation for pre-training is often still ad-hoc, one common paradigm is to first collect a massive pool of data from the Web and then filter this candidate pool down to an actual training set via various heuristics. In this work, we s…

    Submitted 5 November, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

  2. arXiv:2301.13081  [pdf, other]

    cs.CV

    STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

    Authors: Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang

    Abstract: Image and text retrieval is one of the foundational tasks in the vision and language domain with multiple real-world applications. State-of-the-art approaches, e.g. CLIP, ALIGN, represent images and texts as dense embeddings and calculate the similarity in the dense embedding space as the matching score. On the other hand, sparse semantic features like bag-of-words models are more interpretable, b…

    Submitted 7 February, 2023; v1 submitted 30 January, 2023; originally announced January 2023.