Showing 1–7 of 7 results for author: Tiong, A M H

  1. arXiv:2404.02415  [pdf, other]

    cs.CV

    What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

    Authors: Anthony Meng Huat Tiong, Junqi Zhao, Boyang Li, Junnan Li, Steven C. H. Hoi, Caiming Xiong

    Abstract: Vision-language (VL) models, pretrained on colossal image-text datasets, have attained broad VL competence that is difficult to evaluate. A common belief is that a small number of VL skills underlie the variety of VL tests. In this paper, we perform a large-scale transfer learning experiment aimed at discovering latent VL skills from data. We reveal interesting characteristics that have important…

    Submitted 2 April, 2024; originally announced April 2024.

  2. arXiv:2305.06500  [pdf, other]

    cs.CV cs.LG

    InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    Authors: Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi

    Abstract: Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tun…

    Submitted 15 June, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: preprint

  3. arXiv:2212.10846  [pdf, other]

    cs.CV cs.MM

    From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

    Authors: Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven C. H. Hoi

    Abstract: Large language models (LLMs) have demonstrated excellent zero-shot generalization to new language tasks. However, effective utilization of LLMs for zero-shot visual question-answering (VQA) remains challenging, primarily due to the modality disconnection and task disconnection between LLM and VQA task. End-to-end training on vision and language data may bridge the disconnections, but is inflexible…

    Submitted 8 May, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 Camera Ready Version

  4. arXiv:2210.08773  [pdf, other]

    cs.CV

    Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

    Authors: Anthony Meng Huat Tiong, Junnan Li, Boyang Li, Silvio Savarese, Steven C. H. Hoi

    Abstract: Visual question answering (VQA) is a hallmark of vision and language reasoning and a challenging task under the zero-shot setting. We propose Plug-and-Play VQA (PNP-VQA), a modular framework for zero-shot VQA. In contrast to most existing works, which require substantial adaptation of pretrained language models (PLMs) for the vision modality, PNP-VQA requires no additional training of the PLMs. In…

    Submitted 19 March, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 (Findings); correct typos in Equation 2 on page 4

  5. arXiv:2110.10048  [pdf, other]

    cs.CV

    Improving Tail-Class Representation with Centroid Contrastive Learning

    Authors: Anthony Meng Huat Tiong, Junnan Li, Guosheng Lin, Boyang Li, Caiming Xiong, Steven C. H. Hoi

    Abstract: In vision domain, large-scale natural datasets typically exhibit long-tailed distribution which has large class imbalance between head and tail classes. This distribution poses difficulty in learning good representations for tail classes. Recent developments have shown good long-tailed model can be learnt by decoupling the training into representation learning and classifier balancing. However, th…

    Submitted 4 May, 2023; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Add in acknowledgment

  6. arXiv:1903.03266  [pdf, other]

    cs.RO cs.HC cs.PF

    Performance evaluation of a foot-controlled human-robot interface

    Authors: Yanpei Huang, Etienne Burdet, Lin Cao, Phuoc Thien Phan, Anthony Meng Huat Tiong, Pai Zheng, Soo Jay Phee

    Abstract: Robotic minimally invasive interventions typically require using more than two instruments. We thus developed a foot pedal interface which allows the user to control a robotic arm (simultaneously to working with the hands) with four degrees of freedom in continuous directions and speeds. This paper evaluates and compares the performances of ten naive operators in using this new pedal interface and…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: 7 pages, submit to 2019 IROS RA-Letter

    Journal ref: IEEE Robotics and Automation Letters, 2019

  7. arXiv:1902.04752  [pdf, other]

    cs.HC cs.RO

    A Subject-Specific Four-Degree-of-Freedom Foot Interface to Control a Robot Arm

    Authors: Yanpei Huang, Etienne Burdet, Lin Cao, Phuoc Thien Phan, Anthony Meng Huat Tiong, Soo Jay Phee

    Abstract: In robotic surgery, the surgeon controls robotic instruments using dedicated interfaces. One critical limitation of current interfaces is that they are designed to be operated by only the hands. This means that the surgeon can only control at most two robotic instruments at one time while many interventions require three instruments. This paper introduces a novel four-degree-of-freedom foot-machin…

    Submitted 13 February, 2019; originally announced February 2019.

    Comments: 11 pages, 10 figures, submitted to IEEE/ASME Transactions on Mechatronics (under review)