Showing 1–17 of 17 results for author: Phuong, M

  1. arXiv:2407.01983

    cs.CV

    SADL: An Effective In-Context Learning Method for Compositional Visual QA

    Authors: Long Hoang Dang, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran

    Abstract: Large vision-language models (LVLMs) offer a novel capability for performing in-context learning (ICL) in Visual QA. When prompted with a few demonstrations of image-question-answer triplets, LVLMs have demonstrated the ability to discern underlying patterns and transfer this latent knowledge to answer new questions about unseen images without the need for expensive supervised fine-tuning. However…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2403.13793

    cs.LG

    Evaluating Frontier Models for Dangerous Capabilities

    Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, et al. (2 additional authors not shown)

    Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous…

    Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  3. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2308.14654

    cs.CL cs.AI

    Joint Multiple Intent Detection and Slot Filling with Supervised Contrastive Learning and Self-Distillation

    Authors: Nguyen Anh Tu, Hoang Thi Thu Uyen, Tu Minh Phuong, Ngo Xuan Bach

    Abstract: Multiple intent detection and slot filling are two fundamental and crucial tasks in spoken language understanding. Motivated by the fact that the two tasks are closely related, joint models that can detect intents and extract slots simultaneously are preferred to individual models that perform each task independently. The accuracy of a joint model depends heavily on the ability of the model to tra…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted at ECAI 2023

  6. arXiv:2305.15324

    cs.AI

    Model evaluation for extreme risks

    Authors: Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

    Abstract: Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify danger…

    Submitted 22 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Fixed typos; added citation

    ACM Class: K.4.1

  7. Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers

    Authors: Nguyen Anh Tu, Hoang Thi Thu Uyen, Tu Minh Phuong, Ngo Xuan Bach

    Abstract: In this paper, we propose using deep neural networks to extract important information from Vietnamese legal questions, a fundamental task towards building a question answering system in the legal domain. Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question. We introduce a deep model that solves the task in th…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted as an oral presentation at ICONIP 2021

  8. arXiv:2210.01790

    cs.LG

    Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

    Authors: Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton

    Abstract: The field of AI alignment is concerned with AI systems that pursue unintended goals. One commonly studied mechanism by which an unintended goal might arise is specification gaming, in which the designer-provided specification is flawed in a way that the designers did not foresee. However, an AI system may pursue an undesired goal even when the specification is correct, in the case of goal misgener…

    Submitted 2 November, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

  9. arXiv:2207.09238

    cs.LG cs.AI cs.CL cs.NE

    Formal Algorithms for Transformers

    Authors: Mary Phuong, Marcus Hutter

    Abstract: This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures s…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: 16 pages, 15 algorithms

    Journal ref: Latest 2022 version at http://www.hutter1.net/publ/transalg.pdf

  10. arXiv:2207.03656

    cs.CV cs.LG

    Video Dialog as Conversation about Objects Living in Space-Time

    Authors: Hoang-Anh Pham, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran

    Abstract: It would be a technological feat to be able to create a system that can hold a meaningful conversation with humans about what they watch. A setup toward that goal is presented as a video dialog task, where the system is asked to generate natural utterances in response to a question in an ongoing dialog. The task poses great visual, linguistic, and reasoning challenges that cannot be easily overcom…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to ECCV 2022, code will be available at https://github.com/hoanganhpham1006/COST

  11. A Summary of the ALQAC 2021 Competition

    Authors: Nguyen Ha Thanh, Bui Minh Quan, Chau Nguyen, Tung Le, Nguyen Minh Phuong, Dang Tran Binh, Vuong Thi Hai Yen, Teeradaj Racharak, Nguyen Le Minh, Tran Duc Vu, Phan Viet Anh, Nguyen Truong Son, Huy Tien Nguyen, Bhumindr Butr-indr, Peerapon Vateekul, Prachya Boonkwan

    Abstract: We summarize the evaluation of the first Automated Legal Question Answering Competition (ALQAC 2021). This year's competition comprises three tasks aimed at processing statute law documents: Legal Text Information Retrieval (Task 1), Legal Text Entailment Prediction (Task 2), and Legal Text Question Answering (Task 3). The final goal of these tasks is to build a system that can…

    Submitted 24 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

  12. arXiv:2204.03324

    cs.CL cs.AI

    Autoencoding Language Model Based Ensemble Learning for Commonsense Validation and Explanation

    Authors: Ngo Quang Huy, Tu Minh Phuong, Ngo Xuan Bach

    Abstract: An ultimate goal of artificial intelligence is to build computer systems that can understand human languages. Understanding commonsense knowledge about the world expressed in text is one of the foundational and challenging problems in creating such intelligent systems. As a step towards this goal, we present in this paper ALMEn, an Autoencoding Language Model based Ensemble learning method for commo…

    Submitted 7 April, 2022; originally announced April 2022.

  13. arXiv:2105.13093

    cs.LG stat.ML

    Towards Understanding Knowledge Distillation

    Authors: Mary Phuong, Christoph H. Lampert

    Abstract: Knowledge distillation, i.e., one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably if trained with the outputs of another classifier as soft labels, instead of from ground truth data. So far, however, there is no satisf…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: ICML'19. Post-edited to add related work. arXiv admin note: text overlap with arXiv:2003.13438 by other authors

  14. arXiv:2003.06858

    cs.CL cs.AI cs.IR

    Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining

    Authors: Nguyen Thi Thanh Thuy, Ngo Xuan Bach, Tu Minh Phuong

    Abstract: Aspect-based opinion mining is the task of identifying sentiment at the aspect level in opinionated text, which consists of two subtasks: aspect category extraction and sentiment polarity classification. While aspect category extraction aims to detect and categorize opinion targets such as product features, sentiment polarity classification assigns a sentiment label, i.e. positive, negative, or ne…

    Submitted 15 March, 2020; originally announced March 2020.

  15. Classifying Vietnamese Disease Outbreak Reports with Important Sentences and Rich Features

    Authors: Son Doan, Nguyen Thi Ngoc Vinh, Tu Minh Phuong

    Abstract: Text classification has been an important field of research since the mid-1990s. It has many applications, one of which is Web-based biosurveillance systems that identify and summarize online disease outbreak reports. In this paper we focus on classifying Vietnamese disease outbreak reports. We investigate important properties of disease outbreak reports, e.g., sentences containing names of outbre…

    Submitted 22 November, 2019; originally announced November 2019.

    Comments: 5 pages, 2 tables

    Journal ref: Proc. of the Third Symposium on Information and Communication Technology (SoICT), pages 260-265, 2012

  16. arXiv:1703.09296

    cs.CV

    Femoral ROIs and Entropy for Texture-based Detection of Osteoarthritis from High-Resolution Knee Radiographs

    Authors: Jiří Hladůvka, Bui Thi Mai Phuong, Richard Ljuhar, Davul Ljuhar, Ana M Rodrigues, Jaime C Branco, Helena Canhão

    Abstract: The relationship between knee osteoarthritis progression and changes in tibial bone structure has long been recognized and various texture descriptors have been proposed to detect early osteoarthritis (OA) from radiographs. This work aims to investigate (1) femoral textures as an OA indicator and (2) the potential of entropy as a computationally efficient alternative to established texture descrip…

    Submitted 27 March, 2017; originally announced March 2017.

  17. Natural Language Processing in Biomedicine: A Unified System Architecture Overview

    Authors: Son Doan, Mike Conway, Tu Minh Phuong, Lucila Ohno-Machado

    Abstract: In modern electronic medical records (EMR), much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not provided in structured data fields but is instead encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of "unlocking" this important data source for applications in clinical decision support, quality…

    Submitted 8 January, 2014; v1 submitted 2 January, 2014; originally announced January 2014.

    Comments: 25 pages, 5 figures, book chapter in Clinical Bioinformatics, 2014, edited by Ronald Trent