
Showing 1–17 of 17 results for author: Bach, N

Searching in archive cs.
  1. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  2. arXiv:2308.14654  [pdf, other]

    cs.CL cs.AI

    Joint Multiple Intent Detection and Slot Filling with Supervised Contrastive Learning and Self-Distillation

    Authors: Nguyen Anh Tu, Hoang Thi Thu Uyen, Tu Minh Phuong, Ngo Xuan Bach

    Abstract: Multiple intent detection and slot filling are two fundamental and crucial tasks in spoken language understanding. Motivated by the fact that the two tasks are closely related, joint models that can detect intents and extract slots simultaneously are preferred to individual models that perform each task independently. The accuracy of a joint model depends heavily on the ability of the model to tra…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted at ECAI 2023

  3. Analyzing Vietnamese Legal Questions Using Deep Neural Networks with Biaffine Classifiers

    Authors: Nguyen Anh Tu, Hoang Thi Thu Uyen, Tu Minh Phuong, Ngo Xuan Bach

    Abstract: In this paper, we propose using deep neural networks to extract important information from Vietnamese legal questions, a fundamental task towards building a question answering system in the legal domain. Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question. We introduce a deep model that solves the task in th…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted as an oral presentation at ICONIP 2021

  4. arXiv:2206.01843  [pdf, other]

    cs.CV cs.AI cs.CL

    Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

    Authors: Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

    Abstract: People say, "A picture is worth a thousand words". Then how can we get the rich information out of the image? We argue that by using visual clues to bridge large pretrained vision foundation models and language models, we can do so without any extra cross-modal training. Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the i…

    Submitted 14 September, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

  5. arXiv:2204.03324  [pdf, other]

    cs.CL cs.AI

    Autoencoding Language Model Based Ensemble Learning for Commonsense Validation and Explanation

    Authors: Ngo Quang Huy, Tu Minh Phuong, Ngo Xuan Bach

    Abstract: An ultimate goal of artificial intelligence is to build computer systems that can understand human languages. Understanding commonsense knowledge about the world expressed in text is one of the foundational and challenging problems to create such intelligent systems. As a step towards this goal, we present in this paper ALMEn, an Autoencoding Language Model based Ensemble learning method for commo…

    Submitted 7 April, 2022; originally announced April 2022.

  6. arXiv:2112.06482  [pdf, other]

    cs.CL

    ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition

    Authors: Xinyu Wang, Min Gui, Yong Jiang, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: Recently, Multi-modal Named Entity Recognition (MNER) has attracted a lot of attention. Most of the work utilizes image information through region-level visual representations obtained from a pretrained object detector and relies on an attention mechanism to model the interactions between image and text representations. However, it is difficult to model such interactions as image and text represen…

    Submitted 20 September, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to NAACL 2022

  7. arXiv:2109.05716  [pdf, other]

    cs.CL

    MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

    Authors: Xinyin Ma, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Weiming Lu

    Abstract: Entity retrieval, which aims at disambiguating mentions to canonical entities from massive KBs, is essential for many tasks in natural language processing. Recent progress in entity retrieval shows that the dual-encoder structure is a powerful and efficient framework to nominate candidates if entities are only identified by descriptions. However, they ignore the property that meanings of entity me…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted by EMNLP 2021

  8. arXiv:2105.03654  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

    Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: Recent advances in Named Entity Recognition (NER) show that document-level contexts can significantly improve model performance. In many application scenarios, however, such contexts are not available. In this paper, we propose to find external contexts of a sentence by retrieving and selecting a set of semantically relevant texts through a search engine, with the original sentence as the query. W…

    Submitted 8 December, 2022; v1 submitted 8 May, 2021; originally announced May 2021.

    Comments: Accepted to ACL 2021, 12 pages. Our newest code is publicly available at https://github.com/modelscope/AdaSeq/tree/master/examples/RaNER

  9. arXiv:2011.05604  [pdf, other]

    cs.CL cs.LG

    An Investigation of Potential Function Designs for Neural CRF

    Authors: Zechuan Hu, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: The neural linear-chain CRF model is one of the most widely-used approaches to sequence labeling. In this paper, we investigate a series of increasingly expressive potential functions for neural CRF models, which not only integrate the emission and transition functions, but also explicitly take the representations of the contextual words as input. Our extensive experiments show that the decomposed q…

    Submitted 11 November, 2020; originally announced November 2020.

  10. arXiv:2010.15425  [pdf, other]

    astro-ph.IM astro-ph.EP cs.LG

    Detection of asteroid trails in Hubble Space Telescope images using Deep Learning

    Authors: Andrei A. Parfeni, Laurentiu I. Caramete, Andreea M. Dobre, Nguyen Tran Bach

    Abstract: We present an application of Deep Learning for the image recognition of asteroid trails in single-exposure photos taken by the Hubble Space Telescope. Using algorithms based on multi-layered deep Convolutional Neural Networks, we report accuracies of above 80% on the validation set. Our project was motivated by the Hubble Asteroid Hunter project on Zooniverse, which focused on identifying these ob…

    Submitted 30 October, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

    Comments: 12 pages, 8 figures

  11. arXiv:2010.05010  [pdf, other]

    cs.CL cs.AI cs.LG

    Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor

    Authors: Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a more fine-grained one (the student). The objective function of knowledge distillation is typically the cross-entropy between the teacher and the student's output distributions. However, for structured prediction problems, the output space is exponential in size; ther…

    Submitted 1 June, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

    Comments: Accepted to Proceedings of ACL-IJCNLP 2021. 15 pages

  12. arXiv:2010.05006  [pdf, other]

    cs.CL cs.AI cs.LG

    Automated Concatenation of Embeddings for Structured Prediction

    Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: Pretrained contextualized embeddings are powerful word representations for structured prediction tasks. Recent work found that better word representations can be obtained by concatenating different types of embeddings. However, the selection of embeddings to form the best concatenated representation usually varies depending on the task and the collection of candidate embeddings, and the ever-incre…

    Submitted 1 June, 2021; v1 submitted 10 October, 2020; originally announced October 2020.

    Comments: Accepted to Proceedings of ACL-IJCNLP 2021. 17 pages

  13. arXiv:2009.08330  [pdf, other]

    cs.CL cs.LG

    More Embeddings, Better Sequence Labelers?

    Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: Recent work proposes a family of contextual embeddings that significantly improves the accuracy of sequence labelers over non-contextual embeddings. However, there is no definite conclusion on whether we can build better sequence labelers by combining different kinds of embeddings in various settings. In this paper, we conduct extensive experiments on 3 tasks over 18 datasets and 8 languages to st…

    Submitted 1 June, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted to Findings of EMNLP 2020. Camera-ready, 16 pages

  14. arXiv:2009.08229  [pdf, other]

    cs.CL cs.AI

    AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network

    Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

    Abstract: The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches. Exact probabilistic inference algorithms such as the forward-backward and Viterbi algorithms are typically applied in training and prediction stages of the CRF model. However, these algorithms require sequential computation that makes parallelization impossible. In this paper,…

    Submitted 12 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

    Comments: Accepted to the Main Conference of EMNLP 2020 (Short). Camera-ready, 8 pages

  15. arXiv:2004.03846  [pdf, other]

    cs.CL cs.AI cs.LG

    Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

    Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu

    Abstract: Multilingual sequence labeling is a task of predicting label sequences using a single unified model for multiple languages. Compared with relying on multiple monolingual models, using a multilingual model has the benefit of a smaller model size, easier online serving, and generalizability to low-resource languages. However, current multilingual models still underperform individual monolingual m…

    Submitted 4 May, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Accepted to ACL 2020, camera-ready. 14 pages

  16. arXiv:2003.06858  [pdf]

    cs.CL cs.AI cs.IR

    Leveraging Foreign Language Labeled Data for Aspect-Based Opinion Mining

    Authors: Nguyen Thi Thanh Thuy, Ngo Xuan Bach, Tu Minh Phuong

    Abstract: Aspect-based opinion mining is the task of identifying sentiment at the aspect level in opinionated text, which consists of two subtasks: aspect category extraction and sentiment polarity classification. While aspect category extraction aims to detect and categorize opinion targets such as product features, sentiment polarity classification assigns a sentiment label, i.e. positive, negative, or ne…

    Submitted 15 March, 2020; originally announced March 2020.

  17. arXiv:1811.11365  [pdf, other]

    cs.CV cs.CL

    Unsupervised Multi-modal Neural Machine Translation

    Authors: Yuanhang Su, Kai Fan, Nguyen Bach, C. -C. Jay Kuo, Fei Huang

    Abstract: Unsupervised neural machine translation (UNMT) has recently achieved remarkable results with only large monolingual corpora in each language. However, the uncertainty of associating target with source sentences makes UNMT theoretically an ill-posed problem. This work investigates the possibility of utilizing images for disambiguation to improve the performance of UNMT. Our assumption is intuitivel…

    Submitted 26 May, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

    Comments: Accepted to CVPR 2019