
Showing 1–18 of 18 results for author: Hua, N

Searching in archive cs.
  1. arXiv:2405.13216

    cs.CL

    Equipping Transformer with Random-Access Reading for Long-Context Understanding

    Authors: Chenghao Yang, Zi Yang, Nan Hua

    Abstract: Long-context modeling presents a significant challenge for transformer-based large language models (LLMs) due to the quadratic complexity of the self-attention mechanism and issues with length extrapolation caused by pretraining exclusively on short inputs. Existing methods address computational complexity through techniques such as text chunking, the kernel approach, and structured attention, and…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Preliminary works for a Google Student Researcher Project

  2. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  3. arXiv:2401.04881

    cs.CL

    Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing

    Authors: Zi Yang, Nan Hua

    Abstract: As LLMs have become capable of processing more complex types of inputs, researchers have recently studied how to efficiently and affordably process possibly arbitrarily long sequences. One effective approach is to use a FIFO memory to store keys and values of an attention sublayer from past chunks to allow subsequent queries to attend. However, this approach requires a large memory and/or takes in…

    Submitted 9 January, 2024; originally announced January 2024.

  4. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2309.10952

    cs.CL cs.AI cs.LG

    LMDX: Language Model-based Document Information Extraction and Localization

    Authors: Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Zifeng Wang, Jiaqi Mu, Hao Zhang, Chen-Yu Lee, Nan Hua

    Abstract: Large Language Models (LLM) have revolutionized Natural Language Processing (NLP), improving state-of-the-art and exhibiting emergent capabilities across various tasks. However, their application in extracting information from visually rich documents, which is at the core of many document processing workflows and involving the extraction of key entities from semi-structured documents, has not yet…

    Submitted 21 June, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  6. arXiv:2305.02549

    cs.CL cs.CV cs.LG

    FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister

    Abstract: The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph c…

    Submitted 13 June, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  7. arXiv:2303.05072

    cs.CV cs.AI cs.LG

    Identification of Systematic Errors of Image Classifiers on Rare Subgroups

    Authors: Jan Hendrik Metzen, Robin Hutmacher, N. Grace Hua, Valentyn Boreiko, Dan Zhang

    Abstract: Despite excellent average-case performance of many image classifiers, their performance can substantially deteriorate on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups as well as robustness and safety under domain shift. A major challenge is to identify such subgroups wit…

    Submitted 12 April, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  8. arXiv:2209.05980

    cs.CV cs.AI cs.CR cs.LG

    Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation

    Authors: Maksym Yatsura, Kaspar Sakmann, N. Grace Hua, Matthias Hein, Jan Hendrik Metzen

    Abstract: Adversarial patch attacks are an emerging security threat for real world deep learning applications. We present Demasked Smoothing, the first approach (up to our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on image classification task and often required changes in the…

    Submitted 21 February, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: accepted at ICLR 2023

  9. Protoformer: Embedding Prototypes for Transformers

    Authors: Ashkan Farhangi, Ning Sui, Nan Hua, Haiyan Bai, Arthur Huang, Zhishan Guo

    Abstract: Transformers have been widely applied in text classification. Unfortunately, real-world data contain anomalies and noisy labels that cause challenges for state-of-art Transformers. This paper proposes Protoformer, a novel self-learning framework for Transformers that can leverage problematic samples for text classification. Protoformer features a selection mechanism for embedding samples that allo…

    Submitted 25 June, 2022; originally announced June 2022.

    Comments: Advances in Knowledge Discovery and Data Mining (PAKDD 2022)

    Journal ref: Advances in Knowledge Discovery and Data Mining: 26th Pacific-Asia Conference, PAKDD 2022

  10. arXiv:2203.08411

    cs.CL cs.CV cs.LG

    FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister

    Abstract: Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their variety of layout patterns. We propose FormNet, a structure-aware sequence model to mitigate the suboptimal serialization of forms. First, we design Rich Attention that leverage…

    Submitted 23 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022

  11. arXiv:2106.01686

    cs.AI cs.CL cs.DB cs.IR cs.LG

    AliCG: Fine-grained and Evolvable Conceptual Graph Construction for Semantic Search at Alibaba

    Authors: Ningyu Zhang, Qianghuai Jia, Shumin Deng, Xiang Chen, Hongbin Ye, Hui Chen, Huaixiao Tou, Gang Huang, Zhao Wang, Nengwei Hua, Huajun Chen

    Abstract: Conceptual graphs, which is a particular type of Knowledge Graphs, play an essential role in semantic search. Prior conceptual graph construction approaches typically extract high-frequent, coarse-grained, and time-invariant concepts from formal texts. In real applications, however, it is necessary to extract less-frequent, fine-grained, and time-varying conceptual knowledge and build taxonomy in…

    Submitted 7 December, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: Accepted by KDD 2021 (Applied Data Science Track)

  12. arXiv:2010.01791

    cs.CL

    Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior

    Authors: Zi Lin, Jeremiah Zhe Liu, Zi Yang, Nan Hua, Dan Roth

    Abstract: Traditional (unstructured) pruning methods for a Transformer model focus on regularizing the individual weights by penalizing them toward zero. In this work, we explore spectral-normalized identity priors (SNIP), a structured pruning approach that penalizes an entire residual module in a Transformer model toward an identity mapping. Our method identifies and discards unimportant non-linear mapping…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020

  13. arXiv:2008.10813

    cs.CL cs.AI cs.IR cs.LG

    Conceptualized Representation Learning for Chinese Biomedical Text Mining

    Authors: Ningyu Zhang, Qianghuai Jia, Kangping Yin, Liang Dong, Feng Gao, Nengwei Hua

    Abstract: Biomedical text mining is becoming increasingly important as the number of biomedical documents and web data rapidly grows. Recently, word representation models such as BERT has gained popularity among researchers. However, it is difficult to estimate their performance on datasets containing biomedical texts as the word distributions of general and biomedical corpora are quite different. Moreover,…

    Submitted 25 August, 2020; originally announced August 2020.

    Comments: WSDM2020 Health Day

  14. arXiv:1909.04493

    cs.IR cs.AI cs.CL cs.LG

    Context-aware Deep Model for Entity Recommendation in Search Engine at Alibaba

    Authors: Qianghuai Jia, Ningyu Zhang, Nengwei Hua

    Abstract: Entity recommendation, providing search users with an improved experience via assisting them in finding related entities for a given query, has become an indispensable feature of today's search engines. Existing studies typically only consider the queries with explicit entities. They usually fail to handle complex queries that without entities, such as "what food is good for cold weather", because…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: CIKM2019 International Workshop on Entity Retrieval. arXiv admin note: text overlap with arXiv:1511.08996 by other authors

  15. arXiv:1803.11175

    cs.CL

    Universal Sentence Encoder

    Authors: Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

    Abstract: We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, r…

    Submitted 12 April, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: 7 pages; fixed module URL in Listing 1

  16. Throughput Scaling for MMF-Enabled Optical Datacenter Networks by Time-Slicing-Based Crosstalk Mitigation

    Authors: Zhizhen Zhong, Nan Hua, Yufang Yu, Zhongying Wu, Juhao Li, Haozhe Yan, Shangyuan Li, Ruijie Luo, Jialong Li, Yanhe Li, ** Zheng

    Abstract: Modal crosstalk is the main bottleneck in MMF-enabled optical datacenter networks with direct detection. A novel time-slicing-based crosstalk-mitigated MDM scheme is first proposed, then theoretically analyzed and experimentally demonstrated.

    Submitted 13 December, 2017; originally announced December 2017.

    Comments: Accepted by Optical Fiber Communications Conference (OFC), 2018

  17. Evolving Optical Networks for Latency-Sensitive Smart-Grid Communications via Optical Time Slice Switching (OTSS) Technologies

    Authors: Zhizhen Zhong, Nan Hua, Zhu Liu, Wen**g Li, Yanhe Li, ** Zheng

    Abstract: In this paper, we proposed a novel OTSS-assisted optical network architecture for smart-grid communication networks, which has unique requirements for low-latency connections. Illustrative results show that, OTSS can provide extremely better performance in latency and blocking probability than conventional flexi-grid optical networks.

    Submitted 5 September, 2017; originally announced September 2017.

    Comments: IEEE Photonics Society 1st Place Best Poster Award, on CLEO-PR/OECC/PGC 2017

  18. On QoS-assured degraded provisioning in service-differentiated multi-layer elastic optical networks

    Authors: Zhizhen Zhong, Jipu Li, Nan Hua, Gustavo B. Figueiredo, Yanhe Li, ** Zheng, Biswanath Mukherjee

    Abstract: The emergence of new network applications is driving network operators to not only fulfill dynamic bandwidth requirements, but offer various grades of service. Degraded provisioning provides an effective solution to flexibly allocate resources in various dimensions to reduce blocking for differentiated demands when network congestion occurs. In this work, we investigate the novel problem of online…

    Submitted 4 January, 2017; v1 submitted 15 July, 2016; originally announced July 2016.

    Comments: accepted by IEEE GLOBECOM 2016
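
    The Attendre abstract (entry 3) describes a common baseline for long inputs: keep the keys and values of an attention sublayer from past chunks in a FIFO memory so that later queries can attend to them. The sketch below illustrates only that baseline idea, assuming single-head scaled dot-product attention over plain Python lists. The class FifoKVMemory and its methods are hypothetical names, not code from the paper, and the sketch does not implement Attendre's own retrieval with evicted queries.

```python
import math
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]


class FifoKVMemory:
    """Hypothetical FIFO memory of (key, value) pairs from past chunks.

    Once `capacity` pairs are stored, each new pair evicts the oldest one.
    """

    def __init__(self, capacity: int) -> None:
        self.entries: Deque[Tuple[Vector, Vector]] = deque(maxlen=capacity)

    def write_chunk(self, keys: List[Vector], values: List[Vector]) -> None:
        # Append the chunk's pairs in order; deque(maxlen=...) drops the oldest when full.
        for k, v in zip(keys, values):
            self.entries.append((k, v))

    def attend(self, query: Vector) -> Vector:
        """Scaled dot-product attention of a single query over all stored pairs."""
        if not self.entries:
            raise ValueError("memory is empty")
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * x for q, x in zip(query, k)) for k, _ in self.entries]
        # Subtract the max score before exponentiating for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out = [0.0] * len(self.entries[0][1])
        for w, (_, v) in zip(weights, self.entries):
            for j, x in enumerate(v):
                out[j] += (w / total) * x
        return out


# Usage: a capacity-4 memory receiving two chunks (2 pairs, then 3 pairs).
mem = FifoKVMemory(capacity=4)
mem.write_chunk(keys=[[1.0, 0.0], [0.0, 1.0]], values=[[1.0], [2.0]])
mem.write_chunk(keys=[[1.0, 1.0], [0.0, 0.0], [2.0, 0.0]], values=[[3.0], [4.0], [5.0]])
print(len(mem.entries))  # 4: the oldest pair, ([1.0, 0.0], [1.0]), was evicted
print(mem.attend([0.0, 0.0]))  # zero query -> uniform weights -> [3.5]
```

    The memory size is fixed by capacity, so anything older than the last capacity pairs becomes invisible to later queries. This is the memory-versus-reach trade-off the abstract points to when it notes that the approach "requires a large memory".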