Showing 1–22 of 22 results for author: Ghiasi, G

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.05881  [pdf, other]

    cs.IT eess.SP

    Localized and Distributed Beyond Diagonal Reconfigurable Intelligent Surfaces with Lossy Interconnections: Modeling and Optimization

    Authors: Matteo Nerini, Golsa Ghiaasi, Bruno Clerckx

    Abstract: Reconfigurable intelligent surface (RIS) is a key technology to control the communication environment in future wireless networks. Recently, beyond diagonal RIS (BD-RIS) emerged as a generalization of RIS achieving larger coverage through additional tunable impedance components interconnecting the RIS elements. However, conventional RIS and BD-RIS can effectively serve only users in their proximit…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE for publication

  3. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2306.01736  [pdf, other]

    cs.CV cs.AI cs.LG

    DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

    Authors: Xiuye Gu, Yin Cui, Jonathan Huang, Abdullah Rashwan, Xuan Yang, Xingyi Zhou, Golnaz Ghiasi, Weicheng Kuo, Huizhong Chen, Liang-Chieh Chen, David A Ross

    Abstract: Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg. We use a shared representation (mask proposals with class predictions) for all tasks. To tackle task discrepancy, we adopt different merge operations and post-processing for different tasks. We also leverage weak-supervision…

    Submitted 2 June, 2023; originally announced June 2023.

  5. arXiv:2203.12683  [pdf, other]

    cs.CV cs.AI

    Revisiting Multi-Scale Feature Fusion for Semantic Segmentation

    Authors: Tianjian Meng, Golnaz Ghiasi, Reza Mahjourian, Quoc V. Le, Mingxing Tan

    Abstract: It is commonly believed that high internal resolution combined with expensive operations (e.g. atrous convolutions) are necessary for accurate semantic segmentation, resulting in slow speed and large memory usage. In this paper, we question this belief and demonstrate that neither high internal resolution nor atrous convolutions are necessary. Our intuition is that although segmentation is a dense…

    Submitted 14 June, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

  6. arXiv:2112.12143  [pdf, other]

    cs.CV

    Scaling Open-Vocabulary Image Segmentation with Image-Level Labels

    Authors: Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin

    Abstract: We design an open-vocabulary image segmentation model to organize an image into meaningful regions indicated by arbitrary texts. Recent works (CLIP and ALIGN), despite attaining impressive open-vocabulary classification accuracy with image-level caption labels, are unable to segment visual concepts with pixels. We argue that these models miss an important step of visual grouping, which organizes p…

    Submitted 20 July, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Accepted at ECCV 2022

  7. arXiv:2111.10050  [pdf, other]

    cs.LG cs.CL cs.CV

    Combined Scaling for Zero-shot Transfer Learning

    Authors: Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le

    Abstract: We present a combined scaling method - named BASIC - that achieves 85.7% top-1 accuracy on the ImageNet ILSVRC-2012 validation set without learning from any labeled ImageNet example. This accuracy surpasses best published similar models - CLIP and ALIGN - by 9.3%. Our BASIC model also shows significant improvements in robustness benchmarks. For instance, on 5 test sets with natural distribution sh…

    Submitted 12 April, 2023; v1 submitted 19 November, 2021; originally announced November 2021.

  8. arXiv:2108.11353  [pdf, other]

    cs.CV

    Multi-Task Self-Training for Learning General Representations

    Authors: Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin

    Abstract: Despite the fast progress in training specialized models for various tasks, learning a single general model that works well for many tasks is still challenging for computer vision. Here we introduce multi-task self-training (MuST), which harnesses the knowledge in independent specialized teacher models (e.g., ImageNet model on classification) to train a single general student model. Our approach h…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  9. arXiv:2106.06080  [pdf, other]

    cs.LG cs.AI

    Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent

    Authors: Samira Abnar, Rianne van den Berg, Golnaz Ghiasi, Mostafa Dehghani, Nal Kalchbrenner, Hanie Sedghi

    Abstract: We focus on the problem of domain adaptation when the goal is shifting the model towards the target distribution, rather than learning domain invariant representations. It has been shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be success…

    Submitted 13 July, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  10. arXiv:2012.07177  [pdf, other]

    cs.CV

    Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

    Authors: Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin D. Cubuk, Quoc V. Le, Barret Zoph

    Abstract: Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision. Leveraging data augmentations is a promising direction towards addressing this challenge. Here, we perform a systematic study of the Copy-Paste augmentation ([13, 12]) for instance segmentation where we randomly paste objects onto an image. Prior studies…

    Submitted 23 June, 2021; v1 submitted 13 December, 2020; originally announced December 2020.

    Comments: Accepted at CVPR 2021 (Oral)

  11. arXiv:2006.06882  [pdf, other]

    cs.CV cs.LG stat.ML

    Rethinking Pre-training and Self-training

    Authors: Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin D. Cubuk, Quoc V. Le

    Abstract: Pre-training is a dominant paradigm in computer vision. For example, supervised ImageNet pre-training is commonly used to initialize the backbones of object detection and segmentation models. He et al., however, show a surprising result that ImageNet pre-training has limited impact on COCO object detection. Here we investigate self-training as another method to utilize additional data on the same…

    Submitted 15 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted for publication at the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020)

  12. A Hardware-in-the-Loop Evaluation of the Impact of the V2X Channel on the Traffic-Safety Versus Efficiency Trade-offs

    Authors: Alessandro Bazzi, Thomas Blazek, Michele Menarini, Barbara M. Masini, Alberto Zanella, Christoph Mecklenbrauker, Golsa Ghiaasi

    Abstract: Vehicles are increasingly becoming connected and short-range wireless communications promise to introduce a radical change in the drivers' behaviors. Among the main use cases, the intersection management is surely one of those that could mostly impact on both traffic safety and efficiency. In this work, we consider an intersection collision warning application and exploit a hardware-in-the-loop (…

    Submitted 22 January, 2020; originally announced January 2020.

  13. arXiv:1912.05027  [pdf, other]

    cs.CV cs.LG eess.IV

    SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

    Authors: Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

    Abstract: Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a b…

    Submitted 17 June, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: CVPR 2020

  14. arXiv:1912.01106  [pdf, other]

    cs.CV

    MnasFPN: Learning Latency-aware Pyramid Architecture for Object Detection on Mobile Devices

    Authors: Bo Chen, Golnaz Ghiasi, Hanxiao Liu, Tsung-Yi Lin, Dmitry Kalenichenko, Hartwig Adams, Quoc V. Le

    Abstract: Despite the blooming success of architecture search for vision tasks in resource-constrained environments, the design of on-device object detection architectures has mostly been manual. The few automated search efforts are either centered around non-mobile-friendly search spaces or not guided by on-device latency. We propose MnasFPN, a mobile-friendly search space for the detection head, and comb…

    Submitted 30 July, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: 10 pages, 7 figures

  15. arXiv:1906.11172  [pdf, other]

    cs.CV cs.LG

    Learning Data Augmentation Strategies for Object Detection

    Authors: Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le

    Abstract: Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional cost for annotating images for object detection, data augmentation may be of even greater importance for this computer vision task. In this w…

    Submitted 26 June, 2019; originally announced June 2019.

  16. arXiv:1904.07392  [pdf, other]

    cs.CV cs.LG

    NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

    Authors: Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, Quoc V. Le

    Abstract: Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature pyramid architecture in a novel scalable search space covering all cross-scale connections. The discovered architecture, named NAS-FPN, consists of…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: Accepted at CVPR 2019

  17. Measured Channel Hardening in an Indoor Multiband Scenario

    Authors: Golsa Ghiaasi, Jens Abraham, Egil Eide, Torbjörn Ekman

    Abstract: A study of channel hardening in a large-scale antenna system has been carried out by means of indoor channel measurements over four frequency bands, namely 1.472 GHz, 2.6 GHz, 3.82 GHz and 4.16 GHz. NTNU's Reconfigurable Radio Network Platform has been used to record the channel estimates for 40 single user non-line of sight radio links to a 64 element wide-band antenna array. By examining the rms…

    Submitted 13 December, 2018; originally announced December 2018.

    Comments: 6 pages, 7 figures, presented at IEEE PIMRC 2018

  18. arXiv:1811.08560  [pdf, other]

    cs.CV

    Adjustable Real-time Style Transfer

    Authors: Mohammad Babaeizadeh, Golnaz Ghiasi

    Abstract: Artistic style transfer is the problem of synthesizing an image with content similar to a given image and style similar to another. Although recent feed-forward neural networks can generate stylized images in real-time, these models produce a single stylization given a pair of style/content images, and the user doesn't have control over the synthesized output. Moreover, the style transfer depends…

    Submitted 20 November, 2018; originally announced November 2018.

  19. arXiv:1810.12890  [pdf, other]

    cs.CV

    DropBlock: A regularization method for convolutional networks

    Authors: Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

    Abstract: Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that a…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted at NIPS 2018

  20. arXiv:1705.06830  [pdf, other]

    cs.CV

    Exploring the structure of a real-time, arbitrary neural artistic stylization network

    Authors: Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens

    Abstract: In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters…

    Submitted 24 August, 2017; v1 submitted 18 May, 2017; originally announced May 2017.

    Comments: Accepted as an oral presentation at British Machine Vision Conference (BMVC) 2017

  21. arXiv:1605.02264  [pdf, other]

    cs.CV

    Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

    Authors: Golnaz Ghiasi, Charless C. Fowlkes

    Abstract: CNN architectures have terrific recognition performance but rely on spatial pooling which makes it difficult to adapt them to tasks that require dense, pixel-accurate labeling. This paper makes two contributions: (1) We demonstrate that while the apparent spatial resolution of convolutional feature maps is low, the high-dimensional feature representation contains significant sub-pixel localization…

    Submitted 30 July, 2016; v1 submitted 7 May, 2016; originally announced May 2016.

  22. arXiv:1506.08347  [pdf, other]

    cs.CV

    Occlusion Coherence: Detecting and Localizing Occluded Faces

    Authors: Golnaz Ghiasi, Charless C. Fowlkes

    Abstract: The presence of occluders significantly impacts object recognition accuracy. However, occlusion is typically treated as an unstructured source of noise and explicit models for occluders have lagged behind those for object appearance and shape. In this paper we describe a hierarchical deformable part model for face detection and landmark localization that explicitly models part occlusion. The propo…

    Submitted 24 August, 2016; v1 submitted 27 June, 2015; originally announced June 2015.