-
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks
Authors:
Amrutha Varshini Ramesh,
Vignesh Ganapathiraman,
Issam H. Laradji,
Mark Schmidt
Abstract:
Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA)…
▽ More
Training large language models (LLMs) for pretraining or adapting to new tasks and domains has become increasingly critical as their applications expand. However, as the model and the data sizes grow, the training process presents significant memory challenges, often requiring a prohibitive amount of GPU memory that may not be readily available. Existing methods such as low-rank adaptation (LoRA) add trainable low-rank matrix factorizations, altering the training dynamics and limiting the model's parameter search to a low-rank subspace. GaLore, a more recent method, employs Gradient Low-Rank Projection to reduce the memory footprint, in the full parameter training setting. However GaLore can only be applied to a subset of the LLM layers that satisfy the "reversibility" property, thus limiting their applicability. In response to these challenges, we introduce BlockLLM, an approach inspired by block coordinate descent. Our method carefully selects and updates a very small subset of the trainable parameters without altering any part of its architecture and training procedure. BlockLLM achieves state-of-the-art performance in both finetuning and pretraining tasks, while reducing the memory footprint of the underlying optimization process. Our experiments demonstrate that fine-tuning with only less than 5% of the parameters, BlockLLM achieves state-of-the-art perplexity scores on the GLUE benchmarks. On Llama model pretrained on C4 dataset, BlockLLM is able to train with significantly less memory than the state-of-the-art, while still maintaining competitive performance.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
Authors:
Alexandre Drouin,
Maxime Gasse,
Massimo Caccia,
Issam H. Laradji,
Manuel Del Verme,
Tom Marty,
Léo Boisvert,
Megh Thakkar,
Quentin Cappart,
David Vazquez,
Nicolas Chapados,
Alexandre Lacoste
Abstract:
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also…
▽ More
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuring the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 33 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.
△ Less
Submitted 14 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
LitLLM: A Toolkit for Scientific Literature Review
Authors:
Shubham Agarwal,
Issam H. Laradji,
Laurent Charlin,
Christopher Pal
Abstract:
Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using Large Language Models (LLMs) have significant limitations. They tend to hallucinate-generate non-actual in…
▽ More
Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using Large Language Models (LLMs) have significant limitations. They tend to hallucinate-generate non-actual information-and ignore the latest research they have not been trained on. To address these limitations, we propose a toolkit that operates on Retrieval Augmented Generation (RAG) principles, specialized prompting and instructing techniques with the help of LLMs. Our system first initiates a web search to retrieve relevant papers by summarizing user-provided abstracts into keywords using an off-the-shelf LLM. Authors can enhance the search by supplementing it with relevant papers or keywords, contributing to a tailored retrieval process. Second, the system re-ranks the retrieved papers based on the user-provided abstract. Finally, the related work section is generated based on the re-ranked results and the abstract. There is a substantial reduction in time and effort for literature review compared to traditional methods, establishing our toolkit as an efficient alternative. Our open-source toolkit is accessible at https://github.com/shubhamagarwal92/LitLLM and Huggingface space (https://huggingface.co/spaces/shubhamagarwal92/LitLLM) with the video demo at https://youtu.be/E2ggOZBAFw0.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Capture the Flag: Uncovering Data Insights with Large Language Models
Authors:
Issam Laradji,
Perouz Taslakian,
Sai Rajeswar,
Valentina Zantedeschi,
Alexandre Lacoste,
Nicolas Chapados,
David Vazquez,
Christopher Pal,
Alexandre Drouin
Abstract:
The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. However, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasonin…
▽ More
The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. However, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
StarVector: Generating Scalable Vector Graphics Code from Images
Authors:
Juan A. Rodriguez,
Shubham Agarwal,
Issam H. Laradji,
Pau Rodriguez,
David Vazquez,
Christopher Pal,
Marco Pedersoli
Abstract:
Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simple…
▽ More
Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simpler ones that require extensive processing and simplification. This paper introduces StarVector, a multimodal SVG generation model that effectively integrates Code Generation Large Language Models (CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to extract visual representations from pixel-based images, which are then transformed into visual tokens via an adapter module. These visual tokens are pre-pended to the SVG token embeddings, and the sequence is modeled by the StarCoder model using next-token prediction, effectively learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images. To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant enhancements in visual quality and complexity handling over current methods, marking a notable advancement in SVG generation technology. Code and models: https://github.com/joanrod/star-vector
△ Less
Submitted 17 December, 2023;
originally announced December 2023.
-
LLM aided semi-supervision for Extractive Dialog Summarization
Authors:
Nishant Mishra,
Gaurav Sahu,
Iacer Calixto,
Ameen Abu-Hanna,
Issam H. Laradji
Abstract:
Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to f…
▽ More
Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the \tweetsumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.
△ Less
Submitted 23 November, 2023; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization
Authors:
Gaurav Sahu,
Olga Vechtomova,
Issam H. Laradji
Abstract:
Semi-supervised learning (SSL) is a widely used technique in scenarios where labeled data is scarce and unlabeled data is abundant. While SSL is popular for image and text classification, it is relatively underexplored for the task of extractive text summarization. Standard SSL methods follow a teacher-student paradigm to first train a classification model and then use the classifier's confidence…
▽ More
Semi-supervised learning (SSL) is a widely used technique in scenarios where labeled data is scarce and unlabeled data is abundant. While SSL is popular for image and text classification, it is relatively underexplored for the task of extractive text summarization. Standard SSL methods follow a teacher-student paradigm to first train a classification model and then use the classifier's confidence values to select pseudo-labels for the subsequent training cycle; however, such classifiers are not suitable to measure the accuracy of pseudo-labels as they lack specific tuning for evaluation, which leads to confidence values that fail to capture the semantics and correctness of the generated summary. To address this problem, we propose a prompt-based pseudo-labeling strategy with LLMs that picks unlabeled examples with more accurate pseudo-labels than using just the classifier's probability outputs. Our approach also includes a relabeling mechanism that improves the quality of pseudo-labels. We evaluate our method on three text summarization datasets: TweetSumm, WikiHow, and ArXiv/PubMed. We empirically show that a prompting-based LLM that scores and generates pseudo-labels outperforms existing SSL methods on ROUGE-1, ROUGE-2, and ROUGE-L scores on all the datasets. Furthermore, our method achieves competitive L-Eval scores (evaluation with LLaMa-3) as a fully supervised method in a data-scarce setting and outperforms fully supervised method in a data-abundant setting.
△ Less
Submitted 1 July, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Authors:
Gaurav Sahu,
Olga Vechtomova,
Dzmitry Bahdanau,
Issam H. Laradji
Abstract:
Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follo…
▽ More
Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follow instructions and perform few-shot classifications. Our specific PromptMix method consists of two steps: 1) generate challenging text augmentations near class boundaries; however, generating borderline examples increases the risk of false positives in the dataset, so we 2) relabel the text augmentations using a prompting-based LLM classifier to enhance the correctness of labels in the generated data. We evaluate the proposed method in challenging 2-shot and zero-shot settings on four text classification datasets: Banking77, TREC6, Subjectivity (SUBJ), and Twitter Complaints. Our experiments show that generating and, crucially, relabeling borderline examples facilitates the transfer of knowledge of a massive LLM like GPT3.5-turbo into smaller and cheaper classifiers like DistilBERT$_{base}$ and BERT$_{base}$. Furthermore, 2-shot PromptMix outperforms multiple 5-shot data augmentation methods on the four datasets. Our code is available at https://github.com/ServiceNow/PromptMix-EMNLP-2023.
△ Less
Submitted 22 October, 2023;
originally announced October 2023.
-
TK-KNN: A Balanced Distance-Based Pseudo Labeling Approach for Semi-Supervised Intent Classification
Authors:
Nicholas Botzer,
David Vasquez,
Tim Weninger,
Issam Laradji
Abstract:
The ability to detect intent in dialogue systems has become increasingly important in modern technology. These systems often generate a large amount of unlabeled data, and manually labeling this data requires substantial human effort. Semi-supervised methods attempt to remedy this cost by using a model trained on a few labeled examples and then by assigning pseudo-labels to further a subset of unl…
▽ More
The ability to detect intent in dialogue systems has become increasingly important in modern technology. These systems often generate a large amount of unlabeled data, and manually labeling this data requires substantial human effort. Semi-supervised methods attempt to remedy this cost by using a model trained on a few labeled examples and then by assigning pseudo-labels to further a subset of unlabeled examples that has a model prediction confidence higher than a certain threshold. However, one particularly perilous consequence of these methods is the risk of picking an imbalanced set of examples across classes, which could lead to poor labels. In the present work, we describe Top-K K-Nearest Neighbor (TK-KNN), which uses a more robust pseudo-labeling approach based on distance in the embedding space while maintaining a balanced set of pseudo-labeled examples across classes through a ranking-based approach. Experiments on several datasets show that TK-KNN outperforms existing models, particularly when labeled data is scarce on popular datasets such as CLINC150 and Banking77. Code is available at https://github.com/ServiceNow/tk-knn
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Automatic Data Augmentation Learning using Bilevel Optimization for Histopathological Images
Authors:
Saypraseuth Mounsaveng,
Issam Laradji,
David Vázquez,
Marco Perdersoli,
Ismail Ben Ayed
Abstract:
Training a deep learning model to classify histopathological images is challenging, because of the color and shape variability of the cells and tissues, and the reduced amount of available data, which does not allow proper learning of those variations. Variations can come from the image acquisition process, for example, due to different cell staining protocols or tissue deformation. To tackle this…
▽ More
Training a deep learning model to classify histopathological images is challenging, because of the color and shape variability of the cells and tissues, and the reduced amount of available data, which does not allow proper learning of those variations. Variations can come from the image acquisition process, for example, due to different cell staining protocols or tissue deformation. To tackle this challenge, Data Augmentation (DA) can be used during training to generate additional samples by applying transformations to existing ones, to help the model become invariant to those color and shape transformations. The problem with DA is that it is not only dataset-specific but it also requires domain knowledge, which is not always available. Without this knowledge, selecting the right transformations can only be done using heuristics or through a computationally demanding search. To address this, we propose an automatic DA learning method. In this method, the DA parameters, i.e. the transformation parameters needed to improve the model training, are considered learnable and are learned automatically using a bilevel optimization approach in a quick and efficient way using truncated backpropagation. We validated the method on six different datasets. Experimental results show that our model can learn color and affine transformations that are more helpful to train an image classifier than predefined DA transformations, which are also more expensive as they need to be selected before the training by grid search on a validation set. We also show that similarly to a model trained with RandAugment, our model has also only a few method-specific hyperparameters to tune but is performing better. This makes our model a good solution for learning the best DA parameters, especially in the context of histopathological images, where defining potentially useful transformation heuristically is not trivial.
△ Less
Submitted 21 July, 2023;
originally announced July 2023.
-
FigGen: Text to Scientific Figure Generation
Authors:
Juan A Rodriguez,
David Vazquez,
Issam Laradji,
Marco Pedersoli,
Pau Rodriguez
Abstract:
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. Recent techniques have shown impressive potential in creating complex visual compositions while delivering impressive realism and quality. However, state-of-the-art methods have been focusing on the narrow domain of natural images, while other distributions remain…
▽ More
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. Recent techniques have shown impressive potential in creating complex visual compositions while delivering impressive realism and quality. However, state-of-the-art methods have been focusing on the narrow domain of natural images, while other distributions remain unexplored. In this paper, we introduce the problem of text-to-figure generation, that is creating scientific figures of papers from text descriptions. We present FigGen, a diffusion-based approach for text-to-figure as well as the main challenges of the proposed task. Code and models are available at https://github.com/joanrod/figure-diffusion
△ Less
Submitted 17 December, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Language Decision Transformers with Exponential Tilt for Interactive Text Environments
Authors:
Nicolas Gontier,
Pau Rodriguez,
Issam Laradji,
David Vazquez,
Christopher Pal
Abstract:
Text-based game environments are challenging because agents must deal with long sequences of text, execute compositional actions using text and learn from sparse rewards. We address these challenges by proposing Language Decision Transformers (LDTs), a framework that is based on transformer language models and decision transformers (DTs). Our LDTs extend DTs with 3 components: (1) exponential tilt…
▽ More
Text-based game environments are challenging because agents must deal with long sequences of text, execute compositional actions using text and learn from sparse rewards. We address these challenges by proposing Language Decision Transformers (LDTs), a framework that is based on transformer language models and decision transformers (DTs). Our LDTs extend DTs with 3 components: (1) exponential tilt to guide the agent towards high obtainable goals, (2) novel goal conditioning methods yielding better results than the traditional return-to-go (sum of all future rewards), and (3) a model of future observations that improves agent performance. LDTs are the first to address offline RL with DTs on these challenging games. Our experiments show that LDTs achieve the highest scores among many different types of agents on some of the most challenging Jericho games, such as Enchanter.
△ Less
Submitted 17 November, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
OCR-VQGAN: Taming Text-within-Image Generation
Authors:
Juan A. Rodriguez,
David Vazquez,
Issam Laradji,
Marco Pedersoli,
Pau Rodriguez
Abstract:
Synthetic image generation has recently experienced significant improvements in domains such as natural image or art generation. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is effectively rendering readable texts within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder, and decoder tha…
▽ More
Synthetic image generation has recently experienced significant improvements in domains such as natural image or art generation. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is effectively rendering readable texts within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder, and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure. To explore our approach, we introduce the Paper2Fig100k dataset, with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies of articles available at arXiv.org from fields like artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them. We demonstrate the effectiveness of OCR-VQGAN by conducting several experiments on the task of figure reconstruction. Additionally, we explore the qualitative and quantitative impact of weighting different perceptual metrics in the overall loss function. We release code, models, and dataset at https://github.com/joanrod/ocr-vqgan.
△ Less
Submitted 21 October, 2022; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Constraining Representations Yields Models That Know What They Don't Know
Authors:
Joao Monteiro,
Pau Rodriguez,
Pierre-Andre Noel,
Issam Laradji,
David Vazquez
Abstract:
A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model's internal act…
▽ More
A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model's internal activation patterns. Specifically, we assign to each class a unique, fixed, randomly-generated binary vector - hereafter called class code - and train the model so that its cross-depths activation patterns predict the appropriate class code according to the input sample's class. The resulting predictors are dubbed Total Activation Classifiers (TAC), and TACs may either be trained from scratch, or used with negligible cost as a thin add-on on top of a frozen, pre-trained neural network. The distance between a TAC's activation pattern and the closest valid code acts as an additional confidence score, besides the default unTAC'ed prediction head's. In the add-on case, the original neural network's inference head is completely unaffected (so its accuracy remains the same) but we now have the option to use TAC's own confidence and prediction when determining which course of action to take in an hypothetical production workflow. In particular, we show that TAC strictly improves the value derived from models allowed to reject/defer. We provide further empirical evidence that TAC works well on multiple types of architectures and data modalities and that it is at least as good as state-of-the-art alternative confidence scores derived from existing models.
△ Less
Submitted 19 April, 2023; v1 submitted 30 August, 2022;
originally announced August 2022.
-
Workflow Discovery from Dialogues in the Low Data Regime
Authors:
Amine El Hattami,
Stefania Raimondo,
Issam Laradji,
David Vazquez,
Pau Rodriguez,
Chris Pal
Abstract:
Text-based dialogues are now widely used to solve real-world problems. In cases where solution strategies are already known, they can sometimes be codified into workflows and used to guide humans or artificial agents through the task of hel** clients. We introduce a new problem formulation that we call Workflow Discovery (WD) in which we are interested in the situation where a formal workflow ma…
▽ More
Text-based dialogues are now widely used to solve real-world problems. In cases where solution strategies are already known, they can sometimes be codified into workflows and used to guide humans or artificial agents through the task of hel** clients. We introduce a new problem formulation that we call Workflow Discovery (WD) in which we are interested in the situation where a formal workflow may not yet exist. Still, we wish to discover the set of actions that have been taken to resolve a particular problem. We also examine a sequence-to-sequence (Seq2Seq) approach for this novel task. We present experiments where we extract workflows from dialogues in the Action-Based Conversations Dataset (ABCD). Since the ABCD dialogues follow known workflows to guide agents, we can evaluate our ability to extract such workflows using ground truth sequences of actions. We propose and evaluate an approach that conditions models on the set of possible actions, and we show that using this strategy, we can improve WD performance. Our conditioning approach also improves zero-shot and few-shot WD performance when transferring learned models to unseen domains within and across datasets. Further, on ABCD a modified variant of our Seq2Seq method achieves state-of-the-art performance on related but different problems of Action State Tracking (AST) and Cascading Dialogue Success (CDS) across many evaluation metrics.
△ Less
Submitted 11 February, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Data Augmentation for Intent Classification with Off-the-shelf Large Language Models
Authors:
Gaurav Sahu,
Pau Rodriguez,
Issam H. Laradji,
Parmida Atighehchian,
David Vazquez,
Dzmitry Bahdanau
Abstract:
Data augmentation is a widely employed technique to alleviate the problem of data scarcity. In this work, we propose a prompting-based approach to generate labelled training data for intent classification with off-the-shelf language models (LMs) such as GPT-3. An advantage of this method is that no task-specific LM-fine-tuning for data generation is required; hence the method requires no hyper-par…
▽ More
Data augmentation is a widely employed technique to alleviate the problem of data scarcity. In this work, we propose a prompting-based approach to generate labelled training data for intent classification with off-the-shelf language models (LMs) such as GPT-3. An advantage of this method is that no task-specific LM-fine-tuning for data generation is required; hence the method requires no hyper-parameter tuning and is applicable even when the available training data is very scarce. We evaluate the proposed method in a few-shot setting on four diverse intent classification tasks. We find that GPT-generated data significantly boosts the performance of intent classifiers when intents in consideration are sufficiently distinct from each other. In tasks with semantically close intents, we observe that the generated data is less helpful. Our analysis shows that this is because GPT often generates utterances that belong to a closely-related intent instead of the desired one. We present preliminary evidence that a prompting-based GPT classifier could be helpful in filtering the generated data to enhance its quality.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Overcoming challenges in leveraging GANs for few-shot data augmentation
Authors:
Christopher Beckham,
Issam Laradji,
Pau Rodriguez,
David Vazquez,
Derek Nowrouzezahrai,
Christopher Pal
Abstract:
In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues re…
▽ More
In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues related to the difficulty of training such generative models under a purely supervised regime with very few examples, as well as issues regarding the evaluation protocols of existing works. We also find that in this regime, classification accuracy is highly sensitive to how the classes of the dataset are randomly split. Therefore, we propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.
△ Less
Submitted 8 August, 2022; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Kubric: A scalable dataset generator
Authors:
Klaus Greff,
Francois Belletti,
Lucas Beyer,
Carl Doersch,
Yilun Du,
Daniel Duckworth,
David J. Fleet,
Dan Gnanapragasam,
Florian Golemo,
Charles Herrmann,
Thomas Kipf,
Abhijit Kundu,
Dmitry Lagun,
Issam Laradji,
Hsueh-Ti,
Liu,
Henning Meyer,
Yishu Miao,
Derek Nowrouzezahrai,
Cengiz Oztireli,
Etienne Pot,
Noha Radwan,
Daniel Rebain,
Sara Sabour,
Mehdi S. M. Sajjadi
, et al. (10 additional authors not shown)
Abstract:
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential…
▽ More
Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap 2) supports rich ground-truth annotations 3) offers full control over data and 4) can circumvent or mitigate problems regarding bias, privacy and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes, with rich annotations, and seamlessly scales to large jobs distributed over thousands of machines, and generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the used assets, all of the generation code, as well as the rendered datasets for reuse and modification.
△ Less
Submitted 7 March, 2022;
originally announced March 2022.
-
Neural Point Light Fields
Authors:
Julian Ost,
Issam Laradji,
Alejandro Newell,
Yuval Bahat,
Felix Heide
Abstract:
We introduce Neural Point Light Fields that represent scenes implicitly with a light field living on a sparse point cloud. Combining differentiable volume rendering with learned implicit density representations has made it possible to synthesize photo-realistic images for novel views of small scenes. As neural volumetric rendering methods require dense sampling of the underlying functional scene r…
▽ More
We introduce Neural Point Light Fields that represent scenes implicitly with a light field living on a sparse point cloud. Combining differentiable volume rendering with learned implicit density representations has made it possible to synthesize photo-realistic images for novel views of small scenes. As neural volumetric rendering methods require dense sampling of the underlying functional scene representation, at hundreds of samples along a ray cast through the volume, they are fundamentally limited to small scenes with the same objects projected to hundreds of training views. Promoting sparse point clouds to neural implicit light fields allows us to represent large scenes effectively with only a single radiance evaluation per ray. These point light fields are a function of the ray direction, and local point feature neighborhood, allowing us to interpolate the light field conditioned training images without dense object coverage and parallax. We assess the proposed method for novel view synthesis on large driving scenarios, where we synthesize realistic unseen views that existing implicit approaches fail to represent. We validate that Neural Point Light Fields make it possible to predict videos along unseen trajectories previously only feasible to generate by explicitly modeling the scene.
△ Less
Submitted 7 June, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Stochastic Mirror Descent: Convergence Analysis and Adaptive Variants via the Mirror Stochastic Polyak Stepsize
Authors:
Ryan D'Orazio,
Nicolas Loizou,
Issam Laradji,
Ioannis Mitliagkas
Abstract:
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization. In relatively smooth convex optimization we provide new convergence guarantees for SMD with a constant stepsize. For smooth convex optimization we propose a new adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS). Notably, our convergence r…
▽ More
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in relatively smooth and smooth convex optimization. In relatively smooth convex optimization we provide new convergence guarantees for SMD with a constant stepsize. For smooth convex optimization we propose a new adaptive stepsize scheme -- the mirror stochastic Polyak stepsize (mSPS). Notably, our convergence results in both settings do not make bounded gradient assumptions or bounded variance assumptions, and we show convergence to a neighborhood that vanishes under interpolation. Consequently, these results correspond to the first convergence guarantees under interpolation for the exponentiated gradient algorithm for fixed or adaptive stepsizes. mSPS generalizes the recently proposed stochastic Polyak stepsize (SPS) (Loizou et al. 2021) to mirror descent and remains both practical and efficient for modern machine learning applications while inheriting the benefits of mirror descent. We complement our results with experiments across various supervised learning tasks and different instances of SMD, demonstrating the effectiveness of mSPS.
△ Less
Submitted 24 May, 2023; v1 submitted 28 October, 2021;
originally announced October 2021.
-
A Survey of Self-Supervised and Few-Shot Object Detection
Authors:
Gabriel Huang,
Issam Laradji,
David Vazquez,
Simon Lacoste-Julien,
Pau Rodriguez
Abstract:
Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised metho…
▽ More
Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions. Project page at https://gabrielhuang.github.io/fsod-survey/
△ Less
Submitted 23 August, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
A Deep Learning Localization Method for Measuring Abdominal Muscle Dimensions in Ultrasound Images
Authors:
Alzayat Saleh,
Issam H. Laradji,
Corey Lammie,
David Vazquez,
Carol A Flavell,
Mostafa Rahimi Azghadi
Abstract:
Health professionals extensively use Two- Dimensional (2D) Ultrasound (US) videos and images to visualize and measure internal organs for various purposes including evaluation of muscle architectural changes. US images can be used to measure abdominal muscles dimensions for the diagnosis and creation of customized treatment plans for patients with Low Back Pain (LBP), however, they are difficult t…
▽ More
Health professionals extensively use Two- Dimensional (2D) Ultrasound (US) videos and images to visualize and measure internal organs for various purposes including evaluation of muscle architectural changes. US images can be used to measure abdominal muscles dimensions for the diagnosis and creation of customized treatment plans for patients with Low Back Pain (LBP), however, they are difficult to interpret. Due to high variability, skilled professionals with specialized training are required to take measurements to avoid low intra-observer reliability. This variability stems from the challenging nature of accurately finding the correct spatial location of measurement endpoints in abdominal US images. In this paper, we use a Deep Learning (DL) approach to automate the measurement of the abdominal muscle thickness in 2D US images. By treating the problem as a localization task, we develop a modified Fully Convolutional Network (FCN) architecture to generate blobs of coordinate locations of measurement endpoints, similar to what a human operator does. We demonstrate that using the TrA400 US image dataset, our network achieves a Mean Absolute Error (MAE) of 0.3125 on the test set, which almost matches the performance of skilled ultrasound technicians. Our approach can facilitate next steps for automating the process of measurements in 2D US images, while reducing inter-observer as well as intra-observer variability for more effective clinical outcomes.
△ Less
Submitted 30 September, 2021;
originally announced September 2021.
-
SSR: Semi-supervised Soft Rasterizer for single-view 2D to 3D Reconstruction
Authors:
Issam Laradji,
Pau Rodríguez,
David Vazquez,
Derek Nowrouzezahrai
Abstract:
Recent work has made significant progress in learning object meshes with weak supervision. Soft Rasterization methods have achieved accurate 3D reconstruction from 2D images with viewpoint supervision only. In this work, we further reduce the labeling effort by allowing such 3D reconstruction methods leverage unlabeled images. In order to obtain the viewpoints for these unlabeled images, we propos…
▽ More
Recent work has made significant progress in learning object meshes with weak supervision. Soft Rasterization methods have achieved accurate 3D reconstruction from 2D images with viewpoint supervision only. In this work, we further reduce the labeling effort by allowing such 3D reconstruction methods leverage unlabeled images. In order to obtain the viewpoints for these unlabeled images, we propose to use a Siamese network that takes two images as input and outputs whether they correspond to the same viewpoint. During training, we minimize the cross entropy loss to maximize the probability of predicting whether a pair of images belong to the same viewpoint or not. To get the viewpoint of a new image, we compare it against different viewpoints obtained from the training samples and select the viewpoint with the highest matching probability. We finally label the unlabeled images with the most confident predicted viewpoint and train a deep network that has a differentiable rasterization layer. Our experiments show that even labeling only two objects yields significant improvement in IoU for ShapeNet when leveraging unlabeled examples. Code is available at https://github.com/IssamLaradji/SSR.
△ Less
Submitted 21 August, 2021;
originally announced August 2021.
-
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations
Authors:
Pau Rodriguez,
Massimo Caccia,
Alexandre Lacoste,
Lee Zamparo,
Issam Laradji,
Laurent Charlin,
David Vazquez
Abstract:
Explainability for machine learning models has gained considerable attention within the research community given the importance of deploying more reliable machine-learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction, providing details about the model's decision-making. Current methods tend to generate…
▽ More
Explainability for machine learning models has gained considerable attention within the research community given the importance of deploying more reliable machine-learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction, providing details about the model's decision-making. Current methods tend to generate trivial counterfactuals about a model's decisions, as they often suggest to exaggerate or remove the presence of the attribute being classified. For the machine learning practitioner, these types of counterfactuals offer little value, since they provide no new information about undesired model or data biases. In this work, we identify the problem of trivial counterfactual generation and we propose DiVE to alleviate it. DiVE learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss to uncover multiple valuable explanations about the model's prediction. Further, we introduce a mechanism to prevent the model from producing trivial explanations. Experiments on CelebA and Synbols demonstrate that our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods. Code is available at https://github.com/ElementAI/beyond-trivial-explanations.
△ Less
Submitted 11 November, 2021; v1 submitted 18 March, 2021;
originally announced March 2021.
-
Counting Cows: Tracking Illegal Cattle Ranching From High-Resolution Satellite Imagery
Authors:
Issam Laradji,
Pau Rodriguez,
Freddie Kalaitzis,
David Vazquez,
Ross Young,
Ed Davey,
Alexandre Lacoste
Abstract:
Cattle farming is responsible for 8.8\% of greenhouse gas emissions worldwide. In addition to the methane emitted due to their digestive process, the growing need for grazing areas is an important driver of deforestation. While some regulations are in place for preserving the Amazon against deforestation, these are being flouted in various ways, hence the need to scale and automate the monitoring…
▽ More
Cattle farming is responsible for 8.8\% of greenhouse gas emissions worldwide. In addition to the methane emitted due to their digestive process, the growing need for grazing areas is an important driver of deforestation. While some regulations are in place for preserving the Amazon against deforestation, these are being flouted in various ways, hence the need to scale and automate the monitoring of cattle ranching activities. Through a partnership with \textit{Global Witness}, we explore the feasibility of tracking and counting cattle at the continental scale from satellite imagery. With a license from Maxar Technologies, we obtained satellite imagery of the Amazon at 40cm resolution, and compiled a dataset of 903 images containing a total of 28498 cattle. Our experiments show promising results and highlight important directions for the next steps on both counting algorithms and the data collection process for solving such challenges. The code is available at \url{https://github.com/IssamLaradji/cownter_strike}.
△ Less
Submitted 14 November, 2020;
originally announced November 2020.
-
Affinity LCFCN: Learning to Segment Fish with Weak Supervision
Authors:
Issam Laradji,
Alzayat Saleh,
Pau Rodriguez,
Derek Nowrouzezahrai,
Mostafa Rahimi Azghadi,
David Vazquez
Abstract:
Aquaculture industries rely on the availability of accurate fish body measurements, e.g., length, width and mass. Manual methods that rely on physical tools like rulers are time and labour intensive. Leading automatic approaches rely on fully-supervised segmentation models to acquire these measurements but these require collecting per-pixel labels -- also time consuming and laborious: i.e., it can…
▽ More
Aquaculture industries rely on the availability of accurate fish body measurements, e.g., length, width and mass. Manual methods that rely on physical tools like rulers are time and labour intensive. Leading automatic approaches rely on fully-supervised segmentation models to acquire these measurements but these require collecting per-pixel labels -- also time consuming and laborious: i.e., it can take up to two minutes per fish to generate accurate segmentation labels, almost always requiring at least some manual intervention. We propose an automatic segmentation model efficiently trained on images labeled with only point-level supervision, where each fish is annotated with a single click. This labeling process requires significantly less manual intervention, averaging roughly one second per fish. Our approach uses a fully convolutional neural network with one branch that outputs per-pixel scores and another that outputs an affinity matrix. We aggregate these two outputs using a random walk to obtain the final, refined per-pixel segmentation output. We train the entire model end-to-end with an LCFCN loss, resulting in our A-LCFCN method. We validate our model on the DeepFish dataset, which contains many fish habitats from the north-eastern Australian region. Our experimental results confirm that A-LCFCN outperforms a fully-supervised segmentation model at fixed annotation budget. Moreover, we show that A-LCFCN achieves better segmentation results than LCFCN and a standard baseline. We have released the code at \url{https://github.com/IssamLaradji/affinity_lcfcn}.
△ Less
Submitted 5 November, 2020;
originally announced November 2020.
-
CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions
Authors:
Vincenzo Lomonaco,
Lorenzo Pellegrini,
Pau Rodriguez,
Massimo Caccia,
Qi She,
Yu Chen,
Quentin Jodelet,
Rui** Wang,
Zheda Mai,
David Vazquez,
German I. Parisi,
Nikhil Churamani,
Marc Pickett,
Issam Laradji,
Davide Maltoni
Abstract:
In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous. However, despite the significant and undoubted progress of the field in addressing the issue of catastrophic forgetting, benchmarking different continual learning approaches is a…
▽ More
In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous. However, despite the significant and undoubted progress of the field in addressing the issue of catastrophic forgetting, benchmarking different continual learning approaches is a difficult task by itself. In fact, given the proliferation of different settings, training and evaluation protocols, metrics and nomenclature, it is often tricky to properly characterize a continual learning algorithm, relate it to other solutions and gauge its real-world applicability. The first Continual Learning in Computer Vision challenge held at CVPR in 2020 has been one of the first opportunities to evaluate different continual learning algorithms on a common hardware with a large set of shared evaluation metrics and 3 different settings based on the realistic CORe50 video benchmark. In this paper, we report the main results of the competition, which counted more than 79 teams registered, 11 finalists and 2300$ in prizes. We also summarize the winning approaches, current challenges and future research directions.
△ Less
Submitted 14 September, 2020;
originally announced September 2020.
-
Synbols: Probing Learning Algorithms with Synthetic Datasets
Authors:
Alexandre Lacoste,
Pau Rodríguez,
Frédéric Branchaud-Charron,
Parmida Atighehchian,
Massimo Caccia,
Issam Laradji,
Alexandre Drouin,
Matt Craddock,
Laurent Charlin,
David Vázquez
Abstract:
Progress in the field of machine learning has been fueled by the introduction of benchmark datasets pushing the limits of existing algorithms. Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. In this sense, we introduce Synbols -- Synthetic Symbols -- a tool…
▽ More
Progress in the field of machine learning has been fueled by the introduction of benchmark datasets pushing the limits of existing algorithms. Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. In this sense, we introduce Synbols -- Synthetic Symbols -- a tool for rapidly generating new datasets with a rich composition of latent features rendered in low resolution images. Synbols leverages the large amount of symbols available in the Unicode standard and the wide range of artistic font provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect the limitations and flaws in standard learning algorithms in various learning setups including supervised learning, active learning, out of distribution generalization, unsupervised representation learning, and object counting.
△ Less
Submitted 4 November, 2020; v1 submitted 14 September, 2020;
originally announced September 2020.
-
A Realistic Fish-Habitat Dataset to Evaluate Algorithms for Underwater Visual Analysis
Authors:
Alzayat Saleh,
Issam H. Laradji,
Dmitry A. Konovalov,
Michael Bradley,
David Vazquez,
Marcus Sheaves
Abstract:
Visual analysis of complex fish habitats is an important step towards sustainable fisheries for human consumption and environmental protection. Deep Learning methods have shown great promise for scene analysis when trained on large-scale datasets. However, current datasets for fish analysis tend to focus on the classification task within constrained, plain environments which do not capture the com…
▽ More
Visual analysis of complex fish habitats is an important step towards sustainable fisheries for human consumption and environmental protection. Deep Learning methods have shown great promise for scene analysis when trained on large-scale datasets. However, current datasets for fish analysis tend to focus on the classification task within constrained, plain environments which do not capture the complexity of underwater fish habitats. To address this limitation, we present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks. The dataset consists of approximately 40 thousand images collected underwater from 20 \green{habitats in the} marine-environments of tropical Australia. The dataset originally contained only classification labels. Thus, we collected point-level and segmentation labels to have a more comprehensive fish analysis benchmark. These labels enable models to learn to automatically monitor fish count, identify their locations, and estimate their sizes. Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches based on our benchmark. Although models pre-trained on ImageNet have successfully performed on this benchmark, there is still room for improvement. Therefore, this benchmark serves as a testbed to motivate further development in this challenging domain of underwater computer vision. Code is available at: https://github.com/alzayats/DeepFish
△ Less
Submitted 28 August, 2020;
originally announced August 2020.
-
A Weakly Supervised Region-Based Active Learning Method for COVID-19 Segmentation in CT Images
Authors:
Issam Laradji,
Pau Rodriguez,
Frederic Branchaud-Charron,
Keegan Lensink,
Parmida Atighehchian,
William Parker,
David Vazquez,
Derek Nowrouzezahrai
Abstract:
One of the key challenges in the battle against the Coronavirus (COVID-19) pandemic is to detect and quantify the severity of the disease in a timely manner. Computed tomographies (CT) of the lungs are effective for assessing the state of the infection. Unfortunately, labeling CT scans can take a lot of time and effort, with up to 150 minutes per scan. We address this challenge introducing a scala…
▽ More
One of the key challenges in the battle against the Coronavirus (COVID-19) pandemic is to detect and quantify the severity of the disease in a timely manner. Computed tomographies (CT) of the lungs are effective for assessing the state of the infection. Unfortunately, labeling CT scans can take a lot of time and effort, with up to 150 minutes per scan. We address this challenge introducing a scalable, fast, and accurate active learning system that accelerates the labeling of CT scan images. Conventionally, active learning methods require the labelers to annotate whole images with full supervision, but that can lead to wasted efforts as many of the annotations could be redundant. Thus, our system presents the annotator with unlabeled regions that promise high information content and low annotation cost. Further, the system allows annotators to label regions using point-level supervision, which is much cheaper to acquire than per-pixel annotations. Our experiments on open-source COVID-19 datasets show that using an entropy-based method to rank unlabeled regions yields to significantly better results than random labeling of these regions. Also, we show that labeling small regions of images is more efficient than labeling whole images. Finally, we show that with only 7\% of the labeling effort required to label the whole training set gives us around 90\% of the performance obtained by training the model on the fully annotated training set. Code is available at: \url{https://github.com/IssamLaradji/covid19_active_learning}.
△ Less
Submitted 7 July, 2020;
originally announced July 2020.
-
Segmentation of Pulmonary Opacification in Chest CT Scans of COVID-19 Patients
Authors:
Keegan Lensink,
Issam Laradji,
Marco Law,
Paolo Emilio Barbano,
Savvas Nicolaou,
William Parker,
Eldad Haber
Abstract:
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has rapidly spread into a global pandemic. A form of pneumonia, presenting as opacities with in a patient's lungs, is the most common presentation associated with this virus, and great attention has gone into how these changes relate to patient morbidity and mortality. In this work we provide open source models for the segmentation o…
▽ More
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has rapidly spread into a global pandemic. A form of pneumonia, presenting as opacities with in a patient's lungs, is the most common presentation associated with this virus, and great attention has gone into how these changes relate to patient morbidity and mortality. In this work we provide open source models for the segmentation of patterns of pulmonary opacification on chest Computed Tomography (CT) scans which have been correlated with various stages and severities of infection. We have collected 663 chest CT scans of COVID-19 patients from healthcare centers around the world, and created pixel wise segmentation labels for nearly 25,000 slices that segment 6 different patterns of pulmonary opacification. We provide open source implementations and pre-trained weights for multiple segmentation models trained on our dataset. Our best model achieves an opacity Intersection-Over-Union score of 0.76 on our test set, demonstrates successful domain adaptation, and predicts the volume of opacification within 1.7\% of expert radiologists. Additionally, we present an analysis of the inter-observer variability inherent to this task, and propose methods for appropriate probabilistic approaches.
△ Less
Submitted 8 July, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
A Weakly Supervised Consistency-based Learning Method for COVID-19 Segmentation in CT Images
Authors:
Issam Laradji,
Pau Rodriguez,
Oscar Mañas,
Keegan Lensink,
Marco Law,
Lironne Kurzman,
William Parker,
David Vazquez,
Derek Nowrouzezahrai
Abstract:
Coronavirus Disease 2019 (COVID-19) has spread aggressively across the world causing an existential health crisis. Thus, having a system that automatically detects COVID-19 in tomography (CT) images can assist in quantifying the severity of the illness. Unfortunately, labelling chest CT scans requires significant domain expertise, time, and effort. We address these labelling challenges by only req…
▽ More
Coronavirus Disease 2019 (COVID-19) has spread aggressively across the world causing an existential health crisis. Thus, having a system that automatically detects COVID-19 in tomography (CT) images can assist in quantifying the severity of the illness. Unfortunately, labelling chest CT scans requires significant domain expertise, time, and effort. We address these labelling challenges by only requiring point annotations, a single pixel for each infected region on a CT image. This labeling scheme allows annotators to label a pixel in a likely infected region, only taking 1-3 seconds, as opposed to 10-15 seconds to segment a region. Conventionally, segmentation models train on point-level annotations using the cross-entropy loss function on these labels. However, these models often suffer from low precision. Thus, we propose a consistency-based (CB) loss function that encourages the output predictions to be consistent with spatial transformations of the input images. The experiments on 3 open-source COVID-19 datasets show that this loss function yields significant improvement over conventional point-level loss functions and almost matches the performance of models trained with full supervision with much less human effort. Code is available at: \url{https://github.com/IssamLaradji/covid19_weak_supervision}.
△ Less
Submitted 7 July, 2020; v1 submitted 4 July, 2020;
originally announced July 2020.
-
LOOC: Localize Overlap** Objects with Count Supervision
Authors:
Issam H. Laradji,
Rafael Pardinas,
Pau Rodriguez,
David Vazquez
Abstract:
Acquiring count annotations generally requires less human effort than point-level and bounding box annotations. Thus, we propose the novel problem setup of localizing objects in dense scenes under this weaker supervision. We propose LOOC, a method to Localize Overlap** Objects with Count supervision. We train LOOC by alternating between two stages. In the first stage, LOOC learns to generate pse…
▽ More
Acquiring count annotations generally requires less human effort than point-level and bounding box annotations. Thus, we propose the novel problem setup of localizing objects in dense scenes under this weaker supervision. We propose LOOC, a method to Localize Overlap** Objects with Count supervision. We train LOOC by alternating between two stages. In the first stage, LOOC learns to generate pseudo point-level annotations in a semi-supervised manner. In the second stage, LOOC uses a fully-supervised localization method that trains on these pseudo labels. The localization method is used to progressively improve the quality of the pseudo labels. We conducted experiments on popular counting datasets. For localization, LOOC achieves a strong new baseline in the novel problem setup where only count supervision is available. For counting, LOOC outperforms current state-of-the-art methods that only use count as their supervision. Code is available at: https://github.com/ElementAI/looc.
△ Less
Submitted 3 July, 2020;
originally announced July 2020.
-
Learning Data Augmentation with Online Bilevel Optimization for Image Classification
Authors:
Saypraseuth Mounsaveng,
Issam Laradji,
Ismail Ben Ayed,
David Vazquez,
Marco Pedersoli
Abstract:
Data augmentation is a key practice in machine learning for improving generalization performance. However, finding the best data augmentation hyperparameters requires domain knowledge or a computationally demanding search. We address this issue by proposing an efficient approach to automatically train a network that learns an effective distribution of transformations to improve its generalization.…
▽ More
Data augmentation is a key practice in machine learning for improving generalization performance. However, finding the best data augmentation hyperparameters requires domain knowledge or a computationally demanding search. We address this issue by proposing an efficient approach to automatically train a network that learns an effective distribution of transformations to improve its generalization. Using bilevel optimization, we directly optimize the data augmentation parameters using a validation set. This framework can be used as a general solution to learn the optimal data augmentation jointly with an end task model like a classifier. Results show that our joint training method produces an image classification accuracy that is comparable to or better than carefully hand-crafted data augmentation. Yet, it does not need an expensive external validation loop on the data augmentation hyperparameters.
△ Less
Submitted 10 November, 2020; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Adaptive Gradient Methods Converge Faster with Over-Parameterization (but you should do a line-search)
Authors:
Sharan Vaswani,
Issam Laradji,
Frederik Kunstner,
Si Yi Meng,
Mark Schmidt,
Simon Lacoste-Julien
Abstract:
Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only ap…
▽ More
Adaptive gradient methods are typically used for training over-parameterized models. To better understand their behaviour, we study a simplistic setting -- smooth, convex losses with models over-parameterized enough to interpolate the data. In this setting, we prove that AMSGrad with constant step-size and momentum converges to the minimizer at a faster $O(1/T)$ rate. When interpolation is only approximately satisfied, constant step-size AMSGrad converges to a neighbourhood of the solution at the same rate, while AdaGrad is robust to the violation of interpolation. However, even for simple convex problems satisfying interpolation, the empirical performance of both methods heavily depends on the step-size and requires tuning, questioning their adaptivity. We alleviate this problem by automatically determining the step-size using stochastic line-search or Polyak step-sizes. With these techniques, we prove that both AdaGrad and AMSGrad retain their convergence guarantees, without needing to know problem-dependent constants. Empirically, we demonstrate that these techniques improve the convergence and generalization of adaptive gradient methods across tasks, from binary classification with kernel map**s to multi-class classification with deep networks.
△ Less
Submitted 18 February, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual Learning
Authors:
Massimo Caccia,
Pau Rodriguez,
Oleksiy Ostapenko,
Fabrice Normandin,
Min Lin,
Lucas Caccia,
Issam Laradji,
Irina Rish,
Alexandre Lacoste,
David Vazquez,
Laurent Charlin
Abstract:
Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent continual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previo…
▽ More
Continual learning studies agents that learn from streams of tasks without forgetting previous ones while adapting to new ones. Two recent continual-learning scenarios have opened new avenues of research. In meta-continual learning, the model is pre-trained to minimize catastrophic forgetting of previous tasks. In continual-meta learning, the aim is to train agents for faster remembering of previous tasks through adaptation. In their original formulations, both methods have limitations. We stand on their shoulders to propose a more general scenario, OSAKA, where an agent must quickly solve new (out-of-distribution) tasks, while also requiring fast remembering. We show that current continual learning, meta-learning, meta-continual learning, and continual-meta learning techniques fail in this new scenario. We propose Continual-MAML, an online extension of the popular MAML algorithm as a strong baseline for this scenario. We empirically show that Continual-MAML is better suited to the new scenario than the aforementioned methodologies, as well as standard continual learning and meta-learning approaches.
△ Less
Submitted 20 January, 2021; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Embedding Propagation: Smoother Manifold for Few-Shot Classification
Authors:
Pau Rodríguez,
Issam Laradji,
Alexandre Drouin,
Alexandre Lacoste
Abstract:
Few-shot classification is challenging because the data distribution of the training set can be widely different to the test set as their classes are disjoint. This distribution shift often results in poor generalization. Manifold smoothing has been shown to address the distribution shift problem by extending the decision boundaries and reducing the noise of the class representations. Moreover, ma…
▽ More
Few-shot classification is challenging because the data distribution of the training set can be widely different to the test set as their classes are disjoint. This distribution shift often results in poor generalization. Manifold smoothing has been shown to address the distribution shift problem by extending the decision boundaries and reducing the noise of the class representations. Moreover, manifold smoothness is a key factor for semi-supervised learning and transductive learning algorithms. In this work, we propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification. Embedding propagation leverages interpolations between the extracted features of a neural network based on a similarity graph. We empirically show that embedding propagation yields a smoother embedding manifold. We also show that applying embedding propagation to a transductive classifier achieves new state-of-the-art results in mini-Imagenet, tiered-Imagenet, Imagenet-FS, and CUB. Furthermore, we show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16\% points. The proposed embedding propagation operation can be easily integrated as a non-parametric layer into a neural network. We provide the training code and usage examples at https://github.com/ElementAI/embedding-propagation.
△ Less
Submitted 13 July, 2020; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Authors:
Nicolas Loizou,
Sharan Vaswani,
Issam Laradji,
Simon Lacoste-Julien
Abstract:
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting th…
▽ More
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
△ Less
Submitted 22 March, 2021; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
Authors:
Si Yi Meng,
Sharan Vaswani,
Issam Laradji,
Mark Schmidt,
Simon Lacoste-Julien
Abstract:
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient…
▽ More
We consider stochastic second-order methods for minimizing smooth and strongly-convex functions under an interpolation condition satisfied by over-parameterized models. Under this condition, we show that the regularized subsampled Newton method (R-SSN) achieves global linear convergence with an adaptive step-size and a constant batch-size. By growing the batch size for both the subsampled gradient and Hessian, we show that R-SSN can converge at a quadratic rate in a local neighbourhood of the solution. We also show that R-SSN attains local linear convergence for the family of self-concordant functions. Furthermore, we analyze stochastic BFGS algorithms in the interpolation setting and prove their global linear convergence. We empirically evaluate stochastic L-BFGS and a "Hessian-free" implementation of R-SSN for binary classification on synthetic, linearly-separable datasets and real datasets under a kernel map**. Our experimental results demonstrate the fast convergence of these methods, both in terms of the number of iterations and wall-clock time.
△ Less
Submitted 22 March, 2020; v1 submitted 10 October, 2019;
originally announced October 2019.
-
Class-Based Styling: Real-time Localized Style Transfer with Semantic Segmentation
Authors:
Lironne Kurzman,
David Vazquez,
Issam Laradji
Abstract:
We propose a Class-Based Styling method (CBS) that can map different styles for different object classes in real-time. CBS achieves real-time performance by carrying out two steps simultaneously. While a semantic segmentation method is used to obtain the mask of each object class in a video frame, a styling method is used to style that frame globally. Then an object class can be styled by combinin…
▽ More
We propose a Class-Based Styling method (CBS) that can map different styles for different object classes in real-time. CBS achieves real-time performance by carrying out two steps simultaneously. While a semantic segmentation method is used to obtain the mask of each object class in a video frame, a styling method is used to style that frame globally. Then an object class can be styled by combining the segmentation mask and the styled image. The user can also select multiple styles so that different object classes can have different styles in a single frame. For semantic segmentation, we leverage DABNet that achieves high accuracy, yet only has 0.76 million parameters and runs at 104 FPS. For the style transfer step, we use a popular real-time method proposed by Johnson et al. [7]. We evaluated CBS on a video of the CityScapes dataset and observed high-quality localized style transfer results for different object classes and real-time performance.
△ Less
Submitted 30 August, 2019;
originally announced August 2019.
-
Where are the Masks: Instance Segmentation with Image-level Supervision
Authors:
Issam H. Laradji,
David Vazquez,
Mark Schmidt
Abstract:
A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective. These labels require large human effort and for certain applications, such labels are not readily available. To address this limitation, we propose a novel framework that can effectively train with image-level labels, which are significantly cheaper to acquire. For instance…
▽ More
A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective. These labels require large human effort and for certain applications, such labels are not readily available. To address this limitation, we propose a novel framework that can effectively train with image-level labels, which are significantly cheaper to acquire. For instance, one can do an internet search for the term "car" and obtain many images where a car is present with minimal effort. Our framework consists of two stages: (1) train a classifier to generate pseudo masks for the objects of interest; (2) train a fully supervised Mask R-CNN on these pseudo masks. Our two main contribution are proposing a pipeline that is simple to implement and is amenable to different segmentation methods; and achieves new state-of-the-art results for this problem setup. Our results are based on evaluating our method on PASCAL VOC 2012, a standard dataset for weakly supervised methods, where we demonstrate major performance gains compared to existing methods with respect to mean average precision.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Instance Segmentation with Point Supervision
Authors:
Issam H. Laradji,
Negar Rostamzadeh,
Pedro O. Pinheiro,
David Vazquez,
Mark Schmidt
Abstract:
Instance segmentation methods often require costly per-pixel labels. We propose a method that only requires point-level annotations. During training, the model only has access to a single pixel label per object, yet the task is to output full segmentation masks. To address this challenge, we construct a network with two branches: (1) a localization network (L-Net) that predicts the location of eac…
▽ More
Instance segmentation methods often require costly per-pixel labels. We propose a method that only requires point-level annotations. During training, the model only has access to a single pixel label per object, yet the task is to output full segmentation masks. To address this challenge, we construct a network with two branches: (1) a localization network (L-Net) that predicts the location of each object; and (2) an embedding network (E-Net) that learns an embedding space where pixels of the same object are close. The segmentation masks for the located objects are obtained by grou** pixels with similar embeddings. At training time, while L-Net only requires point-level annotations, E-Net uses pseudo-labels generated by a class-agnostic object proposal method. We evaluate our approach on PASCAL VOC, COCO, KITTI and CityScapes datasets. The experiments show that our method (1) obtains competitive results compared to fully-supervised methods in certain scenarios; (2) outperforms fully- and weakly- supervised methods with a fixed annotation budget; and (3) is a first strong baseline for instance segmentation with point-level supervision.
△ Less
Submitted 14 June, 2019;
originally announced June 2019.
-
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Authors:
Sharan Vaswani,
Aaron Mishkin,
Issam Laradji,
Mark Schmidt,
Gauthier Gidel,
Simon Lacoste-Julien
Abstract:
Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques t…
▽ More
Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains linear convergence for an important class of non-convex functions and saddle-point problems satisfying interpolation. To improve the proposed methods' practical performance, we give heuristics to use larger step-sizes and acceleration. We compare the proposed algorithms against numerous optimization methods on standard classification tasks using both kernel methods and deep networks. The proposed methods result in competitive performance across all models and datasets, while being robust to the precise choices of hyper-parameters. For multi-class classification using deep networks, SGD with Armijo line-search results in both faster convergence and better generalization.
△ Less
Submitted 4 June, 2021; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Efficient Deep Gaussian Process Models for Variable-Sized Input
Authors:
Issam H. Laradji,
Mark Schmidt,
Vladimir Pavlovic,
Minyoung Kim
Abstract:
Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such…
▽ More
Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce the GP-DRF, a novel Bayesian model with an input layer of GPs, followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data. We provide a novel efficient method to simultaneously infer the posterior of GP's latent vectors and infer the posterior of DRF's internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://github.com/IssamLaradji/GP_DRF.
△ Less
Submitted 16 May, 2019;
originally announced May 2019.
-
Where are the Blobs: Counting by Localization with Point Supervision
Authors:
Issam H. Laradji,
Negar Rostamzadeh,
Pedro O. Pinheiro,
David Vazquez,
Mark Schmidt
Abstract:
Object counting is an important task in computer vision due to its growing demand in applications such as surveillance, traffic monitoring, and counting everyday objects. State-of-the-art methods use regression-based optimization where they explicitly learn to count the objects of interest. These often perform better than detection-based methods that need to learn the more difficult task of predic…
▽ More
Object counting is an important task in computer vision due to its growing demand in applications such as surveillance, traffic monitoring, and counting everyday objects. State-of-the-art methods use regression-based optimization where they explicitly learn to count the objects of interest. These often perform better than detection-based methods that need to learn the more difficult task of predicting the location, size, and shape of each object. However, we propose a detection-based method that does not need to estimate the size and shape of the objects and that outperforms regression-based methods. Our contributions are three-fold: (1) we propose a novel loss function that encourages the network to output a single blob per object instance using point-level annotations only; (2) we design two methods for splitting large predicted blobs between object instances; and (3) we show that our method achieves new state-of-the-art results on several challenging datasets including the Pascal VOC and the Penguins dataset. Our method even outperforms those that use stronger supervision such as depth features, multi-point annotations, and bounding-box labels.
△ Less
Submitted 25 July, 2018;
originally announced July 2018.
-
M-ADDA: Unsupervised Domain Adaptation with Deep Metric Learning
Authors:
Issam Laradji,
Reza Babanezhad
Abstract:
Unsupervised domain adaptation techniques have been successful for a wide range of problems where supervised labels are limited. The task is to classify an unlabeled `target' dataset by leveraging a labeled `source' dataset that comes from a slightly similar distribution. We propose metric-based adversarial discriminative domain adaptation (M-ADDA) which performs two main steps. First, it uses a m…
▽ More
Unsupervised domain adaptation techniques have been successful for a wide range of problems where supervised labels are limited. The task is to classify an unlabeled `target' dataset by leveraging a labeled `source' dataset that comes from a slightly similar distribution. We propose metric-based adversarial discriminative domain adaptation (M-ADDA) which performs two main steps. First, it uses a metric learning approach to train the source model on the source dataset by optimizing the triplet loss function. This results in clusters where embeddings of the same label are close to each other and those with different labels are far from one another. Next, it uses the adversarial approach (as that used in ADDA \cite{2017arXiv170205464T}) to make the extracted features from the source and target datasets indistinguishable. Simultaneously, we optimize a novel loss function that encourages the target dataset's embeddings to form clusters. While ADDA and M-ADDA use similar architectures, we show that M-ADDA performs significantly better on the digits adaptation datasets of MNIST and USPS. This suggests that using metric-learning for domain adaptation can lead to large improvements in classification accuracy for the domain adaptation task. The code is available at \url{https://github.com/IssamLaradji/M-ADDA}.
△ Less
Submitted 6 July, 2018;
originally announced July 2018.
-
Let's Make Block Coordinate Descent Converge Faster: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence
Authors:
Julie Nutini,
Issam Laradji,
Mark Schmidt
Abstract:
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this…
▽ More
Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can significantly improve the progress made by each BCD iteration. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using "variable" blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with sparse dependencies between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the "active-set complexity" of BCD methods and leads to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
△ Less
Submitted 31 July, 2022; v1 submitted 23 December, 2017;
originally announced December 2017.
-
Convergence Rates for Greedy Kaczmarz Algorithms, and Faster Randomized Kaczmarz Rules Using the Orthogonality Graph
Authors:
Julie Nutini,
Behrooz Sepehry,
Issam Laradji,
Mark Schmidt,
Hoyt Koepke,
Alim Virani
Abstract:
The Kaczmarz method is an iterative algorithm for solving systems of linear equalities and inequalities, that iteratively projects onto these constraints. Recently, Strohmer and Vershynin [J. Fourier Anal. Appl., 15(2):262-278, 2009] gave a non-asymptotic convergence rate analysis for this algorithm, spurring numerous extensions and generalizations of the Kaczmarz method. Rather than the randomize…
▽ More
The Kaczmarz method is an iterative algorithm for solving systems of linear equalities and inequalities, that iteratively projects onto these constraints. Recently, Strohmer and Vershynin [J. Fourier Anal. Appl., 15(2):262-278, 2009] gave a non-asymptotic convergence rate analysis for this algorithm, spurring numerous extensions and generalizations of the Kaczmarz method. Rather than the randomized selection rule analyzed in that work, in this paper we instead discuss greedy and approximate greedy selection rules. We show that in some applications the computational costs of greedy and random selection are comparable, and that in many cases greedy selection rules give faster convergence rates than random selection rules. Further, we give the first multi-step analysis of Kaczmarz methods for a particular greedy rule, and propose a provably-faster randomized selection rule for matrices with many pairwise-orthogonal rows.
△ Less
Submitted 22 December, 2016;
originally announced December 2016.
-
Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection
Authors:
Julie Nutini,
Mark Schmidt,
Issam H. Laradji,
Michael Friedlander,
Hoyt Koepke
Abstract:
There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically muc…
▽ More
There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that---except in extreme cases---its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.
△ Less
Submitted 28 October, 2018; v1 submitted 1 June, 2015;
originally announced June 2015.