-
Implications of Regulations on the Use of AI and Generative AI for Human-Centered Responsible Artificial Intelligence
Authors:
Marios Constantinides,
Mohammad Tahaei,
Daniele Quercia,
Simone Stumpf,
Michael Madaio,
Sean Kennedy,
Lauren Wilcox,
Jessica Vitak,
Henriette Cramer,
Edyta Bogucka,
Ricardo Baeza-Yates,
Ewa Luger,
Jess Holbrook,
Michael Muller,
Ilana Golbin Blumenfeld,
Giada Pistilli
Abstract:
With the upcoming AI regulations (e.g., EU AI Act) and rapid advancements in generative AI, new challenges emerge in the area of Human-Centered Responsible Artificial Intelligence (HCR-AI). As AI becomes more ubiquitous, questions around decision-making authority, human oversight, accountability, sustainability, and the ethical and legal responsibilities of AI and their creators become paramount.…
▽ More
With the upcoming AI regulations (e.g., EU AI Act) and rapid advancements in generative AI, new challenges emerge in the area of Human-Centered Responsible Artificial Intelligence (HCR-AI). As AI becomes more ubiquitous, questions around decision-making authority, human oversight, accountability, sustainability, and the ethical and legal responsibilities of AI and their creators become paramount. Addressing these questions requires a collaborative approach. By involving stakeholders from various disciplines in the 2\textsuperscript{nd} edition of the HCR-AI Special Interest Group (SIG) at CHI 2024, we aim to discuss the implications of regulations in HCI research, develop new theories, evaluation frameworks, and methods to navigate the complex nature of AI ethics, steering AI development in a direction that is beneficial and sustainable for all of humanity.
△ Less
Submitted 29 February, 2024;
originally announced March 2024.
-
QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
Authors:
Hossein Rajabzadeh,
Mojtaba Valipour,
Tianshu Zhu,
Marzieh Tahaei,
Hyock Ju Kwon,
Ali Ghodsi,
Boxing Chen,
Mehdi Rezagholizadeh
Abstract:
Finetuning large language models requires huge GPU memory, restricting the choice to acquire Larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding the efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and, therefore, cannot be reconfigured for its lower ranks without requir…
▽ More
Finetuning large language models requires huge GPU memory, restricting the choice to acquire Larger models. While the quantized version of the Low-Rank Adaptation technique, named QLoRA, significantly alleviates this issue, finding the efficient LoRA rank is still challenging. Moreover, QLoRA is trained on a pre-defined rank and, therefore, cannot be reconfigured for its lower ranks without requiring further fine-tuning steps. This paper proposes QDyLoRA -Quantized Dynamic Low-Rank Adaptation-, as an efficient quantization approach for dynamic low-rank adaptation. Motivated by Dynamic LoRA, QDyLoRA is able to efficiently finetune LLMs on a set of pre-defined LoRA ranks. QDyLoRA enables fine-tuning Falcon-40b for ranks 1 to 64 on a single 32 GB V100-GPU through one round of fine-tuning. Experimental results show that QDyLoRA is competitive to QLoRA and outperforms when employing its optimal rank.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference
Authors:
Parsa Kavehzadeh,
Mojtaba Valipour,
Marzieh Tahaei,
Ali Ghodsi,
Boxing Chen,
Mehdi Rezagholizadeh
Abstract:
Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling at understanding and generating human-like text. However, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference by leveraging the modularity in networks and sorting sub-models based on computation/accuracy in a nested manner.…
▽ More
Large language models (LLMs) have revolutionized natural language processing (NLP) by excelling at understanding and generating human-like text. However, their widespread deployment can be prohibitively expensive. SortedNet is a recent training technique for enabling dynamic inference by leveraging the modularity in networks and sorting sub-models based on computation/accuracy in a nested manner. We extend SortedNet to generative NLP tasks, making large language models dynamic without any Pre-Training and by only replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT). Our approach boosts model efficiency, eliminating the need for multiple models for various scenarios during inference. We show that this approach can unlock the power of intermediate layers of transformers in generating the target output. Our sub-models remain integral components of the original model, minimizing storage requirements and transition costs between different computational/latency budgets. The efficacy of our proposed method was demonstrated by applying it to tune LLaMA 2 13B on the Stanford Alpaca dataset for instruction following and TriviaQA for closed-book question answering. Our results show the superior performance of sub-models in comparison to Standard Fine-Tuning and SFT+ICT (Early-Exit), all achieved with efficient tuning and without additional memory usage during inference.
△ Less
Submitted 8 February, 2024; v1 submitted 16 September, 2023;
originally announced September 2023.
-
SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks
Authors:
Mojtaba Valipour,
Mehdi Rezagholizadeh,
Hossein Rajabzadeh,
Parsa Kavehzadeh,
Marzieh Tahaei,
Boxing Chen,
Ali Ghodsi
Abstract:
Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user/task-specific models. There are solutions in the literature to deal with single dynamic or many-in-one models instead of many individual networks; however, they suffer from significant drops in performance, lac…
▽ More
Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous user/task-specific models. There are solutions in the literature to deal with single dynamic or many-in-one models instead of many individual networks; however, they suffer from significant drops in performance, lack of generalization across different model architectures or different dimensions (e.g. depth, width, attention blocks), heavy model search requirements during training, and training a limited number of sub-models. To address these limitations, we propose SortedNet, a generalized and scalable training solution to harness the inherent modularity of DNNs. Thanks to a generalized nested architecture (which we refer as \textit{sorted} architecture in this paper) with shared parameters and its novel update scheme combining random sub-model sampling and a new gradient accumulation mechanism, SortedNet enables the training of sub-models simultaneously along with the training of the main model (without any significant extra training or inference overhead), simplifies dynamic model selection, customizes deployment during inference, and reduces the model storage requirement significantly. The versatility and scalability of SortedNet are validated through various architectures and tasks, including LLaMA, BERT, RoBERTa (NLP tasks), ResNet and MobileNet (image classification) demonstrating its superiority over existing dynamic training methods. For example, we introduce a novel adaptive self-speculative approach based on sorted-training to accelerate large language models decoding. Moreover, SortedNet is able to train 160 sub-models at once, achieving at least 96\% of the original model's performance.
△ Less
Submitted 1 June, 2024; v1 submitted 1 September, 2023;
originally announced September 2023.
-
RAI Guidelines: Method for Generating Responsible AI Guidelines Grounded in Regulations and Usable by (Non-)Technical Roles
Authors:
Marios Constantinides,
Edyta Bogucka,
Daniele Quercia,
Susanna Kallio,
Mohammad Tahaei
Abstract:
Many guidelines for responsible AI have been suggested to help AI practitioners in the development of ethical and responsible AI systems. However, these guidelines are often neither grounded in regulation nor usable by different roles, from developers to decision makers. To bridge this gap, we developed a four-step method to generate a list of responsible AI guidelines; these steps are: (1) manual…
▽ More
Many guidelines for responsible AI have been suggested to help AI practitioners in the development of ethical and responsible AI systems. However, these guidelines are often neither grounded in regulation nor usable by different roles, from developers to decision makers. To bridge this gap, we developed a four-step method to generate a list of responsible AI guidelines; these steps are: (1) manual coding of 17 papers on responsible AI; (2) compiling an initial catalog of responsible AI guidelines; (3) refining the catalog through interviews and expert panels; and (4) finalizing the catalog. To evaluate the resulting 22 guidelines, we incorporated them into an interactive tool and assessed them in a user study with 14 AI researchers, engineers, designers, and managers from a large technology company. Through interviews with these practitioners, we found that the guidelines were grounded in current regulations and usable across roles, encouraging self-reflection on ethical considerations at early stages of development. This significantly contributes to the concept of `Responsible AI by Design' -- a design-first approach that embeds responsible AI values throughout the development lifecycle and across various business roles.
△ Less
Submitted 4 June, 2024; v1 submitted 27 July, 2023;
originally announced July 2023.
-
On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications
Authors:
Vamsikrishna Chemudupati,
Marzieh Tahaei,
Heitor Guimaraes,
Arthur Pimentel,
Anderson Avila,
Mehdi Rezagholizadeh,
Boxing Chen,
Tiago Falk
Abstract:
Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks. The self-supervised training of these models leads to universal speech representations that can be used for different downstream tasks, ranging from automatic speech recognition (ASR) to speaker identification. Recently, Whisper, a transformer-based model was proposed and traine…
▽ More
Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks. The self-supervised training of these models leads to universal speech representations that can be used for different downstream tasks, ranging from automatic speech recognition (ASR) to speaker identification. Recently, Whisper, a transformer-based model was proposed and trained on large amount of weakly supervised data for ASR; it outperformed several state-of-the-art self-supervised models. Given the superiority of Whisper for ASR, in this paper we explore the transferability of the representation for four other speech tasks in SUPERB benchmark. Moreover, we explore the robustness of Whisper representation for ``in the wild'' tasks where speech is corrupted by environment noise and room reverberation. Experimental results show Whisper achieves promising results across tasks and environmental conditions, thus showing potential for cross-task real-world deployment.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT?
Authors:
Ali Akbar Septiandri,
Marios Constantinides,
Mohammad Tahaei,
Daniele Quercia
Abstract:
Studies conducted on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) samples are considered atypical of the world's population and may not accurately represent human behavior. In this study, we aim to quantify the extent to which the ACM FAccT conference, the leading venue in exploring Artificial Intelligence (AI) systems' fairness, accountability, and transparency, relies on WEIRD…
▽ More
Studies conducted on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) samples are considered atypical of the world's population and may not accurately represent human behavior. In this study, we aim to quantify the extent to which the ACM FAccT conference, the leading venue in exploring Artificial Intelligence (AI) systems' fairness, accountability, and transparency, relies on WEIRD samples. We collected and analyzed 128 papers published between 2018 and 2022, accounting for 30.8% of the overall proceedings published at FAccT in those years (excluding abstracts, tutorials, and papers without human-subject studies or clear country attribution for the participants). We found that 84% of the analyzed papers were exclusively based on participants from Western countries, particularly exclusively from the U.S. (63%). Only researchers who undertook the effort to collect data about local participants through interviews or surveys added diversity to an otherwise U.S.-centric view of science. Therefore, we suggest that researchers collect data from under-represented populations to obtain an inclusive worldview. To achieve this goal, scientific communities should champion data collection from such populations and enforce transparent reporting of data biases.
△ Less
Submitted 10 May, 2023;
originally announced May 2023.
-
Human-Centered Responsible Artificial Intelligence: Current & Future Trends
Authors:
Mohammad Tahaei,
Marios Constantinides,
Daniele Quercia,
Sean Kennedy,
Michael Muller,
Simone Stumpf,
Q. Vera Liao,
Ricardo Baeza-Yates,
Lora Aroyo,
Jess Holbrook,
Ewa Luger,
Michael Madaio,
Ilana Golbin Blumenfeld,
Maria De-Arteaga,
Jessica Vitak,
Alexandra Olteanu
Abstract:
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at develo** AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this sp…
▽ More
In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at develo** AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends to advance this important area of research by fostering collaboration and sharing ideas.
△ Less
Submitted 16 February, 2023;
originally announced February 2023.
-
A Systematic Literature Review of Human-Centered, Ethical, and Responsible AI
Authors:
Mohammad Tahaei,
Marios Constantinides,
Daniele Quercia,
Michael Muller
Abstract:
As Artificial Intelligence (AI) continues to advance rapidly, it becomes increasingly important to consider AI's ethical and societal implications. In this paper, we present a bottom-up map** of the current state of research at the intersection of Human-Centered AI, Ethical, and Responsible AI (HCER-AI) by thematically reviewing and analyzing 164 research papers from leading conferences in ethic…
▽ More
As Artificial Intelligence (AI) continues to advance rapidly, it becomes increasingly important to consider AI's ethical and societal implications. In this paper, we present a bottom-up map** of the current state of research at the intersection of Human-Centered AI, Ethical, and Responsible AI (HCER-AI) by thematically reviewing and analyzing 164 research papers from leading conferences in ethical, social, and human factors of AI: AIES, CHI, CSCW, and FAccT. The ongoing research in HCER-AI places emphasis on governance, fairness, and explainability. These conferences, however, concentrate on specific themes rather than encompassing all aspects. While AIES has fewer papers on HCER-AI, it emphasizes governance and rarely publishes papers about privacy, security, and human flourishing. FAccT publishes more on governance and lacks papers on privacy, security, and human flourishing. CHI and CSCW, as more established conferences, have a broader research portfolio. We find that the current emphasis on governance and fairness in AI research may not adequately address the potential unforeseen and unknown implications of AI. Therefore, we recommend that future research should expand its scope and diversify resources to prepare for these potential consequences. This could involve exploring additional areas such as privacy, security, human flourishing, and explainability.
△ Less
Submitted 26 June, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Stuck in the Permissions With You: Developer & End-User Perspectives on App Permissions & Their Privacy Ramifications
Authors:
Mohammad Tahaei,
Ruba Abu-Salma,
Awais Rashid
Abstract:
While the literature on permissions from the end-user perspective is rich, there is a lack of empirical research on why developers request permissions, their conceptualization of permissions, and how their perspectives compare with end-users' perspectives. Our study aims to address these gaps using a mixed-methods approach. Through interviews with 19 app developers and a survey of 309 Android and…
▽ More
While the literature on permissions from the end-user perspective is rich, there is a lack of empirical research on why developers request permissions, their conceptualization of permissions, and how their perspectives compare with end-users' perspectives. Our study aims to address these gaps using a mixed-methods approach. Through interviews with 19 app developers and a survey of 309 Android and iOS end-users, we found that both groups shared similar concerns about unnecessary permissions breaking trust, damaging the app's reputation, and potentially allowing access to sensitive data. We also found that developer participants sometimes requested multiple permissions due to confusion about the scope of certain permissions or third-party library requirements. Additionally, most end-user participants believed they were responsible for granting a permission request, and it was their choice to do so, a belief shared by many developer participants. Our findings have implications for improving the permission ecosystem for both developers and end-users.
△ Less
Submitted 31 January, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
KronA: Parameter Efficient Tuning with Kronecker Adapter
Authors:
Ali Edalati,
Marzieh Tahaei,
Ivan Kobyzev,
Vahid Partovi Nia,
James J. Clark,
Mehdi Rezagholizadeh
Abstract:
Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task has been a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques are proposed to improve the efficiency of fine-tuning…
▽ More
Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task has been a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques are proposed to improve the efficiency of fine-tuning PLMs. One popular category of PET methods is the low-rank adaptation methods which insert learnable truncated SVD modules into the original model either sequentially or in parallel. However, low-rank decomposition suffers from limited representation power. In this work, we address this problem using the Kronecker product instead of the low-rank representation. We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs. We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
△ Less
Submitted 20 December, 2022;
originally announced December 2022.
-
Better Call Saltzer \& Schroeder: A Retrospective Security Analysis of SolarWinds \& Log4j
Authors:
Partha Das Chowdhury,
Mohammad Tahaei,
Awais Rashid
Abstract:
Saltzer \& Schroeder's principles aim to bring security to the design of computer systems. We investigate SolarWinds Orion update and Log4j to unpack the intersections where observance of these principles could have mitigated the embedded vulnerabilities. The common principles that were not observed include \emph{fail safe defaults}, \emph{economy of mechanism}, \emph{complete mediation} and \emph…
▽ More
Saltzer \& Schroeder's principles aim to bring security to the design of computer systems. We investigate SolarWinds Orion update and Log4j to unpack the intersections where observance of these principles could have mitigated the embedded vulnerabilities. The common principles that were not observed include \emph{fail safe defaults}, \emph{economy of mechanism}, \emph{complete mediation} and \emph{least privilege}. Then we explore the literature on secure software development interventions for developers to identify usable analysis tools and frameworks that can contribute towards improved observance of these principles. We focus on a system wide view of access of codes, checking access paths and aiding application developers with safe libraries along with an appropriate security task list for functionalities.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
SeKron: A Decomposition Method Supporting Many Factorization Structures
Authors:
Marawan Gamal Abdel Hameed,
Ali Mosleh,
Marzieh S. Tahaei,
Vahid Partovi Nia
Abstract:
While convolutional neural networks (CNNs) have become the de facto standard for most image processing and computer vision applications, their deployment on edge devices remains challenging. Tensor decomposition methods provide a means of compressing CNNs to meet the wide range of device constraints by imposing certain factorization structures on their convolution tensors. However, being limited t…
▽ More
While convolutional neural networks (CNNs) have become the de facto standard for most image processing and computer vision applications, their deployment on edge devices remains challenging. Tensor decomposition methods provide a means of compressing CNNs to meet the wide range of device constraints by imposing certain factorization structures on their convolution tensors. However, being limited to the small set of factorization structures presented by state-of-the-art decomposition approaches can lead to sub-optimal performance. We propose SeKron, a novel tensor decomposition method that offers a wide variety of factorization structures, using sequences of Kronecker products. By recursively finding approximating Kronecker factors, we arrive at optimal decompositions for each of the factorization structures. We show that SeKron is a flexible decomposition that generalizes widely used methods, such as Tensor-Train (TT), Tensor-Ring (TR), Canonical Polyadic (CP) and Tucker decompositions. Crucially, we derive an efficient convolution projection algorithm shared by all SeKron structures, leading to seamless compression of CNN models. We validate SeKron for model compression on both high-level and low-level computer vision tasks and find that it outperforms state-of-the-art decomposition methods.
△ Less
Submitted 12 October, 2022;
originally announced October 2022.
-
Towards Fine-tuning Pre-trained Language Models with Integer Forward and Backward Propagation
Authors:
Mohammadreza Tayaranian,
Alireza Ghaffari,
Marzieh S. Tahaei,
Mehdi Rezagholizadeh,
Masoud Asgharian,
Vahid Partovi Nia
Abstract:
The large number of parameters of some prominent language models, such as BERT, makes their fine-tuning on downstream tasks computationally intensive and energy hungry. Previously researchers were focused on lower bit-width integer data types for the forward propagation of language models to save memory and computation. As for the backward propagation, however, only 16-bit floating-point data type…
▽ More
The large number of parameters of some prominent language models, such as BERT, makes their fine-tuning on downstream tasks computationally intensive and energy hungry. Previously researchers were focused on lower bit-width integer data types for the forward propagation of language models to save memory and computation. As for the backward propagation, however, only 16-bit floating-point data type has been used for the fine-tuning of BERT. In this work, we use integer arithmetic for both forward and back propagation in the fine-tuning of BERT. We study the effects of varying the integer bit-width on the model's metric performance. Our integer fine-tuning uses integer arithmetic to perform forward propagation and gradient computation of linear, layer-norm, and embedding layers of BERT. We fine-tune BERT using our integer training method on SQuAD v1.1 and SQuAD v2., and GLUE benchmark. We demonstrate that metric performance of fine-tuning 16-bit integer BERT matches both 16-bit and 32-bit floating-point baselines. Furthermore, using the faster and more memory efficient 8-bit integer data type, integer fine-tuning of BERT loses an average of 3.1 points compared to the FP32 baseline.
△ Less
Submitted 12 February, 2023; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Embedding Privacy Into Design Through Software Developers: Challenges & Solutions
Authors:
Mohammad Tahaei,
Kami Vaniea,
Awais Rashid
Abstract:
To make privacy a first-class citizen in software, we argue for equip** developers with usable tools, as well as providing support from organizations, educators, and regulators. We discuss the challenges with the successful integration of privacy features and propose solutions for stakeholders to help developers perform privacy-related tasks.
To make privacy a first-class citizen in software, we argue for equip** developers with usable tools, as well as providing support from organizations, educators, and regulators. We discuss the challenges with the successful integration of privacy features and propose solutions for stakeholders to help developers perform privacy-related tasks.
△ Less
Submitted 8 September, 2022; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Is Integer Arithmetic Enough for Deep Learning Training?
Authors:
Alireza Ghaffari,
Marzieh S. Tahaei,
Mohammadreza Tayaranian,
Masoud Asgharian,
Vahid Partovi Nia
Abstract:
The ever-increasing computational complexity of deep learning models makes their training and deployment difficult on various cloud and edge platforms. Replacing floating-point arithmetic with low-bit integer arithmetic is a promising approach to save energy, memory footprint, and latency of deep learning models. As such, quantization has attracted the attention of researchers in recent years. How…
▽ More
The ever-increasing computational complexity of deep learning models makes their training and deployment difficult on various cloud and edge platforms. Replacing floating-point arithmetic with low-bit integer arithmetic is a promising approach to save energy, memory footprint, and latency of deep learning models. As such, quantization has attracted the attention of researchers in recent years. However, using integer numbers to form a fully functional integer training pipeline including forward pass, back-propagation, and stochastic gradient descent is not studied in detail. Our empirical and mathematical results reveal that integer arithmetic seems to be enough to train deep learning models. Unlike recent proposals, instead of quantization, we directly switch the number representation of computations. Our novel training method forms a fully integer training pipeline that does not change the trajectory of the loss and accuracy compared to floating-point, nor does it need any special hyper-parameter tuning, distribution adjustment, or gradient clip**. Our experimental results show that our proposed method is effective in a wide variety of tasks such as classification (including vision transformers), object detection, and semantic segmentation.
△ Less
Submitted 4 January, 2023; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Kronecker Decomposition for GPT Compression
Authors:
Ali Edalati,
Marzieh Tahaei,
Ahmad Rashid,
Vahid Partovi Nia,
James J. Clark,
Mehdi Rezagholizadeh
Abstract:
GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks. The success of GPT is mostly attributed to its pre-training on huge amount of data and its large number of parameters (from ~100M to billions of parameters). Despite the sup…
▽ More
GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks. The success of GPT is mostly attributed to its pre-training on huge amount of data and its large number of parameters (from ~100M to billions of parameters). Despite the superior performance of GPT (especially in few-shot or zero-shot setup), this overparameterized nature of GPT can be very prohibitive for deploying this model on devices with limited computational power or memory. This problem can be mitigated using model compression techniques; however, compressing GPT models has not been investigated much in the literature. In this work, we use Kronecker decomposition to compress the linear map**s of the GPT-22 model. Our Kronecker GPT-2 model (KnGPT2) is initialized based on the Kronecker decomposed version of the GPT-2 model and then is undergone a very light pre-training on only a small portion of the training data with intermediate layer knowledge distillation (ILKD). Finally, our KnGPT2 is fine-tuned on down-stream tasks using ILKD as well. We evaluate our model on both language modeling and General Language Understanding Evaluation benchmark tasks and show that with more efficient pre-training and similar number of parameters, our KnGPT2 outperforms the existing DistilGPT2 model significantly.
△ Less
Submitted 15 October, 2021;
originally announced October 2021.
-
Convolutional Neural Network Compression through Generalized Kronecker Product Decomposition
Authors:
Marawan Gamal Abdel Hameed,
Marzieh S. Tahaei,
Ali Mosleh,
Vahid Partovi Nia
Abstract:
Modern Convolutional Neural Network (CNN) architectures, despite their superiority in solving various problems, are generally too large to be deployed on resource constrained edge devices. In this paper, we reduce memory usage and floating-point operations required by convolutional layers in CNNs. We compress these layers by generalizing the Kronecker Product Decomposition to apply to multidimensi…
▽ More
Modern Convolutional Neural Network (CNN) architectures, despite their superiority in solving various problems, are generally too large to be deployed on resource constrained edge devices. In this paper, we reduce memory usage and floating-point operations required by convolutional layers in CNNs. We compress these layers by generalizing the Kronecker Product Decomposition to apply to multidimensional tensors, leading to the Generalized Kronecker Product Decomposition (GKPD). Our approach yields a plug-and-play module that can be used as a drop-in replacement for any convolutional layer. Experimental results for image classification on CIFAR-10 and ImageNet datasets using ResNet, MobileNetv2 and SeNet architectures substantiate the effectiveness of our proposed approach. We find that GKPD outperforms state-of-the-art decomposition methods including Tensor-Train and Tensor-Ring as well as other relevant compression methods such as pruning and knowledge distillation.
△ Less
Submitted 14 January, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
-
KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation
Authors:
Marzieh S. Tahaei,
Ella Charlaix,
Vahid Partovi Nia,
Ali Ghodsi,
Mehdi Rezagholizadeh
Abstract:
The development of over-parameterized pre-trained language models has made a significant contribution toward the success of natural language processing. While over-parameterization of these models is the key to their generalization power, it makes them unsuitable for deployment on low-capacity devices. We push the limits of state-of-the-art Transformer-based pre-trained language model compression…
▽ More
The development of over-parameterized pre-trained language models has made a significant contribution toward the success of natural language processing. While over-parameterization of these models is the key to their generalization power, it makes them unsuitable for deployment on low-capacity devices. We push the limits of state-of-the-art Transformer-based pre-trained language model compression using Kronecker decomposition. We use this decomposition for compression of the embedding layer, all linear map**s in the multi-head attention, and the feed-forward network modules in the Transformer layer. We perform intermediate-layer knowledge distillation using the uncompressed model as the teacher to improve the performance of the compressed model. We present our KroneckerBERT, a compressed version of the BERT_BASE model obtained using this framework. We evaluate the performance of KroneckerBERT on well-known NLP benchmarks and show that for a high compression factor of 19 (5% of the size of the BERT_BASE model), our KroneckerBERT outperforms state-of-the-art compression methods on the GLUE. Our experiments indicate that the proposed model has promising out-of-distribution robustness and is superior to the state-of-the-art compression methods on SQuAD.
△ Less
Submitted 13 September, 2021;
originally announced September 2021.
-
SINA-BERT: A pre-trained Language Model for Analysis of Medical Texts in Persian
Authors:
Nasrin Taghizadeh,
Ehsan Doostmohammadi,
Elham Seifossadat,
Hamid R. Rabiee,
Maedeh S. Tahaei
Abstract:
We have released Sina-BERT, a language model pre-trained on BERT (Devlin et al., 2018) to address the lack of a high-quality Persian language model in the medical domain. SINA-BERT utilizes pre-training on a large-scale corpus of medical contents including formal and informal texts collected from a variety of online resources in order to improve the performance on health-care related tasks. We emp…
▽ More
We have released Sina-BERT, a language model pre-trained on BERT (Devlin et al., 2018) to address the lack of a high-quality Persian language model in the medical domain. SINA-BERT utilizes pre-training on a large-scale corpus of medical contents including formal and informal texts collected from a variety of online resources in order to improve the performance on health-care related tasks. We employ SINA-BERT to complete following representative tasks: categorization of medical questions, medical sentiment analysis, and medical question retrieval. For each task, we have developed Persian annotated data sets for training and evaluation and learnt a representation for the data of each task especially complex and long medical questions. With the same architecture being used across tasks, SINA-BERT outperforms BERT-based models that were previously made available in the Persian language.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
Dementia Severity Classification under Small Sample Size and Weak Supervision in Thick Slice MRI
Authors:
Reza Shirkavand,
Sana Ayromlou,
Soroush Farghadani,
Maedeh-sadat Tahaei,
Fattane Pourakpour,
Bahareh Siahlou,
Zeynab Khodakarami,
Mohammad H. Rohban,
Mansoor Fatehi,
Hamid R. Rabiee
Abstract:
Early detection of dementia through specific biomarkers in MR images plays a critical role in develo** support strategies proactively. Fazekas scale facilitates an accurate quantitative assessment of the severity of white matter lesions and hence the disease. Imaging Biomarkers of dementia are multiple and comprehensive documentation of them is time-consuming. Therefore, any effort to automatica…
▽ More
Early detection of dementia through specific biomarkers in MR images plays a critical role in develo** support strategies proactively. Fazekas scale facilitates an accurate quantitative assessment of the severity of white matter lesions and hence the disease. Imaging Biomarkers of dementia are multiple and comprehensive documentation of them is time-consuming. Therefore, any effort to automatically extract these biomarkers will be of clinical value while reducing inter-rater discrepancies. To tackle this problem, we propose to classify the disease severity based on the Fazekas scale through the visual biomarkers, namely the Periventricular White Matter (PVWM) and the Deep White Matter (DWM) changes, in the real-world setting of thick-slice MRI. Small training sample size and weak supervision in form of assigning severity labels to the whole MRI stack are among the main challenges. To combat the mentioned issues, we have developed a deep learning pipeline that employs self-supervised representation learning, multiple instance learning, and appropriate pre-processing steps. We use pretext tasks such as non-linear transformation, local shuffling, in- and out-painting for self-supervised learning of useful features in this domain. Furthermore, an attention model is used to determine the relevance of each MRI slice for predicting the Fazekas scale in an unsupervised manner. We show the significant superiority of our method in distinguishing different classes of dementia compared to state-of-the-art methods in our mentioned setting, which improves the macro averaged F1-score of state-of-the-art from 61% to 76% in PVWM, and from 58% to 69.2% in DWM.
△ Less
Submitted 18 March, 2021;
originally announced March 2021.
-
"I Don't Know Too Much About It": On the Security Mindsets of Computer Science Students
Authors:
Mohammad Tahaei,
Adam Jenkins,
Kami Vaniea,
Maria Wolters
Abstract:
The security attitudes and approaches of software developers have a large impact on the software they produce, yet we know very little about how and when these views are constructed. This paper investigates the security and privacy (S&P) perceptions, experiences, and practices of current Computer Science students at the graduate and undergraduate level using semi-structured interviews. We find tha…
▽ More
The security attitudes and approaches of software developers have a large impact on the software they produce, yet we know very little about how and when these views are constructed. This paper investigates the security and privacy (S&P) perceptions, experiences, and practices of current Computer Science students at the graduate and undergraduate level using semi-structured interviews. We find that the attitudes of students already match many of those that have been observed in professional level developers. Students have a range of hacker and attack mindsets, lack of experience with security APIs, a mixed view of who is in charge of S&P in the software life cycle, and a tendency to trust other peoples' code as a convenient approach to rapidly build software. We discuss the impact of our results on both curriculum development and support for professional developers.
△ Less
Submitted 13 March, 2021;
originally announced March 2021.
-
FoCL: Feature-Oriented Continual Learning for Generative Models
Authors:
Qicheng Lao,
Mehrzad Mortazavi,
Marzieh Tahaei,
Francis Dutil,
Thomas Fevens,
Mohammad Havaei
Abstract:
In this paper, we propose a general framework in continual learning for generative models: Feature-oriented Continual Learning (FoCL). Unlike previous works that aim to solve the catastrophic forgetting problem by introducing regularization in the parameter space or image space, FoCL imposes regularization in the feature space. We show in our experiments that FoCL has faster adaptation to distribu…
▽ More
In this paper, we propose a general framework in continual learning for generative models: Feature-oriented Continual Learning (FoCL). Unlike previous works that aim to solve the catastrophic forgetting problem by introducing regularization in the parameter space or image space, FoCL imposes regularization in the feature space. We show in our experiments that FoCL has faster adaptation to distributional changes in sequentially arriving tasks, and achieves the state-of-the-art performance for generative models in task incremental learning. We discuss choices of combined regularization spaces towards different use case scenarios for boosted performance, e.g., tasks that have high variability in the background. Finally, we introduce a forgetfulness measure that fairly evaluates the degree to which a model suffers from forgetting. Interestingly, the analysis of our proposed forgetfulness score also implies that FoCL tends to have a mitigated forgetting for future tasks.
△ Less
Submitted 8 March, 2020;
originally announced March 2020.
-
The coefficients of the reduced Bartholdi zeta function
Authors:
Maedeh S. Tahaei,
Seyed Naser Hashemi
Abstract:
In this paper, we establish a new zeta function based on the Bartholdi zeta function for an undirected graph G called the reduced Bartholdi zeta function. We study the relation between its coefficients and the structure of the graph, and demonstrate that the coefficients count the star subgraphs in the symmetric digraph D(G). Moreover, we investigate the properties of semi principle minors extract…
▽ More
In this paper, we establish a new zeta function based on the Bartholdi zeta function for an undirected graph G called the reduced Bartholdi zeta function. We study the relation between its coefficients and the structure of the graph, and demonstrate that the coefficients count the star subgraphs in the symmetric digraph D(G). Moreover, we investigate the properties of semi principle minors extracted from the adjacency matrix of the oriented line graph of G. We also present a general formula for calculating all the coefficients of the reduced Bartholdi zeta function.
△ Less
Submitted 2 October, 2016; v1 submitted 22 September, 2015;
originally announced September 2015.