-
On-Demand Routing in LEO Mega-Constellations with Dynamic Laser Inter-Satellite Links
Authors:
Dhiraj Bhattacharjee,
Pablo G. Madoery,
Aizaz U. Chaudhry,
Halim Yanikomeroglu,
Gunes Karabulut Kurt,
Peng Hu,
Khaled Ahmed,
Stephane Martel
Abstract:
Low Earth orbit (LEO) satellite mega constellations are beginning to include laser inter-satellite links (LISLs) to extend the Internet to the most remote locations on Earth. Since the process of establishing these links incurs a setup delay on the order of seconds, a static network topology is generally established well in advance, which is then used for the routing calculations. However, this involves keeping links active even when they are not being used to forward traffic, leading to poor energy efficiency. Motivated by technological advances that are gradually decreasing the LISL setup delays, we foresee scenarios where it will be possible to compute routes and establish dynamic LISLs on demand. This will require considering setup delays as penalties that will affect the end-to-end latency. In this paper, we present a nonlinear optimization model that considers these penalties in the cost function and propose three heuristic algorithms that solve the problem in a tractable way. The algorithms establish different trade-offs in terms of performance and computational complexity. We extensively analyze metrics including average latency, route change rate, outage probability, and jitter in Starlink's Phase I version 2 constellation. The results show the benefit of adaptive routing schemes according to the link setup delay. In particular, more complex schemes can decrease the average end-to-end latency in exchange for an increase in execution time. On the other hand, depending on the maximum tolerated latency, it is possible to use less computationally complex schemes which will be more scalable for the satellite mega constellations of the future.
Submitted 4 June, 2024;
originally announced June 2024.
-
Many-Shot In-Context Learning in Multimodal Foundation Models
Authors:
Yixing Jiang,
Jeremy Irvin,
Ji Hun Wang,
Muhammad Ahmed Chaudhry,
Jonathan H. Chen,
Andrew Y. Ng
Abstract:
Large language models are well-known to be effective at few-shot in-context learning (ICL). Recent advancements in multimodal foundation models have enabled unprecedentedly long context windows, presenting an opportunity to explore their capability to perform ICL with many more demonstrating examples. In this work, we evaluate the performance of multimodal foundation models scaling from few-shot to many-shot ICL. We benchmark GPT-4o and Gemini 1.5 Pro across 10 datasets spanning multiple domains (natural imagery, medical imagery, remote sensing, and molecular imagery) and tasks (multi-class, multi-label, and fine-grained classification). We observe that many-shot ICL, including up to almost 2,000 multimodal demonstrating examples, leads to substantial improvements compared to few-shot (<100 examples) ICL across all of the datasets. Further, Gemini 1.5 Pro performance continues to improve log-linearly up to the maximum number of tested examples on many datasets. Given the high inference costs associated with the long prompts required for many-shot ICL, we also explore the impact of batching multiple queries in a single API call. We show that batching up to 50 queries can lead to performance improvements under zero-shot and many-shot ICL, with substantial gains in the zero-shot setting on multiple datasets, while drastically reducing per-query cost and latency. Finally, we measure ICL data efficiency of the models, or the rate at which the models learn from more demonstrating examples. We find that while GPT-4o and Gemini 1.5 Pro achieve similar zero-shot performance across the datasets, Gemini 1.5 Pro exhibits higher ICL data efficiency than GPT-4o on most datasets. Our results suggest that many-shot ICL could enable users to efficiently adapt multimodal foundation models to new applications and domains. Our codebase is publicly available at https://github.com/stanfordmlgroup/ManyICL .
Submitted 16 May, 2024;
originally announced May 2024.
-
CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds
Authors:
Muhammad Ahmed Chaudhry,
Lyna Kim,
Jeremy Irvin,
Yuzu Ido,
Sonia Chu,
Jared Thomas Isobe,
Andrew Y. Ng,
Duncan Watson-Parris
Abstract:
Clouds play a significant role in global temperature regulation through their effect on planetary albedo. Anthropogenic emissions of aerosols can alter the albedo of clouds, but the extent of this effect, and its consequent impact on temperature change, remains uncertain. Human-induced clouds caused by ship aerosol emissions, commonly referred to as ship tracks, provide visible manifestations of this effect distinct from adjacent cloud regions and therefore serve as a useful sandbox to study human-induced clouds. However, the lack of large-scale ship track data makes it difficult to deduce their general effects on cloud formation. Towards developing automated approaches to localize ship tracks at scale, we present CloudTracks, a dataset containing 3,560 satellite images labeled with more than 12,000 ship track instance annotations. We train semantic segmentation and instance segmentation model baselines on our dataset and find that our best model substantially outperforms previous state-of-the-art for ship track localization (61.29 vs. 48.65 IoU). We also find that the best instance segmentation model is able to identify the number of ship tracks in each image more accurately than the previous state-of-the-art (1.64 vs. 4.99 MAE). However, we identify cases where the best model struggles to accurately localize and count ship tracks, so we believe CloudTracks will stimulate novel machine learning approaches to better detect elongated and overlapping features in satellite images. We release our dataset openly at zenodo.org/records/10042922.
Submitted 25 January, 2024;
originally announced January 2024.
-
Higher-Order Newton Methods with Polynomial Work per Iteration
Authors:
Amir Ali Ahmadi,
Abraar Chaudhry,
Jeffrey Zhang
Abstract:
We present generalizations of Newton's method that incorporate derivatives of an arbitrary order $d$ but maintain a polynomial dependence on dimension in their cost per iteration. At each step, our $d^{\text{th}}$-order method uses semidefinite programming to construct and minimize a sum of squares-convex approximation to the $d^{\text{th}}$-order Taylor expansion of the function we wish to minimize. We prove that our $d^{\text{th}}$-order method has local convergence of order $d$. This results in lower oracle complexity compared to the classical Newton method. We show on numerical examples that basins of attraction around local minima can get larger as $d$ increases. Under additional assumptions, we present a modified algorithm, again with polynomial cost per iteration, which is globally convergent and has local convergence of order $d$.
Submitted 12 June, 2024; v1 submitted 10 November, 2023;
originally announced November 2023.
-
Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages
Authors:
Ephrem Afele Retta,
Richard Sutcliffe,
Jabar Mahmood,
Michael Abebe Berwo,
Eiad Almekhlafi,
Sajjad Ahmed Khan,
Shehzad Ashraf Chaudhry,
Mustafa Mhamed,
Jun Feng
Abstract:
In a conventional speech emotion recognition (SER) task, a classifier for a given language is trained on a pre-existing dataset for that same language. However, where training data for a language does not exist, data from other languages can be used instead. We experiment with cross-lingual and multilingual SER, working with Amharic, English, German and Urdu. For Amharic, we use our own publicly-available Amharic Speech Emotion Dataset (ASED). For English, German and Urdu we use the existing RAVDESS, EMO-DB and URDU datasets. We followed previous research in mapping labels for all datasets to just two classes, positive and negative. Thus we can compare performance on different languages directly, and combine languages for training and testing. In Experiment 1, monolingual SER trials were carried out using three classifiers, AlexNet, VGGE (a proposed variant of VGG), and ResNet50. Results averaged for the three models were very similar for ASED and RAVDESS, suggesting that Amharic and English SER are equally difficult. Similarly, German SER is more difficult, and Urdu SER is easier. In Experiment 2, we trained on one language and tested on another, in both directions for each pair: Amharic<->German, Amharic<->English, and Amharic<->Urdu. Results with Amharic as target suggested that using English or German as source will give the best result. In Experiment 3, we trained on several non-Amharic languages and then tested on Amharic. The best accuracy obtained was several percent greater than the best accuracy in Experiment 2, suggesting that a better result can be obtained when using two or three non-Amharic languages for training than when using just one non-Amharic language. Overall, the results suggest that cross-lingual and multilingual training can be an effective strategy for training an SER classifier when resources for a language are scarce.
Submitted 20 July, 2023;
originally announced July 2023.
-
Safely Learning Dynamical Systems
Authors:
Amir Ali Ahmadi,
Abraar Chaudhry,
Vikas Sindhwani,
Stephen Tu
Abstract:
A fundamental challenge in learning an unknown dynamical system is to reduce model uncertainty by making measurements while maintaining safety. We formulate a mathematical definition of what it means to safely learn a dynamical system by sequentially deciding where to initialize trajectories. The state of the system must stay within a safety region for a horizon of $T$ time steps under the action of all dynamical systems that (i) belong to a given initial uncertainty set, and (ii) are consistent with information gathered so far.
First, we consider safely learning a linear dynamical system involving $n$ states. For the case $T=1$, we present an LP-based algorithm that either safely recovers the true dynamics from at most $n$ trajectories, or certifies that safe learning is impossible. For $T=2$, we give an SDP representation of the set of safe initial conditions and show that $\lceil n/2 \rceil$ trajectories generically suffice for safe learning. For $T = \infty$, we provide SDP-representable inner approximations of the set of safe initial conditions and show that one trajectory generically suffices for safe learning. We extend a number of our results to the cases where the initial uncertainty set contains sparse, low-rank, or permutation matrices, or when the system has a control input.
Second, we consider safely learning a general class of nonlinear dynamical systems. For the case $T=1$, we give an SOCP-based representation of the set of safe initial conditions. For $T=\infty$, we provide semidefinite representable inner approximations to the set of safe initial conditions. We show how one can safely collect trajectories and fit a polynomial model of the nonlinear dynamics that is consistent with the initial uncertainty set and best agrees with the observations. We also present some extensions to cases where the measurements are noisy or the dynamical system involves disturbances.
Submitted 8 June, 2024; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Is forgetting less a good inductive bias for forward transfer?
Authors:
Jiefeng Chen,
Timothy Nguyen,
Dilan Gorur,
Arslan Chaudhry
Abstract:
One of the main motivations of studying continual learning is that the problem setting allows a model to accrue knowledge from past tasks to learn new tasks more efficiently. However, recent studies suggest that the key metric that continual learning algorithms optimize, reduction in catastrophic forgetting, does not correlate well with the forward transfer of knowledge. We believe that the conclusion previous works reached is due to the way they measure forward transfer. We argue that the measure of forward transfer to a task should not be affected by the restrictions placed on the continual learner in order to preserve knowledge of previous tasks. Instead, forward transfer should be measured by how easy it is to learn a new task given a set of representations produced by continual learning on previous tasks. Under this notion of forward transfer, we evaluate different continual learning algorithms on a variety of image classification benchmarks. Our results indicate that less forgetful representations lead to a better forward transfer suggesting a strong correlation between retaining past information and learning efficiency on new tasks. Further, we found less forgetful representations to be more diverse and discriminative compared to their forgetful counterparts.
Submitted 14 March, 2023;
originally announced March 2023.
-
Laser Inter-Satellite Link Setup Delay: Quantification, Impact, and Tolerable Value
Authors:
Dhiraj Bhattacharjee,
Aizaz U. Chaudhry,
Halim Yanikomeroglu,
Peng Hu,
Guillaume Lamontagne
Abstract:
Dynamic laser inter-satellite links (LISLs) provide the flexibility of connecting a pair of satellites as required (dynamically) while static LISLs need to be active continuously between the energy-constrained satellites. However, due to the LISL establishment time (termed herein as LISL setup delay) being in the order of seconds, realizing dynamic LISLs is currently unfeasible. Towards the realization of dynamic LISLs, we first study the quantification of LISL setup delay; then we calculate the end-to-end latency of a free-space optical satellite network (FSOSN) with the LISL setup delay; subsequently, we analyze the impact of LISL setup delay on the end-to-end latency of the FSOSN. We also provide design guidelines for the laser communication terminal manufacturers in the form of maximum tolerable value of LISL setup delay for which the FSOSN based on Starlink's Phase I satellite constellation will be meaningful to use for low-latency long-distance inter-continental data communications.
Submitted 12 January, 2023;
originally announced January 2023.
-
Future Space Networks: Toward the Next Giant Leap for Humankind
Authors:
Mohammed Y. Abdelsadek,
Aizaz U. Chaudhry,
Tasneem Darwish,
Eylem Erdogan,
Gunes Karabulut-Kurt,
Pablo G. Madoery,
Olfa Ben Yahia,
Halim Yanikomeroglu
Abstract:
Due to the unprecedented advances in satellite fabrication and deployment, innovative communications and networking technologies, ambitious space projects and programs, and the resurgence of interest in satellite networks, there is a need to redefine space networks (SpaceNets) to incorporate all of these evolutions. This paper introduces a vision for future SpaceNets that considers advances in several related domains. First, we present a reference architecture that captures the various network entities and terminals in a holistic manner. Based on this, space, air, and ground use cases are studied. Then, the architectures and technologies that enable the envisaged SpaceNets are investigated. In so doing, we highlight the activities and projects of different standardization bodies, satellite operators, and national organizations towards the envisioned SpaceNets. Finally, the challenges, potential solutions, and open issues from communications and networking perspectives are discussed.
Submitted 11 December, 2022;
originally announced December 2022.
-
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research
Authors:
Jorg Bornschein,
Alexandre Galashov,
Ross Hemsley,
Amal Rannen-Triki,
Yutian Chen,
Arslan Chaudhry,
Xu Owen He,
Arthur Douillard,
Massimo Caccia,
Qixuang Feng,
Jiajun Shen,
Sylvestre-Alvise Rebuffi,
Kitty Stacpoole,
Diego de las Casas,
Will Hawkins,
Angeliki Lazaridou,
Yee Whye Teh,
Andrei A. Rusu,
Razvan Pascanu,
Marc'Aurelio Ranzato
Abstract:
A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study of the actual learning algorithm and model architecture, there are several hurdles towards our quest to build such models, such as the choice of learning protocol, metric of success and data needed to validate research hypotheses. In this work, we introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks, sorted chronologically and extracted from papers sampled uniformly from computer vision proceedings spanning the last three decades. The resulting stream reflects what the research community thought was meaningful at any point in time, and it serves as an ideal test bed to assess how well models can adapt to new tasks, and do so better and more efficiently as time goes by. Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth. The diversity is also reflected in the wide range of dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks, yet with a low entry barrier as it is limited to a single modality and well understood supervised learning problems. Moreover, we provide a reference implementation including strong baselines and an evaluation protocol to compare methods in terms of their trade-off between accuracy and compute.
Submitted 16 May, 2023; v1 submitted 15 November, 2022;
originally announced November 2022.
-
When does mixup promote local linearity in learned representations?
Authors:
Arslan Chaudhry,
Aditya Krishna Menon,
Andreas Veit,
Sadeep Jayasumana,
Srikumar Ramalingam,
Sanjiv Kumar
Abstract:
Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-supervised learning techniques such as MixMatch (Berthelot et al., 2019) and interpolation consistent training (ICT) (Verma et al., 2019). In this paper, we look at Mixup through a representation learning lens in a semi-supervised learning setup. In particular, we study the role of Mixup in promoting linearity in the learned network representations. Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the last network layer propagate the linearity to the earlier layers?; and (2) how does the enforcement of stronger Mixup loss on more than two data points affect the convergence of training? We empirically investigate these properties of Mixup on vision datasets such as CIFAR-10, CIFAR-100 and SVHN. Our results show that supervised Mixup training does not make all the network layers linear; in fact the intermediate layers become more non-linear during Mixup training compared to a network that is trained without Mixup. However, when Mixup is used as an unsupervised loss, we observe that all the network layers become more linear resulting in faster training convergence.
Submitted 28 October, 2022;
originally announced October 2022.
-
Lung nodules segmentation from CT with DeepHealth toolkit
Authors:
Hafiza Ayesha Hoor Chaudhry,
Riccardo Renzulli,
Daniele Perlo,
Francesca Santinelli,
Stefano Tibaldi,
Carmen Cristiano,
Marco Grosso,
Attilio Fiandrotti,
Maurizio Lucenteforte,
Davide Cavagnino
Abstract:
Accurate and consistent border segmentation plays an important role in tumor volume estimation and treatment in the field of medical image segmentation. Globally, lung cancer is one of the leading causes of death, and the early detection of lung nodules is essential for early cancer diagnosis and for the survival rate of patients. The goal of this study was to demonstrate the feasibility of the DeepHealth toolkit, including the PyECVL and PyEDDL libraries, to precisely segment lung nodules. Experiments for lung nodule segmentation have been carried out on UniToChest using PyECVL and PyEDDL, for data pre-processing as well as neural network training. The results depict accurate segmentation of lung nodules across a wide diameter range and better accuracy over a traditional detection approach. The datasets and the code used in this paper are publicly available as a baseline reference.
Submitted 1 August, 2022;
originally announced August 2022.
-
A Transparency Index Framework for AI in Education
Authors:
Muhammad Ali Chaudhry,
Mutlu Cukurova,
Rose Luckin
Abstract:
Numerous AI ethics checklists and frameworks have been proposed focusing on different dimensions of ethical AI such as fairness, explainability, and safety. Yet, no such work has been done on developing transparent AI systems for real-world educational scenarios. This paper presents a Transparency Index framework that has been iteratively co-designed with different stakeholders of AI in education, including educators, ed-tech experts, and AI practitioners. We map the requirements of transparency for different categories of stakeholders of AI in education and demonstrate that transparency considerations are embedded in the entire AI development process from the data collection stage until the AI system is deployed in the real world and iteratively improved. We also demonstrate how transparency enables the implementation of other ethical AI dimensions in Education like interpretability, accountability, and safety. In conclusion, we discuss the directions for future research in this newly emerging field. The main contribution of this study is that it highlights the importance of transparency in developing AI-powered educational technologies and proposes an index framework for its conceptualization for AI in education.
Submitted 9 May, 2022;
originally announced June 2022.
-
On Crossover Distance for Optical Wireless Satellite Networks and Optical Fiber Terrestrial Networks
Authors:
Aizaz U. Chaudhry,
Halim Yanikomeroglu
Abstract:
Optical wireless satellite networks (OWSNs) can provide lower latency data communications compared to optical fiber terrestrial networks (OFTNs). The crossover function enables calculation of the crossover distance for an OWSN and an OFTN. If the distance between two points on Earth is greater than the crossover distance, then switching or crossing over from the OFTN to the OWSN results in lower latency for data communications between these points. In this work, we extend the previously proposed crossover function to a scenario in which intermediate satellites (or hops) are incorporated between ingress and egress satellites in the OWSN, for a more realistic calculation of the crossover distance. We consider different OWSNs with different satellite altitudes and different OFTNs with different optical fiber refractive indexes, and we study the effect of the number of hops on the crossover distance and length of a laser inter-satellite link (LISL). It is observed from the numerical results that the crossover distance increases with an increase in the number of hops, and this increase is higher at higher satellite altitudes in OWSNs and lower refractive indexes in OFTNs. Furthermore, an inverse relationship between the crossover distance and length of a LISL is observed. With an increase in the number of hops, the length of a LISL decreases as opposed to the crossover distance.
Submitted 14 August, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Architecture Matters in Continual Learning
Authors:
Seyed Iman Mirzadeh,
Arslan Chaudhry,
Dong Yin,
Timothy Nguyen,
Razvan Pascanu,
Dilan Gorur,
Mehrdad Farajtabar
Abstract:
A large body of research in continual learning is devoted to overcoming the catastrophic forgetting of neural networks by designing new algorithms that are robust to the distribution shifts. However, the majority of these works are strictly focused on the "algorithmic" part of continual learning for a "fixed neural network architecture", and the implications of using different architectures are mostly neglected. Even the few existing continual learning methods that modify the model assume a fixed architecture and aim to develop an algorithm that efficiently uses the model throughout the learning experience. However, in this work, we show that the choice of architecture can significantly impact the continual learning performance, and different architectures lead to different trade-offs between the ability to remember previous tasks and learning new ones. Moreover, we study the impact of various architectural decisions, and our findings entail best practices and recommendations that can improve the continual learning performance.
Submitted 1 February, 2022;
originally announced February 2022.
-
Wide Neural Networks Forget Less Catastrophically
Authors:
Seyed Iman Mirzadeh,
Arslan Chaudhry,
Dong Yin,
Huiyi Hu,
Razvan Pascanu,
Dilan Gorur,
Mehrdad Farajtabar
Abstract:
A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to the distribution shifts. While the recent progress in continual learning literature is encouraging, our understanding of what properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work, we focus on the model itself and study the impact of "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.
Submitted 14 July, 2022; v1 submitted 21 October, 2021;
originally announced October 2021.
-
Multilevel Knowledge Transfer for Cross-Domain Object Detection
Authors:
Botos Csaba,
Xiaojuan Qi,
Arslan Chaudhry,
Puneet Dokania,
Philip Torr
Abstract:
Domain shift is a well known problem where a model trained on a particular domain (source) does not perform well when exposed to samples from a different domain (target). Unsupervised methods that can adapt to domain shift are highly desirable as they allow effective utilization of the source data without requiring additional annotated training data from the target. Practically, obtaining sufficient amount of annotated data from the target domain can be both infeasible and extremely expensive. In this work, we address the domain shift problem for the object detection task. Our approach relies on gradually removing the domain shift between the source and the target domains. The key ingredients to our approach are -- (a) mapping the source to the target domain on pixel-level; (b) training a teacher network on the mapped source and the unannotated target domain using adversarial feature alignment; and (c) finally training a student network using the pseudo-labels obtained from the teacher. Experimentally, when tested on challenging scenarios involving domain shift, we consistently obtain significantly large performance gains over various recent state of the art approaches.
Submitted 3 August, 2021; v1 submitted 2 August, 2021;
originally announced August 2021.
-
A Fast Heuristic for Gateway Location in Wireless Backhaul of 5G Ultra-Dense Networks
Authors:
Mital Raithatha,
Aizaz U. Chaudhry,
Roshdy H. M. Hafez,
John W. Chinneck
Abstract:
In 5G Ultra-Dense Networks, a distributed wireless backhaul is an attractive solution for forwarding traffic to the core. The macro-cell coverage area is divided into many small cells. A few of these cells are designated as gateways and are linked to the core by high-capacity fiber optic links. Each small cell is associated with one gateway and all small cells forward their traffic to their respective gateway through multi-hop mesh networks. We investigate the gateway location problem and show that finding near-optimal gateway locations improves the backhaul network capacity. An exact p-median integer linear program is formulated for comparison with our novel K-GA heuristic that combines a Genetic Algorithm (GA) with K-means clustering to find near-optimal gateway locations. We compare the performance of K-GA with six other approaches in terms of average number of hops and backhaul network capacity at different node densities through extensive Monte Carlo simulations. All approaches are tested in various user distribution scenarios, including uniform distribution, bivariate Gaussian distribution, and cluster distribution. In all cases K-GA provides near-optimal results, achieving average number of hops and backhaul network capacity within 2% of optimal while saving an average of 95% of the execution time.
Submitted 22 February, 2021;
originally announced March 2021.
-
Safely Learning Dynamical Systems from Short Trajectories
Authors:
Amir Ali Ahmadi,
Abraar Chaudhry,
Vikas Sindhwani,
Stephen Tu
Abstract:
A fundamental challenge in learning to control an unknown dynamical system is to reduce model uncertainty by making measurements while maintaining safety. In this work, we formulate a mathematical definition of what it means to safely learn a dynamical system by sequentially deciding where to initialize the next trajectory. In our framework, the state of the system is required to stay within a given safety region under the (possibly repeated) action of all dynamical systems that are consistent with the information gathered so far. For our first two results, we consider the setting of safely learning linear dynamics. We present a linear programming-based algorithm that either safely recovers the true dynamics from trajectories of length one, or certifies that safe learning is impossible. We also give an efficient semidefinite representation of the set of initial conditions whose resulting trajectories of length two are guaranteed to stay in the safety region. For our final result, we study the problem of safely learning a nonlinear dynamical system. We give a second-order cone programming based representation of the set of initial conditions that are guaranteed to remain in the safety region after one application of the system dynamics.
Submitted 24 November, 2020;
originally announced November 2020.
-
Continual Learning in Low-rank Orthogonal Subspaces
Authors:
Arslan Chaudhry,
Naeemullah Khan,
Puneet K. Dokania,
Philip H. S. Torr
Abstract:
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the continual learning experience is finished. The prior art in CL uses episodic memory, parameter regularization or extensible network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space. We believe this invariably leads to interference among different tasks. We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference. Further, to keep the gradients of different tasks coming from these subspaces orthogonal to each other, we learn isometric mappings by posing network training as an optimization problem over the Stiefel manifold. To the best of our understanding, we report, for the first time, strong results over experience-replay baseline with and without memory on standard classification benchmarks in continual learning. The code is made publicly available.
Submitted 8 December, 2020; v1 submitted 22 October, 2020;
originally announced October 2020.
-
A Secure and Improved Multi Server Authentication Protocol Using Fuzzy Commitment
Authors:
Hafeez Ur Rehman,
Anwar Ghani,
Shehzad Ashraf Chaudhry,
Mohammed H. Alsharif,
Narjes Nabipour
Abstract:
Very recently, Barman et al. proposed a multi-server authentication protocol using fuzzy commitment. The authors claimed that their protocol provides anonymity while resisting all known attacks. In this paper, we show that Barman et al.'s protocol is still vulnerable to anonymity violation attack and impersonation based on the stolen smart attack; moreover, it has scalability issues. We then propose an improved and enhanced protocol to overcome the security weaknesses of Barman et al.'s scheme. The security of the proposed protocol is verified using BAN logic and the widely accepted automated AVISPA tool. The BAN logic and automated AVISPA analyses, along with the informal analysis, ensure the robustness of the scheme against all known attacks.
Submitted 16 April, 2020;
originally announced April 2020.
-
Using Hindsight to Anchor Past Knowledge in Continual Learning
Authors:
Arslan Chaudhry,
Albert Gordo,
Puneet K. Dokania,
Philip Torr,
David Lopez-Paz
Abstract:
In continual learning, the learner faces a stream of data whose distribution changes over time. Modern neural networks are known to suffer under this setting, as they quickly forget previously acquired knowledge. To address such catastrophic forgetting, many continual learning methods implement different types of experience replay, re-learning on past data stored in a small buffer known as episodic memory. In this work, we complement experience replay with a new objective that we call anchoring, where the learner uses bilevel optimization to update its knowledge on the current task, while keeping intact the predictions on some anchor points of past tasks. These anchor points are learned using gradient-based optimization to maximize forgetting, which is approximated by fine-tuning the currently trained model on the episodic memory of past tasks. Experiments on several supervised learning benchmarks for continual learning demonstrate that our approach improves the standard experience replay in terms of both accuracy and forgetting metrics and for various sizes of episodic memories.
Submitted 2 March, 2021; v1 submitted 19 February, 2020;
originally announced February 2020.
-
On Tiny Episodic Memories in Continual Learning
Authors:
Arslan Chaudhry,
Marcus Rohrbach,
Mohamed Elhoseiny,
Thalaiyasingam Ajanthan,
Puneet K. Dokania,
Philip H. S. Torr,
Marc'Aurelio Ranzato
Abstract:
In continual learning (CL), an agent learns from a stream of tasks leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework to decrease the amount of supervision in the existing learning algorithms. But for a successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to endow the learner the ability to perform tasks seen in the past is to store a small memory, dubbed episodic memory, that stores few examples from previous tasks and then to replay these examples when training for future tasks. In this work, we empirically analyze the effectiveness of a very small episodic memory in a CL setup where each training example is only seen once. Surprisingly, across four rather different supervised learning benchmarks adapted to CL, a very simple baseline, that jointly trains on both examples from the current task as well as examples stored in the episodic memory, significantly outperforms specifically designed CL approaches with and without episodic memory. Interestingly, we find that repetitive training on even tiny memories of past tasks does not harm generalization, on the contrary, it improves it, with gains between 7% and 17% when the memory is populated with a single example per class.
Submitted 4 June, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
Efficient Lifelong Learning with A-GEM
Authors:
Arslan Chaudhry,
Marc'Aurelio Ranzato,
Marcus Rohrbach,
Mohamed Elhoseiny
Abstract:
In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. Towards this end, we first introduce a new and a more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency.
Submitted 9 January, 2019; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Authors:
Arslan Chaudhry,
Puneet K. Dokania,
Thalaiyasingam Ajanthan,
Philip H. S. Torr
Abstract:
Incremental learning (IL) has received a lot of attention recently; however, the literature lacks a precise problem definition, proper evaluation settings, and metrics tailored specifically for the IL problem. One of the main objectives of this work is to fill these gaps so as to provide a common ground for better understanding of IL. The main challenge for an IL algorithm is to update the classifier whilst preserving existing knowledge. We observe that, in addition to forgetting, a known issue while preserving knowledge, IL also suffers from a problem we call intransigence, the inability of a model to update its knowledge. We introduce two metrics to quantify forgetting and intransigence that allow us to understand, analyse, and gain better insights into the behaviour of IL algorithms. We present RWalk, a generalization of EWC++ (our efficient version of EWC [Kirkpatrick2016EWC]) and Path Integral [Zenke2017Continual] with a theoretically grounded KL-divergence based perspective. We provide a thorough analysis of various IL algorithms on MNIST and CIFAR-100 datasets. In these experiments, RWalk obtains superior results in terms of accuracy, and also provides a better trade-off between forgetting and intransigence.
Submitted 14 August, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
-
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
Authors:
Arslan Chaudhry,
Puneet K. Dokania,
Philip H. S. Torr
Abstract:
We propose an approach to discover class-specific pixels for the weakly-supervised semantic segmentation task. We show that properly combining saliency and attention maps allows us to obtain reliable cues capable of significantly boosting the performance. First, we propose a simple yet powerful hierarchical approach to discover the class-agnostic salient regions, obtained using a salient object detector, which otherwise would be ignored. Second, we use fully convolutional attention maps to reliably localize the class-specific regions in a given image. We combine these two cues to discover class-specific pixels which are then used as an approximate ground truth for training a CNN. While solving the weakly supervised semantic segmentation task, we ensure that the image-level classification task is also solved in order to enforce the CNN to assign at least one pixel to each object present in the image. Experimentally, on the PASCAL VOC12 val and test sets, we obtain the mIoU of 60.8% and 61.9%, achieving the performance gains of 5.1% and 5.2% compared to the published state-of-the-art results. The code is made publicly available.
Submitted 18 July, 2017;
originally announced July 2017.
-
Personal Data: Thinking Inside the Box
Authors:
Hamed Haddadi,
Heidi Howard,
Amir Chaudhry,
Jon Crowcroft,
Anil Madhavapeddy,
Richard Mortier
Abstract:
We propose there is a need for a technical platform enabling people to engage with the collection, management and consumption of personal data; and that this platform should itself be personal, under the direct control of the individual whose data it holds. In what follows, we refer to this platform as the Databox, a personal, networked service that collates personal data and can be used to make those data available. While your Databox is likely to be a virtual platform, in that it will involve multiple devices and services, at least one instance of it will exist in physical form such as on a physical form-factor computing device with associated storage and networking, such as a home hub.
Submitted 20 January, 2015;
originally announced January 2015.
-
An Experience based Evaluation Process for ERP bids
Authors:
Adnan Al Bar,
Victor Basili,
Wajdi Al Jedaibi,
Abdul Jawad Chaudhry
Abstract:
Enterprise Resource Planning (ERP) systems integrate information across an entire organization and automate core activities such as finance, accounting, human resources, manufacturing, production, and supply chain management to facilitate an integrated centralized system and rapid decision making, resulting in cost reduction, greater planning, and increased control. Many organizations are updating their current management information systems with ERP systems. This is not a trivial task. They have to identify the organization's objectives and satisfy a myriad of stakeholders. They have to understand what business processes they have, how they can be improved, and what particular systems would best suit their needs. They have to understand how an ERP system is built: it involves the modification of an existing system with its own set of business rules. Deciding what to ask for and how to select the best option is a very complex operation, and there is limited experience with this type of contracting in organizations. In this paper we discuss a particular experience with contracting out an ERP system, provide some lessons learned, and offer suggestions on how the RFP and bid selection processes could have been improved.
Submitted 12 November, 2013;
originally announced November 2013.
-
On the Minimum Number of Transmissions in Single-Hop Wireless Coding Networks
Authors:
Salim Y. El Rouayheb,
Mohammad Asad R. Chaudhry,
Alex Sprintson
Abstract:
The advent of network coding presents promising opportunities in many areas of communication and networking. It has been recently shown that network coding technique can significantly increase the overall throughput of wireless networks by taking advantage of their broadcast nature. In wireless networks, each transmitted packet is broadcasted within a certain area and can be overheard by the neighboring nodes. When a node needs to transmit packets, it employs the opportunistic coding approach that uses the knowledge of what the node's neighbors have heard in order to reduce the number of transmissions. With this approach, each transmitted packet is a linear combination of the original packets over a certain finite field.
In this paper, we focus on the fundamental problem of finding the optimal encoding for the broadcasted packets that minimizes the overall number of transmissions. We show that this problem is NP-complete over GF(2) and establish several fundamental properties of the optimal solution. We also propose a simple heuristic solution for the problem based on graph coloring and present some empirical results for random settings.
Submitted 5 July, 2007;
originally announced July 2007.