-
You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
Authors:
Mehdi Noroozi,
Isma Hadji,
Brais Martinez,
Adrian Bulat,
Georgios Tzimiropoulos
Abstract:
In this paper, we introduce YONOS-SR, a novel stable diffusion-based approach for image super-resolution that yields state-of-the-art results using only a single DDIM step. We propose a novel scale distillation approach to train our SR model. Instead of directly training our SR model on the scale factor of interest, we start by training a teacher model on a smaller magnification scale, thereby making the SR problem simpler for the teacher. We then train a student model for a higher magnification scale, using the predictions of the teacher as a target during the training. This process is repeated iteratively until we reach the target scale factor of the final model. The rationale behind our scale distillation is that the teacher aids the student diffusion model training by i) providing a target adapted to the current noise level rather than using the same target coming from ground truth data for all noise levels and ii) providing an accurate target as the teacher has a simpler task to solve. We empirically show that the distilled model significantly outperforms the model trained for high scales directly, specifically with few steps during inference. Having a strong diffusion model that requires only one step allows us to freeze the U-Net and fine-tune the decoder on top of it. We show that the combination of spatially distilled U-Net and fine-tuned decoder outperforms state-of-the-art methods requiring 200 steps with only one single step.
Submitted 30 January, 2024;
originally announced January 2024.
-
Implementation and Evaluation of Networked Model Predictive Control System on Universal Robot
Authors:
Mahsa Noroozi,
Kai Wang
Abstract:
Networked control systems are closed-loop feedback control systems containing system components that may be distributed geographically in different locations and interconnected via a communication network such as the Internet. The quality of network communication is a crucial factor that significantly affects the performance of remote control. This is due to the fact that network uncertainties can occur in the transmission of packets in the forward and backward channels of the system. The two most significant among these uncertainties are network time delay and packet loss. To overcome these challenges, the networked predictive control system has been proposed to provide improved performance and robustness using predictive controllers and compensation strategies. In particular, the model predictive control method is well-suited as an advanced approach compared to conventional methods. In this paper, a networked model predictive control system consisting of a model predictive control method and compensation strategies is implemented to control and stabilize a robot arm as a physical system. In particular, this work aims to analyze the performance of the system under the influence of network time delay and packet loss. Using appropriate performance and robustness metrics, an in-depth investigation of the impacts of these network uncertainties is performed. Furthermore, the forward and backward channels of the network are examined in detail in this study.
Submitted 18 July, 2023;
originally announced July 2023.
-
Performance Evaluation of a New Scheduling Model Using Congestion Window Reservation
Authors:
Mahsa Noroozi,
Flavio Gallistl,
Majid Noroozi
Abstract:
Multipath QUIC is a transport protocol that allows for the use of multiple network interfaces for a single connection. It thereby offers, on the one hand, the possibility to gather a higher throughput, while, on the other hand, multiple paths can also be used to transmit data redundantly. Selective redundancy combines these two applications and thereby offers the potential to transmit time-critical data. This paper considers scenarios where data with real-time requirements are transmitted redundantly while at the same time, non-critical data should make use of the aggregated throughput. A new model called congestion window reservation is proposed, which enables an immediate transmission of time-critical data. The performance of this method and its combination with selective redundancy is evaluated using emulab with real data. The results show that this technique leads to a smaller end-to-end latency and reliability for periodically generated priority data.
Submitted 5 May, 2023;
originally announced May 2023.
-
Statistical Age-of-Information Bounds for Parallel Systems: When Do Independent Channels Make a Difference?
Authors:
Markus Fidler,
Jaya Champati,
Joerg Widmer,
Mahsa Noroozi
Abstract:
This paper contributes tail bounds of the age-of-information of a general class of parallel systems and explores their potential. Parallel systems arise in relevant cases, such as in multi-band mobile networks, multi-technology wireless access, or multi-path protocols, just to name a few. Typically, control over each communication channel is limited and random service outages and congestion cause buffering that impairs the age-of-information. The parallel use of independent channels promises a remedy, since outages on one channel may be compensated for by another. Surprisingly, for the well-known case of M$\mid$M$\mid$1 queues we find the opposite: pooling capacity in one channel performs better than a parallel system with the same total capacity. A generalization is not possible since there are no solutions for other types of parallel queues at hand. In this work, we prove a dual representation of age-of-information in min-plus algebra that connects to queueing models known from the theory of effective bandwidth/capacity and the stochastic network calculus. Exploiting these methods, we derive tail bounds of the age-of-information of parallel G$\mid$G$\mid$1 queues. In addition to parallel classical queues, we investigate Markov channels where, depending on the memory of the channel, we show the true advantage of parallel systems. We continue to investigate this new finding and provide insight into when capacity should be pooled in one channel or when independent parallel channels perform better. We complement our analysis with simulation results and evaluate different update policies, scheduling policies, and the use of heterogeneous channels that is most relevant for latest multi-band networks.
Submitted 24 March, 2023;
originally announced March 2023.
-
An Interpretable Boosting-based Predictive Model for Transformation Temperatures of Shape Memory Alloys
Authors:
Sina Hossein Zadeh,
Amir Behbahanian,
John Broucek,
Mingzhou Fan,
Guillermo Vazquez Tovar,
Mohammad Noroozi,
William Trehern,
Xiaoning Qian,
Ibrahim Karaman,
Raymundo Arroyave
Abstract:
In this study, we demonstrate how the incorporation of appropriate feature engineering together with the selection of a Machine Learning (ML) algorithm that best suits the available dataset, leads to the development of a predictive model for transformation temperatures that can be applied to a wide range of shape memory alloys. We develop a gradient boosting ML surrogate model capable of predicting Martensite Start, Martensite Finish, Austenite Start, and Austenite Finish transformation temperatures with an average accuracy of more than 95% by explicitly taking care of potential distribution changes when modeling different alloy systems. We included heat treatment, rolling, extrusion processing parameters, and alloy system categorical features in the model input features to achieve more accurate and realistic results. In addition, using Shapley values, which are calculated based on the average marginal contribution of features to all possible coalitions, this study was able to gain insights into the governing features and their effect on predicted transformation temperatures, providing a unique opportunity to examine the critical parameters and features in martensite transformation temperatures.
Submitted 4 February, 2023;
originally announced February 2023.
-
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
Authors:
Nadine Behrmann,
S. Alireza Golestaneh,
Zico Kolter,
Juergen Gall,
Mehdi Noroozi
Abstract:
This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to current state-of-the-art frame-level prediction methods, we view action segmentation as a seq2seq translation task, i.e., mapping a sequence of video frames to a sequence of action segments. Our proposed method involves a series of modifications and auxiliary loss functions on the standard Transformer seq2seq translation model to cope with long input sequences as opposed to short output sequences and relatively few videos. We incorporate an auxiliary supervision signal for the encoder via a frame-wise loss and propose a separate alignment decoder for an implicit duration prediction. Finally, we extend our framework to the timestamp supervised setting via our proposed constrained k-medoids algorithm to generate pseudo-segmentations. Our proposed framework performs consistently on both fully and timestamp supervised settings, outperforming or competing with the state of the art on several datasets. Our code is publicly available at https://github.com/boschresearch/UVAST.
Submitted 11 October, 2022; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Performance Analysis of Universal Robot Control System Using Networked Predictive Control
Authors:
Mahsa Noroozi,
Lorenz Kies
Abstract:
Networked control systems are feedback control systems with system components distributed at different locations connected through a communication network. Since the communication network is carried out through the internet and there are bandwidth and packet size limitations, network constraints appear. Some of these constraints are time delay and packet loss. These network limitations can degrade the performance and even destabilize the system. To overcome the adverse effect of these communication constraints, various approaches have been developed, among which a representative one is networked predictive control. This approach proposes a controller, which compensates for the network time delay and packet loss actively. This paper aims at implementing a networked predictive control system for controlling a robot arm through a computer network. The network delay is accounted for by a predictor, while the potential of packet loss is mitigated using redundant control packets. The results will show the stability of the system despite a high delay and a considerable packet loss. Additionally, improvements to previous networked predictive control systems will be suggested and an increase in performance can be shown. Lastly, the effects of different system and environment parameters on the control loop will be investigated.
Submitted 16 August, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Sparse Subspace Clustering in Diverse Multiplex Network Model
Authors:
Majid Noroozi,
Marianna Pensky
Abstract:
The paper considers the DIverse MultiPLEx (DIMPLE) network model, introduced in Pensky and Wang (2021), where all layers of the network have the same collection of nodes and are equipped with the Stochastic Block Models. In addition, all layers can be partitioned into groups with the same community structures, although the layers in the same group may have different matrices of block connection probabilities. The DIMPLE model generalizes a multitude of papers that study multilayer networks with the same community structures in all layers, as well as the Mixture Multilayer Stochastic Block Model (MMLSBM), where the layers in the same group have identical matrices of block connection probabilities. While Pensky and Wang (2021) applied spectral clustering to the proxy of the adjacency tensor, the present paper uses Sparse Subspace Clustering (SSC) for identifying groups of layers with identical community structures. Under mild conditions, the latter leads to the strongly consistent between-layer clustering. In addition, SSC allows to handle much larger networks than methodology of Pensky and Wang (2021), and is perfectly suitable for application of parallel computing.
Submitted 25 April, 2023; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Age- and Deviation-of-Information of Time-Triggered and Event-Triggered Systems
Authors:
Mahsa Noroozi,
Markus Fidler
Abstract:
Age-of-information is a metric that quantifies the freshness of information obtained by sampling a remote sensor. In signal-agnostic sampling, sensor updates are triggered at certain times without being conditioned on the actual sensor signal. Optimal update policies have been researched and it is accepted that periodic updates achieve smaller age-of-information than random updates. We contribute a study of a signal-aware policy, where updates are triggered by a random sensor event. By definition, this implies random updates and as a consequence inferior age-of-information. Considering a notion of deviation-of-information as a signal-aware metric, our results show, however, that event-triggered systems can perform equally well as time-triggered systems while causing smaller mean network utilization.
Submitted 3 June, 2022;
originally announced June 2022.
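As a rough illustration of the age-of-information metric mentioned in the entry above, the sketch below computes the age at time t as t minus the generation time of the freshest update received so far. This is the commonly used definition of the metric, not the authors' model or their deviation-of-information measure; the function name `age_of_information` and the sample update times are hypothetical.

```python
def age_of_information(updates, t):
    """Age at time t for a list of (generated_at, received_at) updates.

    Returns t minus the newest generation time among updates that have
    been received by time t, or None if nothing has arrived yet.
    """
    delivered = [generated for generated, received in updates if received <= t]
    if not delivered:
        return None
    return t - max(delivered)


# Hypothetical trace: three sensor updates with different network delays.
updates = [(0.0, 1.0), (2.0, 2.5), (4.0, 6.0)]
print(age_of_information(updates, 3.0))  # 1.0 (freshest received update was generated at 2.0)
print(age_of_information(updates, 5.0))  # 3.0 (third update is still in flight)
print(age_of_information(updates, 6.0))  # 2.0 (third update has just arrived)
```

Between deliveries the age grows linearly with time and drops when a fresher update arrives, which is why both long delays and infrequent updates increase it.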
-
Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives
Authors:
David T. Hoffmann,
Nadine Behrmann,
Juergen Gall,
Thomas Brox,
Mehdi Noroozi
Abstract:
This paper introduces Ranking Info Noise Contrastive Estimation (RINCE), a new member in the family of InfoNCE losses that preserves a ranked ordering of positive samples. In contrast to the standard InfoNCE loss, which requires a strict binary separation of the training pairs into similar and dissimilar samples, RINCE can exploit information about a similarity ranking for learning a corresponding embedding space. We show that the proposed loss function learns favorable embeddings compared to the standard InfoNCE whenever at least noisy ranking information can be obtained or when the definition of positives and negatives is blurry. We demonstrate this for a supervised classification task with additional superclass labels and noisy similarity scores. Furthermore, we show that RINCE can also be applied to unsupervised training with experiments on unsupervised representation learning from videos. In particular, the embedding yields higher classification accuracy, retrieval rates and performs better in out-of-distribution detection than the standard InfoNCE loss.
Submitted 27 January, 2022;
originally announced January 2022.
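For context on the entry above, here is a minimal sketch of the standard InfoNCE loss for a single anchor, i.e. the baseline with a strict positive/negative split that the abstract contrasts RINCE with. It does not implement RINCE's ranked positives. The function name `info_nce`, the temperature value, and the similarity scores are assumptions chosen for illustration.

```python
import math


def info_nce(sim_pos, sim_negs, temperature=0.1):
    """Standard InfoNCE loss for one anchor.

    loss = -log( exp(s_pos / T) / (exp(s_pos / T) + sum_j exp(s_neg_j / T)) )
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical cosine similarities between an anchor and other samples.
loss_good = info_nce(0.9, [0.1, 0.2])  # positive is clearly the closest sample
loss_bad = info_nce(0.5, [0.1, 0.2])   # positive is less well separated
print(loss_good < loss_bad)  # True
```

The loss only distinguishes "positive" from "negative"; any information about how similar each sample is to the anchor is discarded, which is the limitation the abstract says RINCE addresses.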
-
A Min-plus Model of Age-of-Information with Worst-case and Statistical Bounds
Authors:
Mahsa Noroozi,
Markus Fidler
Abstract:
We consider networked sources that generate update messages with a defined rate and we investigate the age of that information at the receiver. Typical applications are in cyber-physical systems that depend on timely sensor updates. We phrase the age of information in the min-plus algebra of the network calculus. This facilitates a variety of models including wireless channels and schedulers with random cross-traffic, as well as sources with periodic and random updates, respectively. We show how the age of information depends on the network service where, e.g., outages of a wireless channel cause delays. Further, our analytical expressions show two regimes depending on the update rate, where the age of information is either dominated by congestive delays or by idle waiting. We find that the optimal update rate strikes a balance between these two effects.
Submitted 22 December, 2021;
originally announced December 2021.
-
TaylorSwiftNet: Taylor Driven Temporal Modeling for Swift Future Frame Prediction
Authors:
Saber Pourheydari,
Emad Bahrami,
Mohsen Fayyaz,
Gianpiero Francesca,
Mehdi Noroozi,
Juergen Gall
Abstract:
While recurrent neural networks (RNNs) demonstrate outstanding capabilities for future video frame prediction, they model dynamics in a discrete time space, i.e., they predict the frames sequentially with a fixed temporal step. RNNs are therefore prone to accumulate the error as the number of future frames increases. In contrast, partial differential equations (PDEs) model physical phenomena like dynamics in a continuous time space. However, the estimated PDE for frame forecasting needs to be numerically solved, which is done by discretization of the PDE and diminishes most of the advantages compared to discrete models. In this work, we, therefore, propose to approximate the motion in a video by a continuous function using the Taylor series. To this end, we introduce TaylorSwiftNet, a novel convolutional neural network that learns to estimate the higher order terms of the Taylor series for a given input video. TaylorSwiftNet can swiftly predict future frames in parallel and it allows to change the temporal resolution of the forecast frames on-the-fly. The experimental results on various datasets demonstrate the superiority of our model.
Submitted 12 October, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
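The idea in the entry above of approximating motion by a continuous function can be illustrated with a plain truncated Taylor series. The sketch below evaluates such a series at arbitrary, non-integer time offsets, which is what makes the temporal resolution adjustable. It is a scalar toy example under stated assumptions, not the TaylorSwiftNet architecture: in the paper the terms are estimated by a network, whereas here the derivative values and the name `taylor_predict` are hypothetical.

```python
import math


def taylor_predict(derivs, t0, t):
    """Evaluate sum_k derivs[k] * (t - t0)**k / k! at time t.

    derivs[k] is the k-th derivative of the signal at the expansion point t0.
    """
    dt = t - t0
    return sum(d * dt ** k / math.factorial(k) for k, d in enumerate(derivs))


# Hypothetical signal with value 2.0 and slope 3.0 at t0 = 1.0.
print(taylor_predict([2.0, 3.0], t0=1.0, t=1.5))  # 3.5

# Each query time is evaluated independently, so predictions for several
# future times can be computed in parallel and at any temporal spacing.
print([taylor_predict([2.0, 3.0], 1.0, t) for t in (1.25, 1.5, 2.0)])
```

Because no prediction depends on the previous one, errors do not accumulate step by step as they would in a sequential rollout.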
-
Long Short View Feature Decomposition via Contrastive Video Representation Learning
Authors:
Nadine Behrmann,
Mohsen Fayyaz,
Juergen Gall,
Mehdi Noroozi
Abstract:
Self-supervised video representation methods typically focus on the representation of temporal attributes in videos. However, the role of stationary versus non-stationary attributes is less explored: Stationary features, which remain similar throughout the video, enable the prediction of video-level action classes. Non-stationary features, which represent temporally varying attributes, are more beneficial for downstream tasks involving more fine-grained temporal understanding, such as action segmentation. We argue that a single representation to capture both types of features is sub-optimal, and propose to decompose the representation space into stationary and non-stationary features via contrastive learning from long and short views, i.e. long video sequences and their shorter sub-sequences. Stationary features are shared between the short and long views, while non-stationary features aggregate the short views to match the corresponding long view. To empirically verify our approach, we demonstrate that our stationary features work particularly well on an action recognition downstream task, while our non-stationary features perform better on action segmentation. Furthermore, we analyse the learned representations and find that stationary features capture more temporally stable, static attributes, while non-stationary features encompass more temporally varying ones.
Submitted 23 September, 2021;
originally announced September 2021.
-
Self-labeled Conditional GANs
Authors:
Mehdi Noroozi
Abstract:
This paper introduces a novel and fully unsupervised framework for conditional GAN training in which labels are automatically obtained from data. We incorporate a clustering network into the standard conditional GAN framework that plays against the discriminator. With the generator, it aims to find a shared structured mapping for associating pseudo-labels with the real and fake images. Our generator outperforms unconditional GANs in terms of FID with significant margins on large scale datasets like ImageNet and LSUN. It also outperforms class conditional GANs trained on human labels on CIFAR10 and CIFAR100 where fine-grained annotations or a large number of samples per class are not available. Additionally, our clustering network exceeds the state-of-the-art on CIFAR100 clustering.
Submitted 3 December, 2020;
originally announced December 2020.
-
3D CNNs with Adaptive Temporal Feature Resolutions
Authors:
Mohsen Fayyaz,
Emad Bahrami,
Ali Diba,
Mehdi Noroozi,
Ehsan Adeli,
Luc Van Gool,
Juergen Gall
Abstract:
While state-of-the-art 3D Convolutional Neural Networks (CNN) achieve very good results on action recognition datasets, they are computationally very expensive and require many GFLOPs. While the GFLOPs of a 3D CNN can be decreased by reducing the temporal feature resolution within the network, there is no setting that is optimal for all input clips. In this work, we therefore introduce a differentiable Similarity Guided Sampling (SGS) module, which can be plugged into any existing 3D CNN architecture. SGS empowers 3D CNNs by learning the similarity of temporal features and grouping similar features together. As a result, the temporal feature resolution is no longer static but varies for each input video clip. By integrating SGS as an additional layer within current 3D CNNs, we can convert them into much more efficient 3D CNNs with adaptive temporal feature resolutions (ATFR). Our evaluations show that the proposed module improves the state-of-the-art by reducing the computational cost (GFLOPs) by half while preserving or even improving the accuracy. We evaluate our module by adding it to multiple state-of-the-art 3D CNNs on various datasets such as Kinetics-600, Kinetics-400, mini-Kinetics, Something-Something V2, UCF101, and HMDB51.
Submitted 11 August, 2021; v1 submitted 17 November, 2020;
originally announced November 2020.
-
Unsupervised Video Representation Learning by Bidirectional Feature Prediction
Authors:
Nadine Behrmann,
Juergen Gall,
Mehdi Noroozi
Abstract:
This paper introduces a novel method for self-supervised video representation learning via feature prediction. In contrast to the previous methods that focus on future feature prediction, we argue that a supervisory signal arising from unobserved past frames is complementary to one that originates from the future frames. The rationale behind our method is to encourage the network to explore the temporal structure of videos by distinguishing between future and past given present observations. We train our model in a contrastive learning framework, where joint encoding of future and past provides us with a comprehensive set of temporal hard negatives via swapping. We empirically show that utilizing both signals enriches the learned representations for the downstream task of action recognition. It outperforms independent prediction of future and past.
Submitted 11 November, 2020;
originally announced November 2020.
-
The Hierarchy of Block Models
Authors:
Majid Noroozi,
Marianna Pensky
Abstract:
There exist various types of network block models such as the Stochastic Block Model (SBM), the Degree Corrected Block Model (DCBM), and the Popularity Adjusted Block Model (PABM). While this leads to a variety of choices, the block models do not have a nested structure. In addition, there is a substantial jump in the number of parameters from the DCBM to the PABM. The objective of this paper is formulation of a hierarchy of block model which does not rely on arbitrary identifiability conditions. We propose a Nested Block Model (NBM) that treats the SBM, the DCBM and the PABM as its particular cases with specific parameter values, and, in addition, allows a multitude of versions that are more complicated than DCBM but have fewer unknown parameters than the PABM. The latter allows one to carry out clustering and estimation without preliminary testing, to see which block model is really true.
Submitted 13 March, 2021; v1 submitted 6 February, 2020;
originally announced February 2020.
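For readers unfamiliar with the simplest model named in the entry above, the sketch below samples an undirected graph from a basic Stochastic Block Model, where the probability of an edge depends only on the communities of its two endpoints. It covers the plain SBM only, not the DCBM, PABM, or the proposed NBM. The function name `sample_sbm`, the community labels, and the probability matrix are hypothetical.

```python
import random


def sample_sbm(labels, block_probs, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a Stochastic Block Model.

    labels[i] is the community of node i; block_probs[a][b] is the
    probability of an edge between a node in community a and one in b.
    """
    rng = random.Random(seed)
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, no self-loops
            if rng.random() < block_probs[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical two-community network: dense within, sparse between.
labels = [0, 0, 0, 1, 1, 1]
block_probs = [[0.9, 0.1],
               [0.1, 0.9]]
for row in sample_sbm(labels, block_probs):
    print(row)
```

In this basic form every node in a community behaves identically, which is the restriction the richer block models in the abstract relax by adding per-node parameters.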
-
Sparse Popularity Adjusted Stochastic Block Model
Authors:
Majid Noroozi,
Marianna Pensky,
Ramchandra Rimal
Abstract:
In the present paper we study a sparse stochastic network enabled with a block structure. The popular Stochastic Block Model (SBM) and the Degree Corrected Block Model (DCBM) address sparsity by placing an upper bound on the maximum probability of connections between any pair of nodes. As a result, sparsity describes only the behavior of network as a whole, without distinguishing between the block-dependent sparsity patterns. To the best of our knowledge, the recently introduced Popularity Adjusted Block Model (PABM) is the only block model that allows to introduce a structural sparsity where some probabilities of connections are identically equal to zero while the rest of them remain above a certain threshold. The latter presents a more nuanced view of the network.
Submitted 6 October, 2021; v1 submitted 3 October, 2019;
originally announced October 2019.
-
Estimation and Clustering in Popularity Adjusted Stochastic Block Model
Authors:
Majid Noroozi,
Ramchandra Rimal,
Marianna Pensky
Abstract:
The paper considers the Popularity Adjusted Block Model (PABM) introduced by Sengupta and Chen (2018). We argue that the main appeal of the PABM is the flexibility of the spectral properties of the graph, which makes the PABM an attractive choice for modeling networks that appear in biological sciences. We expand the theory of the PABM to the case of an arbitrary number of communities, which possibly grows with the number of nodes in the network and is not assumed to be known. We produce estimators of the probability matrix and the community structure and provide non-asymptotic upper bounds for the estimation and the clustering errors. We use the Sparse Subspace Clustering (SSC) approach to partition the network into communities, an approach that, to the best of our knowledge, has not been used for clustering network data. The theory is supplemented by a simulation study. In addition, we show advantages of the PABM for modeling a butterfly similarity network and a human brain functional network.
Submitted 19 June, 2020; v1 submitted 1 February, 2019;
originally announced February 2019.
-
Boosting Self-Supervised Learning via Knowledge Transfer
Authors:
Mehdi Noroozi,
Ananth Vinjimoor,
Paolo Favaro,
Hamed Pirsiavash
Abstract:
In self-supervised learning, one trains a model to solve a so-called pretext task on a dataset without the need for human annotation. The main objective, however, is to transfer this model to a target domain and task. Currently, the most effective transfer strategy is fine-tuning, which restricts one to use the same model or parts thereof for both pretext and target tasks. In this paper, we present a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. This allows us to: 1) quantitatively assess previously incompatible models including handcrafted features; 2) show that deeper neural network models can learn better representations from the same pretext task; 3) transfer knowledge learned with a deep model to a shallower one and thus boost its learning. We use this framework to design a novel self-supervised task, which achieves state-of-the-art performance on the common benchmarks in PASCAL VOC 2007, ILSVRC12 and Places by a significant margin. Our learned features shrink the mAP gap between models trained via self-supervised learning and supervised learning from 5.9% to 2.6% in object detection on PASCAL VOC 2007.
Submitted 1 May, 2018;
originally announced May 2018.
-
Reflection Separation and Deblurring of Plenoptic Images
Authors:
Paramanand Chandramouli,
Mehdi Noroozi,
Paolo Favaro
Abstract:
In this paper, we address the problem of reflection removal and deblurring from a single image captured by a plenoptic camera. We develop a two-stage approach to recover the scene depth and high resolution textures of the reflected and transmitted layers. For depth estimation in the presence of reflections, we train a classifier through convolutional neural networks. For recovering high resolution textures, we assume that the scene is composed of planar regions and perform the reconstruction of each layer by using an explicit form of the plenoptic camera point spread function. The proposed framework also recovers the sharp scene texture with different motion blurs applied to each layer. We demonstrate our method on challenging real and synthetic images.
Submitted 22 August, 2017;
originally announced August 2017.
-
Representation Learning by Learning to Count
Authors:
Mehdi Noroozi,
Hamed Pirsiavash,
Paolo Favaro
Abstract:
We introduce a novel method for representation learning that uses an artificial supervision signal based on counting visual primitives. This supervision signal is obtained from an equivariance relation, which does not require any manual annotation. We relate transformations of images to transformations of the representations. More specifically, we look for the representation that satisfies such a relation rather than the transformations that match a given representation. In this paper, we use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. These two transformations are combined in one constraint and used to train a neural network with a contrastive loss. The proposed task produces representations that perform on par with or exceed the state of the art in transfer learning benchmarks.
Submitted 22 August, 2017;
originally announced August 2017.
-
Motion Deblurring in the Wild
Authors:
Mehdi Noroozi,
Paramanand Chandramouli,
Paolo Favaro
Abstract:
The task of image deblurring is a very ill-posed problem, as both the image and the blur are unknown. Moreover, when pictures are taken in the wild, this task becomes even more challenging due to the blur varying spatially and the occlusions between the object. Due to the complexity of the general image model, we propose a novel convolutional network architecture which directly generates the sharp image. This network is built in three stages, and exploits the benefits of pyramid schemes often used in blind deconvolution. One of the main difficulties in training such a network is to design a suitable dataset. While useful data can be obtained by synthetically blurring a collection of images, more realistic data must be collected in the wild. To obtain such data we use a high frame rate video camera and keep one frame as the sharp image and the frame average as the corresponding blurred image. We show that this realistic dataset is key in achieving state-of-the-art performance and dealing with occlusions.
Submitted 29 August, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
Authors:
Mehdi Noroozi,
Paolo Favaro
Abstract:
In this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN includes fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts as well as their correct spatial arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state-of-the-art methods in several transfer learning benchmarks.
Submitted 22 August, 2017; v1 submitted 30 March, 2016;
originally announced March 2016.