Search | arXiv e-print repository

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Authors: Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, et al. (1 additional author not shown)

Abstract: Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images, while maintaining generality across visual concepts. Our key insight is that supervised fine-tuning with a set of surprisingly small but extremely visually appealing images can significantly improve the generation quality. We pre-train a latent diffusion model on $1.1$ billion image-text pairs and fine-tune it with only a few thousand carefully selected high-quality images. The resulting model, Emu, achieves a win rate of $82.9\%$ compared with its pre-trained only counterpart. Compared to the state-of-the-art SDXLv1.0, Emu is preferred $68.4\%$ and $71.3\%$ of the time on visual appeal on the standard PartiPrompts and our Open User Input benchmark based on the real-world usage of text-to-image models. In addition, we show that quality-tuning is a generic approach that is also effective for other architectures, including pixel diffusion and masked generative transformer models.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2306.11576

doi 10.1109/SISY50555.2020.9217082

Deep Learning Methods for Retinal Blood Vessel Segmentation: Evaluation on Images with Retinopathy of Prematurity

Authors: Gorana Gojić, Veljko Petrović, Radovan Turović, Dinu Dragan, Ana Oros, Dušan Gajić, Nebojša Horvat

Abstract: Automatic blood vessel segmentation from retinal images plays an important role in the diagnosis of many systemic and eye diseases, including retinopathy of prematurity. Current state-of-the-art research in blood vessel segmentation from retinal images is based on convolutional neural networks. The solutions proposed so far are trained and tested on images from a few available retinal blood vessel segmentation datasets, which might limit their performance when given an image with retinopathy of prematurity signs. In this paper, we evaluate the performance of three high-performing convolutional neural networks for retinal blood vessel segmentation in the context of blood vessel segmentation on retinopathy of prematurity retinal images. The main motive behind the study is to test if existing public datasets suffice to develop a high-performing predictor that could assist an ophthalmologist in retinopathy of prematurity diagnosis. To do so, we create a dataset consisting solely of retinopathy of prematurity images with retinal blood vessel annotations manually labeled by two observers, where one is the ophthalmologist experienced in retinopathy of prematurity treatment. Experimental results show that all three solutions have difficulties in detecting the retinal blood vessels of infants due to a lower contrast compared to images from public datasets as demonstrated by a significant drop in classification sensitivity. All three solutions segment alongside retinal also choroidal blood vessels which are not used to diagnose retinopathy of prematurity, but instead represent noise and are confused with retinal blood vessels. By visual and numerical observations, we observe that existing solutions for retinal blood vessel segmentation need improvement toward more detailed datasets or deeper models in order to assist the ophthalmologist in retinopathy of prematurity diagnosis.

Submitted 20 June, 2023; originally announced June 2023.

Journal ref: Proceedings of 18th International Symposium on Intelligent Systems and Informatics (SISY), IEEE, 2020, pp. 131-136

arXiv:2301.11623

Higher-Order Patterns Reveal Causal Timescales of Complex Systems

Authors: Luka V. Petrović, Anatol Wegner, Ingo Scholtes

Abstract: The analysis of temporal networks heavily depends on the analysis of time-respecting paths. However, before being able to model and analyze the time-respecting paths, we have to infer the timescales at which the temporal edges influence each other. In this work we introduce temporal path entropy, an information theoretic measure of temporal networks, with the aim to detect the timescales at which the causal influences occur in temporal networks. The measure can be used on temporal networks as a whole, or separately for each node. We find that the temporal path entropy has a non-trivial dependency on the causal timescales of synthetic and empirical temporal networks. Furthermore, we notice in both synthetic and empirical data that the temporal path entropy tends to decrease at timescales that correspond to the causal interactions. Our results imply that timescales relevant for the dynamics of complex systems can be detected in the temporal networks themselves, by measuring temporal path entropy. This is crucial for the analysis of temporal networks where inherent timescales are unavailable and hard to measure.

Submitted 27 January, 2023; originally announced January 2023.

Comments: 7 pages main manuscript, 20 pages in total, 27 figures

ACM Class: G.2.2; I.5.0

arXiv:2301.11120

Bayesian Detection of Mesoscale Structures in Pathway Data on Graphs

Authors: Luka V. Petrović, Vincenzo Perri

Abstract: Mesoscale structures are an integral part of the abstraction and analysis of complex systems. They reveal a node's function in the network, and facilitate our understanding of the network dynamics. For example, they can represent communities in social or citation networks, roles in corporate interactions, or core-periphery structures in transportation networks. We usually detect mesoscale structures under the assumption of independence of interactions. Still, in many cases, the interactions invalidate this assumption by occurring in a specific order. Such patterns emerge in pathway data; to capture them, we have to model the dependencies between interactions using higher-order network models. However, the detection of mesoscale structures in higher-order networks is still under-researched. In this work, we derive a Bayesian approach that simultaneously models the optimal partitioning of nodes in groups and the optimal higher-order network dynamics between the groups. In synthetic data we demonstrate that our method can recover both standard proximity-based communities and role-based groupings of nodes. In synthetic and real world data we show that it can compete with baseline techniques, while additionally providing interpretable abstractions of network dynamics.

Submitted 16 January, 2023; originally announced January 2023.

Comments: 21 pages, 3 figures

arXiv:2301.01795

PACO: Parts and Attributes of Common Objects

Authors: Vignesh Ramanathan, Anmol Kalia, Vladan Petrovic, Yi Wen, Baixue Zheng, Baishan Guo, Rui Wang, Aaron Marquez, Rama Kovvuri, Abhishek Kadian, Amir Mousavi, Yiwen Song, Abhimanyu Dubey, Dhruv Mahajan

Abstract: Object models are gradually progressing from predicting just category labels to providing detailed descriptions of object instances. This motivates the need for large datasets which go beyond traditional object masks and provide richer annotations such as part masks and attributes. Hence, we introduce PACO: Parts and Attributes of Common Objects. It spans 75 object categories, 456 object-part categories and 55 attributes across image (LVIS) and video (Ego4D) datasets. We provide 641K part masks annotated across 260K object boxes, with roughly half of them exhaustively annotated with attributes as well. We design evaluation metrics and provide benchmark results for three tasks on the dataset: part mask segmentation, object and part attribute prediction and zero-shot instance detection. Dataset, models, and code are open-sourced at https://github.com/facebookresearch/paco.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2204.05494

Few-shot Learning with Noisy Labels

Authors: Kevin J Liang, Samrudhdhi B. Rangrej, Vladan Petrovic, Tal Hassner

Abstract: Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple, yet effective, feature aggregation methods, improving the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled versus correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on-par with leading FSL methods on clean support sets, yet outperforms them, by far, in the presence of label noise.

Submitted 31 July, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR 2022

arXiv:2007.02861

Learning the Markov order of paths in a network

Authors: Luka V. Petrović, Ingo Scholtes

Abstract: We study the problem of learning the Markov order in categorical sequences that represent paths in a network, i.e. sequences of variable lengths where transitions between states are constrained to a known graph. Such data pose challenges for standard Markov order detection methods and demand modelling techniques that explicitly account for the graph constraint. Adopting a multi-order modelling framework for paths, we develop a Bayesian learning technique that (i) more reliably detects the correct Markov order compared to a competing method based on the likelihood ratio test, (ii) requires considerably less data compared to methods using AIC or BIC, and (iii) is robust against partial knowledge of the underlying constraints. We further show that a recently published method that uses a likelihood ratio test has a tendency to overfit the true Markov order of paths, which is not the case for our Bayesian technique. Our method is important for data scientists analyzing patterns in categorical sequence data that are subject to (partially) known constraints, e.g. sequences with forbidden words, mobility trajectories and click stream data, or sequence data in bioinformatics. Addressing the key challenge of model selection, our work is further relevant for the growing body of research that emphasizes the need for higher-order models in network analysis.

Submitted 6 July, 2020; originally announced July 2020.

Comments: 15 pages, 7 figures

MSC Class: 60J20 (Primary) 62F07; 68T05 (Secondary) ACM Class: G.3; I.5.1

arXiv:2002.10534

Evaluating Registration Without Ground Truth

Authors: Carole J. Twining, Vladimir S. Petrović, Timothy F. Cootes, Roy S. Schestowitz, William R. Crum, Christopher J. Taylor

Abstract: We present a generic method for assessing the quality of non-rigid registration (NRR) algorithms, that does not depend on the existence of any ground truth, but depends solely on the data itself. The data is a set of images. The output of any NRR of such a set of images is a dense correspondence across the whole set. Given such a dense correspondence, it is possible to build various generative statistical models of appearance variation across the set. We show that evaluating the quality of the registration can be mapped to the problem of evaluating the quality of the resultant statistical model. The quality of the model entails a comparison between the model and the image data that was used to construct it. It should be noted that this approach does not depend on the specifics of the registration algorithm used (i.e., whether a groupwise or pairwise algorithm was used to register the set of images), or on the specifics of the modelling approach used. We derive an index of image model specificity that can be used to assess image model quality, and hence the quality of registration. This approach is validated by comparing our assessment of registration quality with that derived from ground truth anatomical labeling. We demonstrate that our approach is capable of assessing NRR reliably without ground truth. Finally, to demonstrate the practicality of our method, different NRR algorithms -- both pairwise and groupwise -- are compared in terms of their performance on 3D MR brain data.

Submitted 24 February, 2020; originally announced February 2020.

Comments: 10 pages, 2 Figures, 3 Tables. Submitted to IEEE Transactions on Medical Imaging

arXiv:1905.11287

doi 10.1145/3442442.3452050

Counting Causal Paths in Big Times Series Data on Networks

Authors: Luka V. Petrovic, Ingo Scholtes

Abstract: Graph or network representations are an important foundation for data mining and machine learning tasks in relational data. Many tools of network analysis, like centrality measures, information ranking, or cluster detection rest on the assumption that links capture direct influence, and that paths represent possible indirect influence. This assumption is invalidated in time-stamped network data capturing, e.g., dynamic social networks, biological sequences or financial transactions. In such data, for two time-stamped links (A,B) and (B,C) the chronological ordering and timing determines whether a causal path from node A via B to C exists. A number of works has shown that for that reason network analysis cannot be directly applied to time-stamped network data. Existing methods to address this issue require statistics on causal paths, which is computationally challenging for big data sets. Addressing this problem, we develop an efficient algorithm to count causal paths in time-stamped network data. Applying it to empirical data, we show that our method is more efficient than a baseline method implemented in an OpenSource data analytics package. Our method works efficiently for different values of the maximum time difference between consecutive links of a causal path and supports streaming scenarios. With it, we are closing a gap that hinders an efficient analysis of big time series data on complex networks.

Submitted 27 May, 2019; originally announced May 2019.

Comments: 10 pages, 2 figures

Journal ref: WWW '21: Companion Proceedings of the Web Conference 2021 (pp. 521-526)

Showing 1–9 of 9 results for author: Petrović, V