No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models

Authors: Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges, Romann M. Weber

Abstract: Classifier-free guidance (CFG) has become the standard method for enhancing the quality of conditional diffusion models. However, employing CFG requires either training an unconditional model alongside the main diffusion model or modifying the training procedure by periodically inserting a null condition. There is also no clear extension of CFG to unconditional models. In this paper, we revisit the core principles of CFG and introduce a new method, independent condition guidance (ICG), which provides the benefits of CFG without the need for any special training procedures. Our approach streamlines the training process of conditional diffusion models and can also be applied during inference on any pre-trained conditional model. Additionally, by leveraging the time-step information encoded in all diffusion networks, we propose an extension of CFG, called time-step guidance (TSG), which can be applied to any diffusion model, including unconditional ones. Our guidance techniques are easy to implement and have the same sampling cost as CFG. Through extensive experiments, we demonstrate that ICG matches the performance of standard CFG across various conditional diffusion models. Moreover, we show that TSG improves generation quality in a manner similar to CFG, without relying on any conditional information.

Submitted 2 July, 2024; originally announced July 2024.
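
To make the guidance idea in the abstract above concrete, here is a minimal sketch of the usual CFG combination rule, plus one possible reading of ICG. Everything here is an assumption for illustration: `toy_model` is a hypothetical stand-in for a denoiser, the scale convention (1.0 returns the conditional prediction) is one common choice, and drawing the reference condition at random from a pool is only a guess at what "independent condition" means, since the abstract does not spell out the mechanism.

```python
import random


def guided_prediction(pred_cond, pred_ref, scale):
    """Combine two predictions as pred_ref + scale * (pred_cond - pred_ref)."""
    return [r + scale * (c - r) for c, r in zip(pred_cond, pred_ref)]


def toy_model(x, cond):
    # Hypothetical stand-in for a conditional denoiser; not a real network.
    return [xi * 0.5 + cond for xi in x]


def cfg_step(x, cond, null_cond, scale):
    # CFG: the reference branch is evaluated with a null condition,
    # which the model must have seen during training.
    return guided_prediction(toy_model(x, cond), toy_model(x, null_cond), scale)


def icg_step(x, cond, cond_pool, scale, rng):
    # Assumed ICG reading: the reference branch reuses the same conditional
    # model, but with a condition drawn independently of the input x.
    independent_cond = rng.choice(cond_pool)
    return guided_prediction(toy_model(x, cond), toy_model(x, independent_cond), scale)
```

Both variants call the model twice per step, which is consistent with the abstract's remark that the sampling cost matches CFG.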

arXiv:2405.20247

KerasCV and KerasNLP: Vision and Language Power-Ups

Authors: Matthew Watson, Divyashree Shivakumar Sreepathihalli, Francois Chollet, Martin Gorner, Kiranbir Sodhia, Ramesh Sampath, Tirth Patel, Haifeng Jin, Neel Kovelamudi, Gabriel Rasskin, Samaneh Saadat, Luke Wood, Chen Qian, Jonathan Bischof, Ian Stenbit, Abheesht Sharma, Anshuman Mishra

Abstract: We present the Keras domain packages KerasCV and KerasNLP, extensions of the Keras API for Computer Vision and Natural Language Processing workflows, capable of running on either JAX, TensorFlow, or PyTorch. These domain packages are designed to enable fast experimentation, with a focus on ease-of-use and performance. We adopt a modular, layered design: at the library's lowest level of abstraction, we provide building blocks for creating models and data preprocessing pipelines, and at the library's highest level of abstraction, we provide pretrained "task" models for popular architectures such as Stable Diffusion, YOLOv8, GPT2, BERT, Mistral, CLIP, Gemma, T5, etc. Task models have built-in preprocessing, pretrained weights, and can be fine-tuned on raw inputs. To enable efficient training, we support XLA compilation for all models, and run all preprocessing via a compiled graph of TensorFlow operations using the tf.data API. The libraries are fully open-source (Apache 2.0 license) and available on GitHub.

Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: Submitted to Journal of Machine Learning Open Source Software

ACM Class: I.2.5; I.2.7; I.2.10

arXiv:2405.14477

LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Abstract: Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencoders (VAEs) with no sacrifice in output quality. We also investigate the training methodologies and the decoder architecture of LiteVAE and propose several enhancements that improve the training dynamics and reconstruction quality. Our base LiteVAE model matches the quality of the established VAEs in current LDMs with a six-fold reduction in encoder parameters, leading to faster training and lower GPU memory requirements, while our larger model outperforms VAEs of comparable complexity across all evaluated metrics (rFID, LPIPS, PSNR, and SSIM).

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2310.17347

CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling

Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber

Abstract: While conditional diffusion models are known to have good coverage of the data distribution, they still face limitations in output diversity, particularly when sampled with a high classifier-free guidance scale for optimal image quality or when trained on small datasets. We attribute this problem to the role of the conditioning signal in inference and offer an improved sampling strategy for diffusion models that can increase generation diversity, especially at high guidance scales, with minimal loss of sample quality. Our sampling strategy anneals the conditioning signal by adding scheduled, monotonically decreasing Gaussian noise to the conditioning vector during inference to balance diversity and condition alignment. Our Condition-Annealed Diffusion Sampler (CADS) can be used with any pretrained model and sampling algorithm, and we show that it boosts the diversity of diffusion models in various conditional generation tasks. Further, using an existing pretrained diffusion model, CADS achieves a new state-of-the-art FID of 1.70 and 2.31 for class-conditional ImageNet generation at 256×256 and 512×512 respectively.

Submitted 13 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Published as a conference paper at ICLR 2024

Journal ref: The Twelfth International Conference on Learning Representations (ICLR 2024)
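
The annealing idea described in the CADS abstract above (Gaussian noise on the conditioning vector, with a scheduled, monotonically decreasing magnitude) can be sketched as follows. This is only an illustration: the linear schedule, the `max_sigma` parameter, and the function names are hypothetical choices, not the schedule or parameterization used in the paper.

```python
import random


def noise_level(step, total_steps):
    """Hypothetical linear schedule: 1.0 at the first sampling step, 0.0 at the last.

    Requires total_steps > 1.
    """
    return 1.0 - step / (total_steps - 1)


def anneal_condition(cond, step, total_steps, rng, max_sigma=1.0):
    """Add Gaussian noise to the conditioning vector, scaled by the schedule."""
    sigma = max_sigma * noise_level(step, total_steps)
    return [c + sigma * rng.gauss(0.0, 1.0) for c in cond]


if __name__ == "__main__":
    rng = random.Random(0)
    cond = [0.2, -1.0, 0.7]
    for step in (0, 5, 9):
        print(step, anneal_condition(cond, step, 10, rng))
```

Early steps see a heavily perturbed condition, which is where the diversity would come from; by the final step the noise scale is zero and the original condition is passed through unchanged.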

arXiv:2108.13180

Flexible Quasi-Yagi-Uda antenna for 5G communication

Authors: Behzad Ashrafi Nia, Franco De Flaviis, Soheil Saadat

Abstract: This paper presents the design and experimental of a single and array Quasi Yagi-Uda antenna at 28 GHz. The proposed antenna is implemented on MFLEX flexible material with a thickness of 0.120mm for 5G applications. A wideband antenna operates within 24 to 29.5 GHz and exhibits almost the same end-fire radiation pattern over bandwidth with an average gain of 6.2dBi and 10.15dBi for single and array antennas. The flexible antenna was tested under bending conditions and results showed excellent performance at the 28GHz region.

Submitted 12 August, 2021; originally announced August 2021.

Journal ref: 2021 IEEE International Symposium on Antennas and Propagation and USNC-URSI Radio Science Meeting (AP-S/URSI 2021)

arXiv:2103.09319

Do Bots Modify the Workflow of GitHub Teams?

Authors: Samaneh Saadat, Natalia Colmenares, Gita Sukthankar

Abstract: The ever-increasing complexity of modern software engineering projects makes the usage of automated assistants imperative. Bots can be used to complete repetitive tasks during development and testing, as well as promoting communication between team members through issue reporting and documentation. Although the ultimate aim of these automated assistants is to speed taskwork completion, their inclusion into GitHub repositories may affect teamwork as well. This paper studies the question of how bots modify the team workflow. We examined the event sequences of repositories with bots and without bots using a contrast motif discovery method to detect subsequences that are more prevalent in one set of event sequences vs. the other. Our study reveals that teams with bots are more likely to intersperse comments throughout their coding activities, while not actually being more prolific commenters.

Submitted 16 March, 2021; originally announced March 2021.

arXiv:2011.03423

Analyzing the Productivity of GitHub Teams based on Formation Phase Activity

Authors: Samaneh Saadat, Olivia B. Newton, Gita Sukthankar, Stephen M. Fiore

Abstract: Our goal is to understand the characteristics of high-performing teams on GitHub. Towards this end, we collect data from software repositories and evaluate teams by examining differences in productivity. Our study focuses on the team formation phase, the first six months after repository creation. To better understand team activity, we clustered repositories based on the proportion of their work activities and discovered three work styles in teams: toilers, communicators, and collaborators. Based on our results, we contend that early activities in software development repositories on GitHub establish coordination processes that enable effective collaborations over time.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2011.03371

Explaining Differences in Classes of Discrete Sequences

Authors: Samaneh Saadat, Gita Sukthankar

Abstract: While there are many machine learning methods to classify and cluster sequences, they fail to explain what are the differences in groups of sequences that make them distinguishable. Although in some cases having a black box model is sufficient, there is a need for increased explainability in research areas focused on human behaviors. For example, psychologists are less interested in having a model that predicts human behavior with high accuracy and more concerned with identifying differences between actions that lead to divergent human behavior. This paper presents techniques for understanding differences between classes of discrete sequences. Approaches introduced in this paper can be utilized to interpret black box machine learning models on sequences. The first approach compares k-gram representations of sequences using the silhouette score. The second method characterizes differences by analyzing the distance matrix of subsequences. As a case study, we trained black box supervised learning methods to classify sequences of GitHub teams and then utilized our sequence analysis techniques to measure and characterize differences between event sequences of teams with bots and teams without bots. In our second case study, we classified Minecraft event sequences to infer their high-level actions and analyzed differences between low-level event sequences of actions.

Submitted 6 November, 2020; originally announced November 2020.
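
As a small illustration of the k-gram representation mentioned in the abstract above, the sketch below counts contiguous length-k subsequences of an event sequence. The function name and the example event labels are hypothetical, and the silhouette-score comparison the abstract refers to is deliberately left out; this only shows the representation step.

```python
from collections import Counter


def kgram_counts(sequence, k):
    """Count every contiguous length-k window (k-gram) in a discrete sequence."""
    return Counter(tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    # Hypothetical event sequence for one repository.
    events = ["push", "comment", "push", "comment"]
    print(kgram_counts(events, 2))
```

Two classes of sequences could then be compared through their k-gram count vectors, for example as input to a clustering-quality measure.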

arXiv:2007.09210

doi 10.1109/NAPS50074.2021.9449706

Initializing Successive Linear Programming Solver for ACOPF using Machine Learning

Authors: Sayed Abdullah Sadat, Mostafa Sahraei-Ardakani

Abstract: A Successive linear programming (SLP) approach is one of the favorable approaches for solving large scale nonlinear optimization problems. Solving an alternating current optimal power flow (ACOPF) problem is no exception, particularly considering the large real-world transmission networks across the country. It is, however, essential to improve the computational performance of the SLP algorithm. One way to achieve this goal is through the efficient initialization of the algorithm with a near-optimal solution. This paper examines various machine learning (ML) algorithms available in the Scikit-Learn library to initialize an SLP-ACOPF solver, including examining linear and nonlinear ML algorithms. We evaluate the quality of each of these machine learning algorithms for predicting variables needed for a power flow solution. The solution is then used as an initialization for an SLP-ACOPF algorithm. The approach is tested on a congested and non-congested 3 bus systems. The results obtained from the best-performed ML algorithm in this work are compared with the results of a DCOPF solution for the initialization of an SLP-ACOPF solver.

Submitted 17 July, 2020; originally announced July 2020.

Comments: 6 pages, 16 figures, submitted to the 2020 IEEE North American Power Symposium (NAPS)

arXiv:2003.11611

doi 10.1007/978-3-030-77517-9_11

Deep Agent: Studying the Dynamics of Information Spread and Evolution in Social Networks

Authors: Ivan Garibay, Toktam A. Oghaz, Niloofar Yousefi, Ece C. Mutlu, Madeline Schiappa, Steven Scheinert, Georgios C. Anagnostopoulos, Christina Bouwens, Stephen M. Fiore, Alexander Mantzaris, John T. Murphy, William Rand, Anastasia Salter, Mel Stanfill, Gita Sukthankar, Nisha Baral, Gabriel Fair, Chathika Gunaratne, Neda B. Hajiakhoond, Jasser Jasser, Chathura Jayalath, Olivia Newton, Samaneh Saadat, Chathurani Senevirathna, Rachel Winter, et al. (1 additional author not shown)

Abstract: This paper explains the design of a social network analysis framework, developed under DARPA's SocialSim program, with novel architecture that models human emotional, cognitive and social factors. Our framework is both theory and data-driven, and utilizes domain expertise. Our simulation effort helps in understanding how information flows and evolves in social media platforms. We focused on modeling three information domains: cryptocurrencies, cyber threats, and software vulnerabilities for the three interrelated social environments: GitHub, Reddit, and Twitter. We participated in the SocialSim DARPA Challenge in December 2018, in which our models were subjected to extensive performance evaluation for accuracy, generalizability, explainability, and experimental power. This paper reports the main concepts and models, utilized in our social media modeling effort in developing a multi-resolution simulation at the user, community, population, and content levels.

Submitted 29 May, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: 16 pages

arXiv:1802.01059

Deep Temporal Clustering : Fully Unsupervised Learning of Time-Domain Features

Authors: Naveen Sai Madiraju, Seid M. Sadat, Dimitry Fisher, Homa Karimabadi

Abstract: Unsupervised learning of time series data, also known as temporal clustering, is a challenging problem in machine learning. Here we propose a novel algorithm, Deep Temporal Clustering (DTC), to naturally integrate dimensionality reduction and temporal clustering into a single end-to-end learning framework, fully unsupervised. The algorithm utilizes an autoencoder for temporal dimensionality reduction and a novel temporal clustering layer for cluster assignment. Then it jointly optimizes the clustering objective and the dimensionality reduction objective. Based on requirement and application, the temporal clustering layer can be customized with any temporal similarity metric. Several similarity metrics and state-of-the-art algorithms are considered and compared. To gain insight into temporal features that the network has learned for its clustering, we apply a visualization method that generates a region of interest heatmap for the time series. The viability of the algorithm is demonstrated using time series data from diverse domains, ranging from earthquakes to spacecraft sensor data. In each case, we show that the proposed algorithm outperforms traditional methods. The superior performance is attributed to the fully integrated temporal dimensionality reduction and clustering criterion.

Submitted 3 February, 2018; originally announced February 2018.

Comments: 11 pages, 4 Figures, 1 Table

arXiv:1609.01710

Automation of Pedestrian Tracking in a Crowded Situation

Authors: Saman Saadat, Kardi Teknomo

Abstract: Studies on microscopic pedestrian requires large amounts of trajectory data from real-world pedestrian crowds. Such data collection, if done manually, needs tremendous effort and is very time consuming. Though many studies have asserted the possibility of automating this task using video cameras, we found that only a few have demonstrated good performance in very crowded situations or from a top-angled view scene. This paper deals with tracking pedestrian crowd under heavy occlusions from an angular scene. Our automated tracking system consists of two modules that perform sequentially. The first module detects moving objects as blobs. The second module is a tracking system. We employ probability distribution from the detection of each pedestrian and use Bayesian update to track the next position. The result of such tracking is a database of pedestrian trajectories over time and space. With certain prior information, we showed that the system can track a large number of people under occlusion and clutter scene.

Submitted 6 September, 2016; originally announced September 2016.

Comments: 10 Pages, Saadat, S., and Teknomo, K., Automation of Pedestrian Tracking in a Crowded Situation, the Fifth International Conference on Pedestrian and Evacuation Dynamics, March 8-10, 2010, National Institute of Standards and Technology, Gaithersburg, MD USA
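
The Bayesian update mentioned in the abstract above can be illustrated with a generic discrete example: a prior over candidate positions is multiplied by a detection likelihood and renormalized. This is a textbook sketch under assumed inputs, not the tracker described in the paper; the function name and the candidate-position framing are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over candidate positions: prior * likelihood, normalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("likelihood rules out every candidate position")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Two hypothetical candidate positions for one pedestrian in the next frame.
    prior = [0.5, 0.5]          # belief before seeing the new frame
    likelihood = [0.9, 0.1]     # how well each candidate matches the detected blob
    print(bayes_update(prior, likelihood))
```

Repeating this step frame by frame, with the posterior feeding the next prior, is one simple way such a tracker could maintain a position estimate through occlusions.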

arXiv:0805.4211

Managing Critical Spreadsheets in a Compliant Environment

Authors: Soheil Saadat

Abstract: The use of uncontrolled financial spreadsheets can expose organizations to unacceptable business and compliance risks, including errors in the financial reporting process, spreadsheet misuse and fraud, or even significant operational errors. These risks have been well documented and thoroughly researched. With the advent of regulatory mandates such as SOX 404 and FDICIA in the U.S., and MiFID, Basel II and Combined Code in the UK and Europe, leading tax and audit firms are now recommending that organizations automate their internal controls over critical spreadsheets and other end-user computing applications, including Microsoft Access databases. At a minimum, auditors mandate version control, change control and access control for operational spreadsheets, with more advanced controls for critical financial spreadsheets. This paper summarises the key issues regarding the establishment and maintenance of control of Business Critical spreadsheets.

Submitted 27 May, 2008; originally announced May 2008.

Comments: 4 Pages

ACM Class: J.1; H.4.1; K.6.4; D.2.5; D.2.9; K.8.1

Journal ref: Proc. European Spreadsheet Risks Int. Grp. (EuSpRIG) 2007 21-24 ISBN 978-905617-58-6

Showing 1–13 of 13 results for author: Sadat, S