-
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Authors:
Patrick Esser,
Sumith Kulal,
Andreas Blattmann,
Rahim Entezari,
Jonas Müller,
Harry Saini,
Yam Levi,
Dominik Lorenz,
Axel Sauer,
Frederic Boesel,
Dustin Podell,
Tim Dockhorn,
Zion English,
Kyle Lacey,
Alex Goodwin,
Yannik Marek,
Robin Rombach
Abstract:
Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models, and we will make our experimental data, code, and model weights publicly available.
Submitted 5 March, 2024;
originally announced March 2024.
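The "straight line" between data and noise mentioned in the abstract can be illustrated with a minimal sketch. The convention that t = 0 is data and t = 1 is noise, the uniform timestep draw, and all function names are assumptions made for illustration; the abstract says the authors bias timestep sampling towards perceptually relevant scales but does not specify how, so that part is not reproduced here.

```python
import random


def interpolate(x0, noise, t):
    """Point on the straight line from data x0 (t=0) to noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Constant velocity along the straight path: d z_t / dt = noise - x0."""
    return [b - a for a, b in zip(x0, noise)]


def training_pair(x0, rng):
    """Build one (t, z_t, target) training triple for a velocity-predicting model."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    # Assumption: uniform t. The paper's biased sampler is not reproduced here.
    t = rng.random()
    return t, interpolate(x0, noise, t), velocity_target(x0, noise)


if __name__ == "__main__":
    t, z_t, target = training_pair([1.0, 2.0, 3.0], random.Random(0))
    print(t, z_t, target)
```

Because the path is a straight line, the regression target does not depend on t, which is the conceptual simplicity the abstract refers to.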
-
Stable LM 2 1.6B Technical Report
Authors:
Marco Bellagente,
Jonathan Tow,
Dakota Mahan,
Duy Phung,
Maksym Zhuravinskyi,
Reshinth Adithyan,
James Baicoianu,
Ben Brooks,
Nathan Cooper,
Ashish Datta,
Meng Lee,
Emad Mostaque,
Michael Pieler,
Nikhil Pinnaparju,
Paulo Rocha,
Harry Saini,
Hannah Teufel,
Niccolo Zanichelli,
Carlos Riquelme
Abstract:
We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealing small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model.
Submitted 27 February, 2024;
originally announced February 2024.
-
Towards a Multimodal System for Precision Agriculture using IoT and Machine Learning
Authors:
Satvik Garg,
Pradyumn Pundir,
Himanshu Jindal,
Hemraj Saini,
Somya Garg
Abstract:
Precision agriculture is an emerging idea that refers to managing farms using current information and communication technologies to improve the quantity and quality of yields while optimizing the human labor required. The automation requires collecting information from sensors (soil, water, light, humidity, temperature) to give the operator exact data so that farmers can obtain an excellent yield. In this work, a study is proposed that incorporates all common state-of-the-art approaches for precision agriculture use. Technologies like the Internet of Things (IoT) for data collection, machine learning for crop damage prediction, and deep learning for crop disease detection are used. The data collection using IoT is responsible for measuring moisture levels for smart irrigation and N, P, K estimations of fertilizers for best yield development. For crop damage prediction, various algorithms like Random Forest (RF), Light Gradient Boosting Machine (LGBM), XGBoost (XGB), Decision Tree (DT), and K Nearest Neighbor (KNN) are used. Subsequently, pre-trained Convolutional Neural Network (CNN) models such as VGG16, ResNet50, and DenseNet121 are also trained to check whether the crop is diseased.
Submitted 10 July, 2021;
originally announced July 2021.
-
Efficient Single-Shot Multibox Detector for Construction Site Monitoring
Authors:
Viral Thakar,
Himani Saini,
Walid Ahmed,
Mohammad M Soltani,
Ahmed Aly,
Jia Yuan Yu
Abstract:
Asset monitoring on construction sites is an intricate, manually intensive task that can benefit greatly from automated solutions engineered using deep neural networks. We use the Single-Shot Multibox Detector (SSD), for its fine balance between speed and accuracy, to leverage ubiquitously available images and videos from the surveillance cameras on construction sites and automate the monitoring tasks, hence enabling project managers to better track the performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of using a greedy approach like non-maximum suppression. We do so using Affinity Propagation Clustering (APC) to cluster the predicted boxes based on a similarity index computed from the spatial features as well as the location of the predicted boxes. In our attempts, we have been able to improve the mean average precision of SSD by 3.77% on a custom dataset consisting of images from construction sites and by 1.67% on the PASCAL VOC Challenge.
Submitted 19 August, 2018; v1 submitted 16 August, 2018;
originally announced August 2018.
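For context, the greedy baseline that the abstract contrasts with clustering, non-maximum suppression, can be sketched as follows. This is a generic illustration, not the authors' code: the `(x1, y1, x2, y2)` box layout, the 0.5 threshold, and the function names are assumptions, and the Affinity Propagation Clustering step the paper proposes is not reproduced.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, visiting boxes in order of descending score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(greedy_nms(boxes, [0.9, 0.8, 0.7]))
```

The greedy rule discards every overlapping box in favor of the single highest-scoring one; a clustering approach instead groups overlapping predictions before choosing a representative.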
-
Compressing the Data Densely by New Geflochtener to Accelerate Web
Authors:
Hemant Kumar Saini,
Satpal Singh Kushwaha,
C. Rama Krishna
Abstract:
In the present state of the internet, many optimization techniques exist to improve Web speed, but almost all are expensive in terms of bandwidth. After a long investigation of different techniques for lossless data compression, a new algorithm is proposed based on the LZ77 family, which selectively models the references with backward movement and encodes the longest matches through greedy parsing with the shortest path technique to compress the data with high density. This idea seems useful because a single Web page contains many repetitive words that waste space; the algorithm removes such unnecessary redundancies with 70% efficiency and compresses pages with a 23.75-35% compression ratio.
Submitted 16 May, 2014;
originally announced May 2014.
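As background for the LZ77 family the abstract builds on, a textbook greedy longest-match encoder can be sketched as below. This is plain LZ77 emitting `(offset, length, next_char)` triples, not the proposed algorithm: the selective reference modeling and shortest-path parsing from the abstract are not reproduced, and the 255-symbol window and function names are assumptions.

```python
def lz77_encode(data, window=255):
    """Greedy LZ77: emit (offset, length, next_char) triples for a string."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window behind position i for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one symbol early so a literal always follows the match.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(triples):
    """Rebuild the string by copying back-references, then the literal."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # one symbol at a time handles overlapping copies
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra abracadabra abracadabra"
    triples = lz77_encode(text)
    print(len(text), len(triples), lz77_decode(triples) == text)
```

Repetitive input such as a Web page with recurring words collapses into few triples, which is the redundancy the abstract targets.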