Search | arXiv e-print repository

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Authors: Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen

Abstract: Despite many attempts to leverage pre-trained text-to-image models (T2I) like Stable Diffusion (SD) for controllable image editing, producing good predictable results remains a challenge. Previous approaches have focused on either fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text prompts, and/or learning features for each input image in an attempt to coax the image generator to produce the desired result. However, these approaches all have shortcomings and fail to produce good results in a predictable and controllable manner. To address this problem, we present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing, something previously unexplored in the literature. With this simple change, we are able to generate results that both better align with the original images and reflect the desired result. Furthermore, we propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization when compared to prior approaches, which operate in the pixel domain. Our method can be easily applied to variations of SD including Textual Inversion and DreamBooth that encode new concepts and incorporate them into the edited results. We present a host of image-editing capabilities enabled by our approach. Our code is publicly available at https://github.com/SherryXTChen/TiNO-Edit.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2301.03734 [pdf, other]

Exoshuffle-CloudSort

Authors: Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

Abstract: We present Exoshuffle-CloudSort, a sorting application running on top of Ray using the Exoshuffle architecture. Exoshuffle-CloudSort runs on Amazon EC2, with input and output data stored on Amazon S3. Using 40 i4i.4xlarge workers, Exoshuffle-CloudSort completes the 100 TB CloudSort Benchmark (Indy category) in 5378 seconds, with an average total cost of $97.

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2210.16476 [pdf, other]

Pair DETR: Contrastive Learning Speeds Up DETR Training

Authors: Seyed Mehdi Iranmanesh, Xiaotong Chen, Kuo-Chin Lien

Abstract: The DETR object detection approach applies the transformer encoder and decoder architecture to detect objects and achieves promising performance. In this paper, we present a simple approach to address the main problem of DETR, the slow convergence, by using a representation learning technique. In this approach, we detect an object bounding box as a pair of keypoints, the top-left corner and the center, using two decoders. By detecting objects as paired keypoints, the model builds up a joint classification and pair association on the output queries from two decoders. For the pair association we propose utilizing a contrastive self-supervised learning algorithm without requiring a specialized architecture. Experimental results on the MS COCO dataset show that Pair DETR can converge at least 10x faster than the original DETR and 1.5x faster than Conditional DETR during training, while having consistently higher Average Precision scores.

Submitted 11 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: arXiv admin note: text overlap with arXiv:2108.06152

arXiv:2206.06935 [pdf, other]

OSN Dashboard Tool For Sentiment Analysis

Authors: Andreas Kilde Lien, Lars Martin Randem, Hans Petter Fauchald Taralrud, Maryam Edalati

Abstract: The amount of opinionated data on the internet is rapidly increasing. More and more people are sharing their ideas and opinions in reviews, discussion forums, microblogs and general social media. As opinions are central in all human activities, sentiment analysis has been applied to gain insights into this type of data. Several approaches have been proposed for sentiment classification. The major drawback is the lack of standardized solutions for classification and high-level visualization. In this study, a sentiment analyzer dashboard for online social networking analysis is proposed. It enables people to gain insights into topics that interest them. The tool allows users to run the desired sentiment analysis algorithm in the dashboard. In addition to providing several visualization types, the dashboard provides raw data results from the sentiment classification, which can be downloaded for further analysis.

Submitted 14 June, 2022; originally announced June 2022.

Comments: Keywords: Sentiment Analysis, Machine Learning, Twitter, Opinion Mining, Polarity Assessment

arXiv:2203.05072 [pdf, other]

Exoshuffle: An Extensible Shuffle Architecture

Authors: Frank Sifei Luan, Stephanie Wang, Samyukta Yagati, Sean Kim, Kenneth Lien, Isaac Ong, Tony Hong, SangBin Cho, Eric Liang, Ion Stoica

Abstract: Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are costly to develop, and they are tightly integrated with batch processing frameworks that offer only high-level APIs such as SQL. New applications, such as ML training, require more flexibility and finer-grained interoperability with shuffle. They are often unable to leverage existing shuffle optimizations. We propose an extensible shuffle architecture. We present Exoshuffle, a library for distributed shuffle that offers competitive performance and scalability as well as greater flexibility than monolithic shuffle systems. We design an architecture that decouples the shuffle control plane from the data plane without sacrificing performance. We build Exoshuffle on Ray, a distributed futures system for data and ML applications, and demonstrate that we can: (1) rewrite previous shuffle optimizations as application-level libraries with an order of magnitude less code, (2) achieve shuffle performance and scalability competitive with monolithic shuffle systems, and break the CloudSort record as the world's most cost-efficient sorting system, and (3) enable new applications such as ML training to easily leverage scalable shuffle.

Submitted 17 August, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

arXiv:2201.00080 [pdf, other]

PatchTrack: Multiple Object Tracking Using Frame Patches

Authors: Xiaotong Chen, Seyed Mehdi Iranmanesh, Kuo-Chin Lien

Abstract: Object motion and object appearance are commonly used information in multiple object tracking (MOT) applications, either for associating detections across frames in tracking-by-detection methods or direct track predictions for joint-detection-and-tracking methods. However, not only are these two types of information often considered separately, but also they do not help optimize the usage of visual information from the current frame of interest directly. In this paper, we present PatchTrack, a Transformer-based joint-detection-and-tracking system that predicts tracks using patches of the current frame of interest. We use the Kalman filter to predict the locations of existing tracks in the current frame from the previous frame. Patches cropped from the predicted bounding boxes are sent to the Transformer decoder to infer new tracks. By utilizing both object motion and object appearance information encoded in patches, the proposed method pays more attention to where new tracks are more likely to occur. We show the effectiveness of PatchTrack on recent MOT benchmarks, including MOT16 (MOTA 73.71%, IDF1 65.77%) and MOT17 (MOTA 73.59%, IDF1 65.23%). The results are published on https://motchallenge.net/method/MOT=4725&chl=10.

Submitted 31 December, 2021; originally announced January 2022.

Comments: 11 pages, 4 figures, 2 tables

ACM Class: I.4.8

arXiv:1909.13163 [pdf, other]

Self-Supervised Learning of Depth and Ego-motion with Differentiable Bundle Adjustment

Authors: Yunxiao Shi, Jing Zhu, Yi Fang, Kuochin Lien, Junli Gu

Abstract: Learning to predict scene depth and camera motion from RGB inputs only is a challenging task. Most existing learning-based methods deal with this task in a supervised manner, which requires ground-truth data that is expensive to acquire. More recent approaches explore the possibility of estimating scene depth and camera pose in a self-supervised learning framework. Although encouraging results have been shown, current methods either learn from monocular videos for depth and pose and typically do so without enforcing multi-view geometry constraints between scene structure and camera motion, or require stereo sequences as input where the ground-truth between-frame motion parameters need to be known. In this paper we propose to jointly optimize the scene depth and camera motion by incorporating a differentiable Bundle Adjustment (BA) layer that minimizes the feature-metric error, and then form the photometric consistency loss with view synthesis as the final supervisory signal. The proposed approach only needs unlabeled monocular videos as input, and extensive experiments on the KITTI and Cityscapes datasets show that our method achieves state-of-the-art results among self-supervised approaches using monocular videos as input, and even gains an advantage over the line of methods that learn from calibrated stereo sequences (i.e. with pose supervision).

Submitted 28 September, 2019; originally announced September 2019.

arXiv:1909.04594 [pdf, other]

Structure-Attentioned Memory Network for Monocular Depth Estimation

Authors: Jing Zhu, Yunxiao Shi, Mengwei Ren, Yi Fang, Kuo-Chin Lien, Junli Gu

Abstract: Monocular depth estimation is a challenging task that aims to predict a corresponding depth map from a given single RGB image. Recent deep learning models have been proposed to predict the depth from the image by learning the alignment of deep features between the RGB image and the depth domains. In this paper, we present a novel approach, named Structure-Attentioned Memory Network, to more effectively transfer domain features for monocular depth estimation by taking into account the common structure regularities (e.g., repetitive structure patterns, planar surfaces, symmetries) in domain adaptation. To this end, we introduce a new Structure-Oriented Memory (SOM) module to learn and memorize the structure-specific information between the RGB image domain and the depth domain. More specifically, in the SOM module, we develop a Memorable Bank of Filters (MBF) unit to learn a set of filters that memorize the structure-aware image-depth residual pattern, and also an Attention Guided Controller (AGC) unit to control the filter selection in the MBF given image feature queries. Given the query image feature, the trained SOM module is able to adaptively select the best customized filters for cross-domain feature transferring with an optimal structural disparity between image and depth. In summary, we focus on addressing this structure-specific domain adaptation challenge by proposing a novel end-to-end multi-scale memorable network for monocular depth estimation. The experiments show that our proposed model demonstrates superior performance compared to the existing supervised monocular depth estimation approaches on the challenging KITTI and NYU Depth V2 benchmarks.

Submitted 10 September, 2019; originally announced September 2019.

Comments: 8 pages, 6 figures

arXiv:1909.04182 [pdf, other]

Learning Object-specific Distance from a Monocular Image

Authors: Jing Zhu, Yi Fang, Husam Abu-Haimed, Kuo-Chin Lien, Dongdong Fu, Junli Gu

Abstract: Environment perception, including object detection and distance estimation, is one of the most crucial tasks for autonomous driving. Much attention has been paid to the object detection task, but distance estimation has attracted little interest in the computer vision community. Observing that the traditional inverse perspective mapping algorithm performs poorly for objects far away from the camera or on a curved road, in this paper, we address the challenging distance estimation problem by developing the first end-to-end learning-based model to directly predict distances for given objects in the images. Besides the introduction of a learning-based base model, we further design an enhanced model with a keypoint regressor, where a projection loss is defined to enforce a better distance estimation, especially for objects close to the camera. To facilitate the research on this task, we construct the extended KITTI and nuScenes (mini) object detection datasets with a distance for each object. Our experiments demonstrate that our proposed methods outperform alternative approaches (e.g., the traditional IPM, SVR) on object-specific distance estimation, particularly for the challenging cases in which objects are on a curved road. Moreover, the performance margin implies the effectiveness of our enhanced method.

Submitted 9 September, 2019; originally announced September 2019.

Comments: 10 pages, 6 figures, accepted by International Conference on Computer Vision (ICCV) 2019

arXiv:physics/0008149 [pdf]

A Real-Time Energy Monitor System for the IPNS Linac

Authors: J. C. Dooling, F. R. Brumwell, M. K. Lien, G. E. McMichael

Abstract: Injected beam energy and energy spread are critical parameters affecting the performance of our rapid cycling synchrotron (RCS). A real-time energy monitoring system is being installed to examine the H- beam out of the Intense Pulsed Neutron Source (IPNS) 50 MeV linac. The 200 MHz Alvarez linac serves as the injector for the 450 MeV IPNS RCS. The linac provides an 80 ms macropulse of approximately 3x10^12 H- ions 30 times per second for coasting-beam injection into the RCS. The RCS delivers protons to a heavy-metal spallation neutron target for material science studies. Using a number of strip-line beam position monitors (BPMs) distributed along the 50 MeV transport line from the linac to the RCS, fast signals from the strip lines are digitized and transferred to a computer which performs an FFT. Corrections for cable attenuation and oscilloscope bandwidth are made in the frequency domain. Rectangular pulse train phasing (RPTP) is imposed on the spectra prior to obtaining the inverse transform (IFFT). After the IFFT, the reconstructed time-domain signal is analyzed for pulse width as it progresses along the transport line. Time-of-flight measurements of the BPM signals provide beam energy. Finally, using the 3-size measurement technique, the longitudinal emittance and energy spread of the beam are determined.

Submitted 18 August, 2000; originally announced August 2000.

Comments: 3 pages, 5 figures, 8 equations

Journal ref: eConf C000821 (2000) MOc18

Showing 1–10 of 10 results for author: Lien, K