-
Promptformer: Prompted Conformer Transducer for ASR
Authors:
Sergio Duarte-Torres,
Arunasish Sen,
Aman Rana,
Lukas Drude,
Alejandro Gomez-Alanis,
Andreas Schwarz,
Leif Rädel,
Volker Leutnant
Abstract:
Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rW…
▽ More
Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rWERR) over a strong baseline. We show that our method does not degrade in the absence of context and leads to improvements even if the model is trained without context. We further show that leveraging a pre-trained sentence-piece model for context embedding generation can outperform an external BERT model.
△ Less
Submitted 14 January, 2024;
originally announced January 2024.
-
Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior
Authors:
Karl Van Wyk,
Mandy Xie,
Anqi Li,
Muhammad Asif Rana,
Buck Babich,
Bryan Peele,
Qian Wan,
Iretiayo Akinola,
Balakumar Sundaralingam,
Dieter Fox,
Byron Boots,
Nathan D. Ratliff
Abstract:
Classical mechanical systems are central to controller design in energy sha** methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guara…
▽ More
Classical mechanical systems are central to controller design in energy sha** methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guarantees. In this work, we generalize classical mechanics to what we call geometric fabrics, whose expressivity and theory enable the design of systems that outperform RMPs in practice. Geometric fabrics strictly generalize classical mechanics forming a new physics of behavior by first generalizing them to Finsler geometries and then explicitly bending them to shape their behavior while maintaining stability. We develop the theory of fabrics and present both a collection of controlled experiments examining their theoretical properties and a set of robot system experiments showing improved performance over a well-engineered and hardened implementation of RMPs, our current state-of-the-art in controller design.
△ Less
Submitted 18 January, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
DFENet: A Novel Dimension Fusion Edge Guided Network for Brain MRI Segmentation
Authors:
Hritam Basak,
Rukhshanda Hussain,
Ajay Rana
Abstract:
The rapid increment of morbidity of brain stroke in the last few years have been a driving force towards fast and accurate segmentation of stroke lesions from brain MRI images. With the recent development of deep-learning, computer-aided and segmentation methods of ischemic stroke lesions have been useful for clinicians in early diagnosis and treatment planning. However, most of these methods suff…
▽ More
The rapid increment of morbidity of brain stroke in the last few years have been a driving force towards fast and accurate segmentation of stroke lesions from brain MRI images. With the recent development of deep-learning, computer-aided and segmentation methods of ischemic stroke lesions have been useful for clinicians in early diagnosis and treatment planning. However, most of these methods suffer from inaccurate and unreliable segmentation results because of their inability to capture sufficient contextual features from the MRI volumes. To meet these requirements, 3D convolutional neural networks have been proposed, which, however, suffer from huge computational requirements. To mitigate these problems, we propose a novel Dimension Fusion Edge-guided network (DFENet) that can meet both of these requirements by fusing the features of 2D and 3D CNNs. Unlike other methods, our proposed network uses a parallel partial decoder (PPD) module for aggregating and upsampling selected features, rich in important contextual information. Additionally, we use an edge-guidance and enhanced mixing loss for constantly supervising and improvising the learning process of the network. The proposed method is evaluated on publicly available Anatomical Tracings of Lesions After Stroke (ATLAS) dataset, resulting in mean DSC, IoU, Precision and Recall values of 0.5457, 0.4015, 0.6371, and 0.4969 respectively. The results, when compared to other state-of-the-art methods, outperforms them by a significant margin. Therefore, the proposed model is robust, accurate, superior to the existing methods, and can be relied upon for biomedical applications.
△ Less
Submitted 22 October, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
RMP2: A Structured Composable Policy Class for Robot Learning
Authors:
Anqi Li,
Ching-An Cheng,
M. Asif Rana,
Man Xie,
Karl Van Wyk,
Nathan Ratliff,
Byron Boots
Abstract:
We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different lev…
▽ More
We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge as well as the ability to transfer policies between robots. However, implementing a system for end-to-end learning RMPflow policies faces several computational challenges. In this work, we re-examine the message passing algorithm of RMPflow and propose a more efficient alternate algorithm, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) easy programming interfaces to designing complex transformations; 2) support of general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning which is suitable encoding domain knowledge. Our experiments show that using structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space.
△ Less
Submitted 10 March, 2021;
originally announced March 2021.
-
Quality Assessment of Super-Resolved Omnidirectional Image Quality Using Tangential Views
Authors:
Cagri Ozcinar,
Aakanksha Rana
Abstract:
Omnidirectional images (ODIs), also known as 360-degree images, enable viewers to explore all directions of a given 360-degree scene from a fixed point. Designing an immersive imaging system with ODI is challenging as such systems require very large resolution coverage of the entire 360 viewing space to provide an enhanced quality of experience (QoE). Despite remarkable progress on single image su…
▽ More
Omnidirectional images (ODIs), also known as 360-degree images, enable viewers to explore all directions of a given 360-degree scene from a fixed point. Designing an immersive imaging system with ODI is challenging as such systems require very large resolution coverage of the entire 360 viewing space to provide an enhanced quality of experience (QoE). Despite remarkable progress on single image super-resolution (SISR) methods with deep-learning techniques, no study for quality assessments of super-resolved ODIs exists to analyze the quality of such SISR techniques. This paper proposes an objective, full-reference quality assessment framework which studies quality measurement for ODIs generated by GAN-based and CNN-based SISR methods. The quality assessment framework offers to utilize tangential views to cope with the spherical nature of a given ODIs. The generated tangential views are distortion-free and can be efficiently scaled to high-resolution spherical data for SISR quality measurement. We extensively evaluate two state-of-the-art SISR methods using widely used full-reference SISR quality metrics adapted to our designed framework. In addition, our study reveals that most objective metric show high performance over CNN based SISR, while subjective tests favors GAN-based architectures.
△ Less
Submitted 25 January, 2021;
originally announced January 2021.
-
A generalized deep learning model for multi-disease Chest X-Ray diagnostics
Authors:
Nabit Bajwa,
Kedar Bajwa,
Atif Rana,
M. Faique Shakeel,
Kashif Haqqi,
Suleiman Ali Khan
Abstract:
We investigate the generalizability of deep convolutional neural network (CNN) on the task of disease classification from chest x-rays collected over multiple sites. We systematically train the model using datasets from three independent sites with different patient populations: National Institute of Health (NIH), Stanford University Medical Centre (CheXpert), and Shifa International Hospital (SIH…
▽ More
We investigate the generalizability of deep convolutional neural network (CNN) on the task of disease classification from chest x-rays collected over multiple sites. We systematically train the model using datasets from three independent sites with different patient populations: National Institute of Health (NIH), Stanford University Medical Centre (CheXpert), and Shifa International Hospital (SIH). We formulate a sequential training approach and demonstrate that the model produces generalized prediction performance using held out test sets from the three sites. Our model generalizes better when trained on multiple datasets, with the CheXpert-Shifa-NET model performing significantly better (p-values < 0.05) than the models trained on individual datasets for 3 out of the 4 distinct disease classes. The code for training the model will be made available open source at: www.github.com/link-to-code at the time of publication.
△ Less
Submitted 17 October, 2020;
originally announced October 2020.
-
Sub-Pixel Back-Projection Network For Lightweight Single Image Super-Resolution
Authors:
Supratik Banerjee,
Cagri Ozcinar,
Aakanksha Rana,
Aljosa Smolic,
Michael Manzke
Abstract:
Convolutional neural network (CNN)-based methods have achieved great success for single-image superresolution (SISR). However, most models attempt to improve reconstruction accuracy while increasing the requirement of number of model parameters. To tackle this problem, in this paper, we study reducing the number of parameters and computational cost of CNN-based SISR methods while maintaining the a…
▽ More
Convolutional neural network (CNN)-based methods have achieved great success for single-image superresolution (SISR). However, most models attempt to improve reconstruction accuracy while increasing the requirement of number of model parameters. To tackle this problem, in this paper, we study reducing the number of parameters and computational cost of CNN-based SISR methods while maintaining the accuracy of super-resolution reconstruction performance. To this end, we introduce a novel network architecture for SISR, which strikes a good trade-off between reconstruction quality and low computational complexity. Specifically, we propose an iterative back-projection architecture using sub-pixel convolution instead of deconvolution layers. We evaluate the performance of computational and reconstruction accuracy for our proposed model with extensive quantitative and qualitative evaluations. Experimental results reveal that our proposed method uses fewer parameters and reduces the computational cost while maintaining reconstruction accuracy against state-of-the-art SISR methods over well-known four SR benchmark datasets. Code is available at "https://github.com/supratikbanerjee/SubPixel-BackProjection_SuperResolution".
△ Less
Submitted 3 August, 2020;
originally announced August 2020.
-
Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems
Authors:
Muhammad Asif Rana,
Anqi Li,
Dieter Fox,
Byron Boots,
Fabio Ramos,
Nathan Ratliff
Abstract:
Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphi…
▽ More
Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphism, is equivalent to a simple, hand-specified dynamical system. As an immediate result of using diffeomorphisms, the stability property of the hand-specified dynamical system directly carry over to the learned dynamical system. Inspired by recent works in density estimation, we propose to represent the diffeomorphism as a composition of simple parameterized diffeomorphisms. Additional structure is imposed to provide guarantees on the smoothness of the generated motions. The efficacy of this approach is demonstrated through validation on an established benchmark as well demonstrations collected on a real-world robotic system.
△ Less
Submitted 21 September, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos
Authors:
Mamshad Nayeem Rizve,
Ugur Demir,
Praveen Tirupattur,
Aayush Jung Rana,
Kevin Duarte,
Ishan Dave,
Yogesh Singh Rawat,
Mubarak Shah
Abstract:
Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity detection is mainly focused on datasets, such as UCF-101, JHMDB, THUMOS, and AVA, which partially address these issues. The requirement of processing the security…
▽ More
Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity detection is mainly focused on datasets, such as UCF-101, JHMDB, THUMOS, and AVA, which partially address these issues. The requirement of processing the security videos in real-time makes this even more challenging. In this work we propose Gabriella, a real-time online system to perform activity detection on untrimmed security videos. The proposed method consists of three stages: tubelet extraction, activity classification, and online tubelet merging. For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets. We propose a novel Patch-Dice loss to handle large variations in actor size. Our online processing of videos at a clip level drastically reduces the computation time in detecting activities. The detected tubelets are assigned activity class scores by the classification network and merged together using our proposed Tubelet-Merge Action-Split (TMAS) algorithm to form the final action detections. The TMAS algorithm efficiently connects the tubelets in an online fashion to generate action detections which are robust against varying length activities. We perform our experiments on the VIRAT and MEVA (Multiview Extended Video with Activities) datasets and demonstrate the effectiveness of the proposed approach in terms of speed (~100 fps) and performance with state-of-the-art results. The code and models will be made publicly available.
△ Less
Submitted 19 May, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
RESIRE: real space iterative reconstruction engine for Tomography
Authors:
Minh Pham,
Yakun Yuan,
Arjun Rana,
Jianwei Miao,
Stanley Osher
Abstract:
Tomography has made a revolutionary impact on diverse fields, ranging from macro-/mesoscopic scale studies in biology, radiology, plasma physics to the characterization of 3D atomic structure in material science. The fundamental of tomography is to reconstruct a 3D object from a set of 2D projections. To solve the tomography problem, many algorithms have been developed. Among them are methods usin…
▽ More
Tomography has made a revolutionary impact on diverse fields, ranging from macro-/mesoscopic scale studies in biology, radiology, plasma physics to the characterization of 3D atomic structure in material science. The fundamental of tomography is to reconstruct a 3D object from a set of 2D projections. To solve the tomography problem, many algorithms have been developed. Among them are methods using transformation technique such as computed tomography (CT) based on Radon transform and Generalized Fourier iterative reconstruction (GENFIRE) based on Fourier slice theorem (FST), and direct methods such as Simultaneous Iterative Reconstruction Technique (SIRT) and Simultaneous Algebraic Reconstruction Technique (SART) using gradient descent and algebra technique. In this paper, we propose a hybrid gradient descent to solve the tomography problem by combining Fourier slice theorem and calculus of variations. By using simulated and experimental data, we show that the state-of-art RESIRE can produce more superior results than previous methods; the reconstructed objects have higher quality and smaller relative errors. More importantly, RESIRE can deal with partially blocked projections rigorously where only part of projection information are provided while other methods fail. We anticipate RESIRE will not only improve the reconstruction quality in all existing tomographic applications, but also expand tomography method to a broad class of functional thin films. We expect RESIRE to find a broad applications across diverse disciplines.
△ Less
Submitted 25 April, 2020; v1 submitted 22 April, 2020;
originally announced April 2020.
-
ColorNet -- Estimating Colorfulness in Natural Images
Authors:
Emin Zerman,
Aakanksha Rana,
Aljosa Smolic
Abstract:
Measuring the colorfulness of a natural or virtual scene is critical for many applications in image processing field ranging from capturing to display. In this paper, we propose the first deep learning-based colorfulness estimation metric. For this purpose, we develop a color rating model which simultaneously learns to extracts the pertinent characteristic color features and the map** from featu…
▽ More
Measuring the colorfulness of a natural or virtual scene is critical for many applications in image processing field ranging from capturing to display. In this paper, we propose the first deep learning-based colorfulness estimation metric. For this purpose, we develop a color rating model which simultaneously learns to extracts the pertinent characteristic color features and the map** from feature space to the ideal colorfulness scores for a variety of natural colored images. Additionally, we propose to overcome the lack of adequate annotated dataset problem by combining/aligning two publicly available colorfulness databases using the results of a new subjective test which employs a common subset of both databases. Using the obtained subjectively annotated dataset with 180 colored images, we finally demonstrate the efficacy of our proposed model over the traditional methods, both quantitatively and qualitatively.
△ Less
Submitted 22 August, 2019;
originally announced August 2019.
-
Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
Authors:
Aakanksha Rana,
Cagri Ozcinar,
Aljoscha Smolic
Abstract:
Ambisonics i.e., a full-sphere surround sound, is quintessential with 360-degree visual content to provide a realistic virtual reality (VR) experience. While 360-degree visual content capture gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging due to the required sound-field microphones or information about the sound-source locations. In this pape…
▽ More
Ambisonics i.e., a full-sphere surround sound, is quintessential with 360-degree visual content to provide a realistic virtual reality (VR) experience. While 360-degree visual content capture gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging due to the required sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360-degree videos using the audio-visual cue. With this aim, firstly, a novel 360-degree audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for an automatic Ambisonic estimation problem. Benefiting from the deep learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further use such locations to encode to the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis for future investigations.
△ Less
Submitted 16 August, 2019;
originally announced August 2019.
-
Super-resolution of Omnidirectional Images Using Adversarial Learning
Authors:
Cagri Ozcinar,
Aakanksha Rana,
Aljosa Smolic
Abstract:
An omnidirectional image (ODI) enables viewers to look in every direction from a fixed point through a head-mounted display providing an immersive experience compared to that of a standard image. Designing immersive virtual reality systems with ODIs is challenging as they require high resolution content. In this paper, we study super-resolution for ODIs and propose an improved generative adversari…
▽ More
An omnidirectional image (ODI) enables viewers to look in every direction from a fixed point through a head-mounted display providing an immersive experience compared to that of a standard image. Designing immersive virtual reality systems with ODIs is challenging as they require high resolution content. In this paper, we study super-resolution for ODIs and propose an improved generative adversarial network based model which is optimized to handle the artifacts obtained in the spherical observational space. Specifically, we propose to use a fast PatchGAN discriminator, as it needs fewer parameters and improves the super-resolution at a fine scale. We also explore the generative models with adversarial learning by introducing a spherical-content specific loss function, called 360-SS. To train and test the performance of our proposed model we prepare a dataset of 4500 ODIs. Our results demonstrate the efficacy of the proposed method and identify new challenges in ODI super-resolution for future investigations.
△ Less
Submitted 12 August, 2019;
originally announced August 2019.
-
Deep Tone Map** Operator for High Dynamic Range Images
Authors:
Aakanksha Rana,
Praveer Singh,
Giuseppe Valenzise,
Frederic Dufaux,
Nikos Komodakis,
Aljosa Smolic
Abstract:
A computationally fast tone map** operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited number of HDR content and require an extensive parameter tuning to yield the best subje…
▽ More
A computationally fast tone map** operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited number of HDR content and require an extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone map** operator (DeepTMO) that yields a high-resolution and high-subjective quality tone mapped output. Based on conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic-content (e.g., outdoor, indoor, human, structures, etc.) but also tackles the HDR related scene-specific challenges such as contrast and brightness, while preserving the fine-grained details. We explore 4 possible combinations of Generator-Discriminator architectural designs to specifically address some prominent issues in HDR related deep-learning frameworks like blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss-functions and normalization layers under a cGAN setting, we conclude with adopting a multi-scale model for our task. To further leverage on the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely Tone Map** Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and showcase that our DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pair-wise subjective study which confirms the versatility of our method.
△ Less
Submitted 12 August, 2019;
originally announced August 2019.
-
High Accuracy Tumor Diagnoses and Benchmarking of Hematoxylin and Eosin Stained Prostate Core Biopsy Images Generated by Explainable Deep Neural Networks
Authors:
Aman Rana,
Alarice Lowe,
Marie Lithgow,
Katharine Horback,
Tyler Janovitz,
Annacarolina Da Silva,
Harrison Tsai,
Vignesh Shanmugam,
Hyung-** Yoon,
Pratik Shah
Abstract:
Histopathological diagnoses of tumors in tissue biopsy after Hematoxylin and Eosin (H&E) staining is the gold standard for oncology care. H&E staining is slow and uses dyes, reagents and precious tissue samples that cannot be reused. Thousands of native nonstained RGB Whole Slide Image (RWSI) patches of prostate core tissue biopsies were registered with their H&E stained versions. Conditional Gene…
▽ More
Histopathological diagnoses of tumors in tissue biopsy after Hematoxylin and Eosin (H&E) staining is the gold standard for oncology care. H&E staining is slow and uses dyes, reagents and precious tissue samples that cannot be reused. Thousands of native nonstained RGB Whole Slide Image (RWSI) patches of prostate core tissue biopsies were registered with their H&E stained versions. Conditional Generative Adversarial Neural Networks (cGANs) that automate conversion of native nonstained RWSI to computational H&E stained images were then trained. High similarities between computational and H&E dye stained images with Structural Similarity Index (SSIM) 0.902, Pearsons Correlation Coefficient (CC) 0.962 and Peak Signal to Noise Ratio (PSNR) 22.821 dB were calculated. A second cGAN performed accurate computational destaining of H&E dye stained images back to their native nonstained form with SSIM 0.9, CC 0.963 and PSNR 25.646 dB. A single-blind study computed more than 95% pixel-by-pixel overlap between prostate tumor annotations on computationally stained images, provided by five-board certified MD pathologists, with those on H&E dye stained counterparts. We report the first visualization and explanation of neural network kernel activation maps during H&E staining and destaining of RGB images by cGANs. High similarities between kernel activation maps of computational and H&E stained images (Mean-Squared Errors <0.0005) provide additional mathematical and mechanistic validation of the staining system. Our neural network framework thus is automated, explainable and performs high precision H&E staining and destaining of low cost native RGB images, and is computer vision and physician authenticated for rapid and accurate tumor diagnoses.
△ Less
Submitted 2 August, 2019;
originally announced August 2019.
-
A semi-implicit relaxed Douglas-Rachford algorithm (sir-DR) for Ptychograhpy
Authors:
Minh Pham,
Arjun Rana,
Jianwei Miao,
Stanley Osher
Abstract:
Alternating projection based methods, such as ePIE and rPIE, have been used widely in ptychography. However, they only work well if there are adequate measurements (diffraction patterns); in the case of sparse data (i.e. fewer measurements) alternating projection underperforms and might not even converge. In this paper, we propose semi-implicit relaxed Douglas Rachford (sir-DR), an accelerated ite…
▽ More
Alternating projection based methods, such as ePIE and rPIE, have been used widely in ptychography. However, they only work well if there are adequate measurements (diffraction patterns); in the case of sparse data (i.e. fewer measurements) alternating projection underperforms and might not even converge. In this paper, we propose semi-implicit relaxed Douglas Rachford (sir-DR), an accelerated iterative method, to solve the classical ptychography problem. Using both simulated and experimental data, we show that sir-DR improves the convergence speed and the reconstruction quality relative to ePIE and rPIE. Furthermore, in certain cases when sparsity is high, sir-DR converges while ePIE and rPIE fail. To facilitate others to use the algorithm, we post the Matlab source code of sir-DR on a public website (www.physics.ucla.edu/research/imaging/sir-DR). We anticipate that this algorithm can be generally applied to the ptychographic reconstruction of a wide range of samples in the physical and biological sciences.
△ Less
Submitted 5 June, 2019;
originally announced June 2019.