Search | arXiv e-print repository

FORA: Fast-Forward Caching in Diffusion Transformer Acceleration

Authors: Pratheba Selvaraju, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Luming Liang

Abstract: Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos, largely due to their scalability, which enables the construction of larger models for enhanced performance. However, the increased size of these models leads to higher inference costs, making them less attractive for real-time applications. We present Fast-FORward CAching (FORA), a simple yet effective approach designed to accelerate DiT by exploiting the repetitive nature of the diffusion process. FORA implements a caching mechanism that stores and reuses intermediate outputs from the attention and MLP layers across denoising steps, thereby reducing computational overhead. This approach does not require model retraining and integrates seamlessly with existing transformer-based diffusion models. Experiments show that FORA can speed up diffusion transformers several times over while only minimally affecting performance metrics such as the Inception Score (IS) and FID. By enabling faster processing with minimal trade-offs in quality, FORA represents a significant advancement in deploying diffusion transformers for real-time applications. Code will be made publicly available at: https://github.com/prathebaselva/FORA.

Submitted 1 July, 2024; originally announced July 2024.
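The caching idea in the FORA abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the released FORA code. The names `CachedBlock`, `sample` and `interval` are assumptions, as is the toy update rule. Each block recomputes its attention and MLP outputs only on refresh steps and otherwise reuses the cached values.

```python
from typing import Callable, Dict, List

Vector = List[float]


class CachedBlock:
    """Residual block whose attention/MLP outputs are reused between refresh steps.

    Hypothetical sketch: `attn` and `mlp` stand in for real DiT sub-layers.
    """

    def __init__(self, attn: Callable[[Vector], Vector], mlp: Callable[[Vector], Vector]):
        self.attn = attn
        self.mlp = mlp
        self.cache: Dict[str, Vector] = {}
        self.calls = 0  # number of real sub-layer evaluations

    def __call__(self, x: Vector, refresh: bool) -> Vector:
        if refresh or "attn" not in self.cache:
            self.cache["attn"] = self.attn(x)
            self.calls += 1
        x = [a + b for a, b in zip(x, self.cache["attn"])]  # residual add
        if refresh or "mlp" not in self.cache:
            self.cache["mlp"] = self.mlp(x)
            self.calls += 1
        return [a + b for a, b in zip(x, self.cache["mlp"])]


def sample(blocks: List[CachedBlock], x: Vector, num_steps: int, interval: int) -> Vector:
    for step in range(num_steps):
        refresh = step % interval == 0  # recompute every `interval` denoising steps
        h = x
        for block in blocks:
            h = block(h, refresh)
        x = [xi - 0.1 * hi for xi, hi in zip(x, h)]  # toy update standing in for the real sampler
    return x


blocks = [CachedBlock(lambda v: [0.5 * e for e in v], lambda v: [0.1 * e for e in v]) for _ in range(2)]
out = sample(blocks, [1.0, 2.0], num_steps=10, interval=3)
print([b.calls for b in blocks])  # [8, 8]
```

With 10 steps and an interval of 3, each block refreshes at steps 0, 3, 6 and 9. That gives 8 sub-layer evaluations instead of 20. The cost is that reused outputs were computed from an earlier step's input, which is the approximation this kind of caching accepts.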

arXiv:2404.08292

AdaContour: Adaptive Contour Descriptor with Hierarchical Representation

Authors: Tianyu Ding, **xin Zhou, Tianyi Chen, Zhihui Zhu, Ilya Zharkov, Luming Liang

Abstract: Existing angle-based contour descriptors suffer from lossy representation for non-star-convex shapes. By and large, this is the result of the shape being registered with a single global inner center and a set of radii corresponding to a polar coordinate parameterization. In this paper, we propose AdaContour, an adaptive contour descriptor that uses multiple local representations to characterize complex shapes. After hierarchically encoding object shapes in a training set and constructing a contour matrix of all subdivided regions, we compute a robust low-rank subspace and approximate each local contour by linearly combining the shared basis vectors to represent an object. Experiments show that AdaContour represents shapes more accurately and robustly than other descriptors while retaining effectiveness. We validate AdaContour by integrating it into off-the-shelf detectors to enable instance segmentation, which demonstrates faithful performance. The code is available at https://github.com/tding1/AdaContour.

Submitted 12 April, 2024; originally announced April 2024.
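The last step of the AdaContour pipeline approximates a local contour as a linear combination of shared basis vectors. It can be sketched as follows, under some assumptions. Contours are taken to be equal-length radius vectors. The basis comes from Gram-Schmidt orthonormalization of two hand-picked, hypothetical prototype contours; this stands in for, and is not, the robust low-rank subspace the paper fits on a training set. With an orthonormal basis, the least-squares coefficients are simply inner products.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: List[Vec]) -> List[Vec]:
    """Gram-Schmidt; a stand-in for fitting the (robust) low-rank subspace."""
    basis: List[Vec] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis


def encode(contour: Vec, basis: List[Vec]) -> Vec:
    """Least-squares coefficients of `contour` in an orthonormal basis."""
    return [dot(contour, q) for q in basis]


def decode(coeffs: Vec, basis: List[Vec]) -> Vec:
    """Linear combination of the shared basis vectors."""
    out = [0.0] * len(basis[0])
    for c, q in zip(coeffs, basis):
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


# Hypothetical prototypes: radii sampled at 8 angles around a local center.
circle = [1.0] * 8
lobed = [2.0, 1.0] * 4
basis = orthonormalize([circle, lobed])
contour = [3 * c + 0.5 * e for c, e in zip(circle, lobed)]
coeffs = encode(contour, basis)
print(len(coeffs))  # 2 coefficients describe the 8-radius contour
```

Because the basis is shared, each local contour is stored only as its short coefficient vector. Contours outside the span of the basis are approximated rather than reproduced exactly.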

arXiv:2404.08111

S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing

Authors: Guangzhi Wang, Tianyi Chen, Kamran Ghasedi, HsiangTao Wu, Tianyu Ding, Chris Nuesmeyer, Ilya Zharkov, Mohan Kankanhalli, Luming Liang

Abstract: Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introduce S3Editor, a Sparse Semantic-disentangled Self-training framework for face video editing. S3Editor is a generic solution that comprehensively addresses these challenges with three key contributions. First, S3Editor adopts a self-training paradigm to enhance the training process through semi-supervision. Second, we propose a semantic disentangled architecture with a dynamic routing mechanism that accommodates diverse editing requirements. Third, we present a structured sparse optimization schema that identifies and deactivates malicious neurons to further disentangle impacts from untargeted attributes. S3Editor is model-agnostic and compatible with various editing approaches. Our extensive qualitative and quantitative results affirm that our approach significantly enhances identity preservation, editing fidelity, and temporal consistency.

Submitted 11 April, 2024; originally announced April 2024.
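The abstract does not detail S3Editor's structured sparse optimization. As a generic illustration of how structured sparsity can switch off whole neurons, the sketch below applies the standard group-lasso proximal step (group soft-thresholding) to per-neuron weight groups. This is a swapped-in textbook technique, not the paper's schema; the weights and the `lam` value are made up. Any group whose norm is at most `lam` is set exactly to zero, which deactivates that neuron.

```python
import math
from typing import List


def group_soft_threshold(groups: List[List[float]], lam: float) -> List[List[float]]:
    """Proximal operator of lam * sum_g ||w_g||_2 applied to each weight group."""
    out = []
    for w in groups:
        norm = math.sqrt(sum(x * x for x in w))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in w])  # shrink the group, or zero it entirely
    return out


# Each inner list holds the incoming weights of one (hypothetical) neuron.
neurons = [[3.0, 4.0], [0.3, 0.4], [0.0, 2.0]]
pruned = group_soft_threshold(neurons, lam=1.0)
deactivated = [i for i, w in enumerate(pruned) if all(x == 0.0 for x in w)]
print(deactivated)  # [1]
```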

arXiv:2312.09411

OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

Authors: Tianyi Chen, Tianyu Ding, Zhihui Zhu, Zeyu Chen, HsiangTao Wu, Ilya Zharkov, Luming Liang

Abstract: Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial in the efficient machine learning realm. This topic spans various techniques, from structured pruning to neural architecture search, encompassing both the pruning and erasing-operator perspectives. Despite advancements, existing methods suffer from complex, multi-stage processes that demand substantial engineering and domain knowledge, limiting their broader applications. We introduce the third-generation Only-Train-Once (OTOv3), which first automatically trains and compresses a general DNN through pruning and erasing operations, creating a compact and competitive sub-network without the need for fine-tuning. OTOv3 simplifies and automates the training and compression process and minimizes the engineering effort required from users. It offers key technological advancements: (i) automatic search space construction for general DNNs based on dependency graph analysis; (ii) Dual Half-Space Projected Gradient (DHSPG) and its enhanced version with hierarchical search (H2SPG) to reliably solve (hierarchical) structured sparsity problems and ensure sub-network validity; and (iii) automated sub-network construction using solutions from DHSPG/H2SPG and dependency graphs. Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured pruning and neural architecture search. OTOv3 produces sub-networks that match or exceed the state of the art. The source code will be available at https://github.com/tianyic/only_train_once.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 39 pages. Due to the page-dimension limit, the full appendix is attached here: https://tinyurl.com/otov3appendix. We recommend zooming in for finer details. arXiv admin note: text overlap with arXiv:2305.18030
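DHSPG and H2SPG belong to the family of half-space projected gradient methods. The sketch below shows only a simplified, single half-space projected gradient step over variable groups. It assumes plain gradients, a fixed learning rate, a threshold `eps`, and that groups which are already zero stay zero. It is not DHSPG or H2SPG, which add further machinery, such as the hierarchical validity checks mentioned in the abstract.

```python
from typing import List

Group = List[float]


def half_space_step(groups: List[Group], grads: List[Group], lr: float, eps: float = 0.0) -> List[Group]:
    """One simplified half-space projected gradient step.

    A nonzero group x_g is projected to zero when its trial point
    x_g - lr * grad_g leaves the half-space {z : z . x_g >= eps * ||x_g||^2}.
    """
    new_groups = []
    for x, g in zip(groups, grads):
        sq_norm = sum(xi * xi for xi in x)
        if sq_norm == 0.0:
            new_groups.append(list(x))  # already-zero groups stay zero in this sketch
            continue
        trial = [xi - lr * gi for xi, gi in zip(x, g)]
        inner = sum(ti * xi for ti, xi in zip(trial, x))
        if inner < eps * sq_norm:
            new_groups.append([0.0] * len(x))  # project the whole group to zero
        else:
            new_groups.append(trial)
    return new_groups


groups = [[1.0, 1.0], [0.1, -0.1], [0.0, 0.0]]
grads = [[0.1, 0.1], [0.5, -0.5], [0.3, 0.3]]
print(half_space_step(groups, grads, lr=1.0))  # group 1 is projected to zero
```

Because an entire group is zeroed at once, the structure it parameterizes (for example, a channel) can later be removed. This is what makes the resulting sparsity structured.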

arXiv:2312.00678

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey

Authors: Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, **xin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang

Abstract: The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape. However, the increasing computational and memory demands of these models present substantial challenges, hindering both academic research and practical applications. To address these issues, a wide array of methods, including both algorithmic and hardware solutions, have been developed to enhance the efficiency of LLMs. This survey delivers a comprehensive review of algorithmic advancements aimed at improving LLM efficiency. Unlike other surveys that typically focus on specific areas such as training or model compression, this paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs. Specifically, it covers various topics related to efficiency, including scaling laws, data utilization, architectural innovations, training and tuning strategies, and inference techniques. This paper aims to serve as a valuable resource for researchers and practitioners, laying the groundwork for future innovations in this critical research area. Our repository of relevant references is maintained at https://github.com/tding1/Efficient-LLM-Survey.

Submitted 18 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2312.00210

DREAM: Diffusion Rectification and Estimation-Adaptive Models

Authors: **xin Zhou, Tianyu Ding, Tianyi Chen, Jiachen Jiang, Ilya Zharkov, Zhihui Zhu, Luming Liang

Abstract: We present DREAM, a novel training framework representing Diffusion Rectification and Estimation Adaptive Models, requiring minimal code changes (just three lines) yet significantly enhancing the alignment of training with sampling in diffusion models. DREAM features two components: diffusion rectification, which adjusts training to reflect the sampling process, and estimation adaptation, which balances perception against distortion. When applied to image super-resolution (SR), DREAM adeptly navigates the tradeoff between minimizing distortion and preserving high image quality. Experiments demonstrate DREAM's superiority over standard diffusion-based SR methods, showing a $2$ to $3\times $ faster training convergence and a $10$ to $20\times$ reduction in sampling steps to achieve comparable results. We hope DREAM will inspire a rethinking of diffusion model training paradigms.

Submitted 19 March, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: 16 pages, 22 figures, 5 tables; the first two authors contributed to this work equally

arXiv:2311.15510

CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

Authors: Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang

Abstract: Generalizability and few-shot learning are key challenges in Neural Radiance Fields (NeRF), often due to the lack of a holistic understanding in pixel-level rendering. We introduce CaesarNeRF, an end-to-end approach that leverages scene-level CAlibratEd SemAntic Representation along with pixel-level representations to advance few-shot, generalizable neural rendering, facilitating a holistic understanding without compromising high-quality details. CaesarNeRF explicitly models pose differences of reference views to combine scene-level semantic representations, providing a calibrated holistic understanding. This calibration process aligns various viewpoints with precise location and is further enhanced by sequential refinement to capture varying details. Extensive experiments on public datasets, including LLFF, Shiny, mip-NeRF 360, and MVImgNet, show that CaesarNeRF delivers state-of-the-art performance across varying numbers of reference views, proving effective even with a single reference image.

Submitted 9 July, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: Accepted to ECCV 2024. Project available at https://haidongz-usc.github.io/project/caesarnerf

arXiv:2311.03770

Lightweight Portrait Matting via Regional Attention and Refinement

Authors: Yatao Zhong, Ilya Zharkov

Abstract: We present a lightweight model for high-resolution portrait matting. The model does not use any auxiliary inputs such as trimaps or background captures and achieves real-time performance for HD videos and near real-time performance for 4K. Our model is built upon a two-stage framework with a low-resolution network for coarse alpha estimation followed by a refinement network for local region improvement. However, a naive implementation of the two-stage model suffers from poor matting quality when no auxiliary inputs are used. We address the performance gap by leveraging the vision transformer (ViT) as the backbone of the low-resolution network, motivated by the observation that the tokenization step of ViT can reduce spatial resolution while retaining as much pixel information as possible. To inform local regions of the context, we propose a novel cross-region attention (CRA) module in the refinement network to propagate contextual information across neighboring regions. We demonstrate that our method achieves superior results and outperforms other baselines on three benchmark datasets while using only $1/20$ of the FLOPs of the existing state-of-the-art model.

Submitted 7 November, 2023; originally announced November 2023.
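The observation about ViT tokenization can be illustrated with a minimal patchify sketch in pure Python; the helper names are hypothetical. Splitting an H×W×C image into p×p patches lowers the spatial resolution to an (H/p)×(W/p) grid of tokens. Each token still keeps all p·p·C pixel values, so the rearrangement itself is lossless. A real ViT follows this step with a learned linear embedding, which the sketch omits.

```python
from typing import List

Image = List[List[List[float]]]  # indexed as [row][col][channel]


def patchify(img: Image, p: int) -> List[List[float]]:
    """Flatten each non-overlapping p x p patch into one token (row-major order)."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    tokens = []
    for pr in range(0, h, p):
        for pc in range(0, w, p):
            tokens.append([v for r in range(pr, pr + p) for c in range(pc, pc + p) for v in img[r][c]])
    return tokens


def unpatchify(tokens: List[List[float]], h: int, w: int, p: int, ch: int) -> Image:
    """Inverse of `patchify`: put every token's values back at their pixels."""
    img: Image = [[[0.0] * ch for _ in range(w)] for _ in range(h)]
    per_row = w // p
    for idx, token in enumerate(tokens):
        pr, pc = (idx // per_row) * p, (idx % per_row) * p
        k = 0
        for r in range(pr, pr + p):
            for c in range(pc, pc + p):
                img[r][c] = token[k:k + ch]
                k += ch
    return img


img = [[[float(100 * r + 10 * c + k) for k in range(3)] for c in range(6)] for r in range(4)]
tokens = patchify(img, p=2)
print(len(tokens), len(tokens[0]))  # 6 12: a 2x3 token grid, 12 values per token
```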

arXiv:2310.18356

LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery

Authors: Tianyi Chen, Tianyu Ding, Badal Yadav, Ilya Zharkov, Luming Liang

Abstract: Large Language Models (LLMs) have transformed the landscape of artificial intelligence, while their enormous size presents significant challenges in terms of computational costs. We introduce LoRAShear, a novel efficient approach to structurally prune LLMs and recover knowledge. Given general LLMs, LoRAShear first creates dependency graphs over LoRA modules to discover minimally removable structures and analyze the knowledge distribution. It then performs progressive structured pruning on LoRA adaptors and enables inherent knowledge transfer to better preserve the information in the redundant structures. To recover the knowledge lost during pruning, LoRAShear meticulously studies and proposes dynamic fine-tuning schemes with dynamic data adaptors to effectively narrow the performance gap to the full models. Numerical results demonstrate that, using only one GPU within a couple of GPU days, LoRAShear reduces the footprint of LLMs by 20% with only 1.0% performance degradation and significantly outperforms state-of-the-art methods. The source code will be available at https://github.com/microsoft/lorashear.

Submitted 31 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2308.16154

MMVP: Motion-Matrix-based Video Prediction

Authors: Yiqi Zhong, Luming Liang, Ilya Zharkov, Ulrich Neumann

Abstract: A central challenge of video prediction is that the system has to reason about the objects' future motions from image frames while simultaneously maintaining the consistency of their appearances across frames. This work introduces an end-to-end trainable two-stream video prediction framework, Motion-Matrix-based Video Prediction (MMVP), to tackle this challenge. Unlike previous methods that usually handle motion prediction and appearance maintenance within the same set of modules, MMVP decouples motion and appearance information by constructing appearance-agnostic motion matrices. The motion matrices represent the temporal similarity of each and every pair of feature patches in the input frames, and are the sole input of the motion prediction module in MMVP. This design improves video prediction in both accuracy and efficiency, and reduces the model size. Results of extensive experiments demonstrate that MMVP outperforms state-of-the-art systems on public datasets by non-negligible margins (about 1 dB in PSNR on UCF Sports) with significantly smaller models (84% of the size or smaller).

Submitted 30 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: ICCV 2023 (Oral)
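A motion matrix as described in the MMVP abstract holds one similarity score for every pair of feature patches across two frames. It can be sketched as follows. Cosine similarity and the toy 2-D patch features are assumptions for illustration; the paper's exact similarity measure and normalization may differ. Only similarities are kept, so the matrix carries no appearance features.

```python
import math
from typing import List

Features = List[List[float]]  # one feature vector per patch


def motion_matrix(feat_a: Features, feat_b: Features) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity of patch i in frame A and patch j in frame B."""
    def normalize(v: List[float]) -> List[float]:
        n = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
        return [x / n for x in v]

    a = [normalize(v) for v in feat_a]
    b = [normalize(v) for v in feat_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in b] for u in a]


frame_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_b = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # the same patches, shifted by one position
m = motion_matrix(frame_a, frame_b)
best = [max(range(len(row)), key=row.__getitem__) for row in m]
print(best)  # [1, 2, 0]: where each patch of frame A moved to in frame B
```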

arXiv:2305.18030

Automated Search-Space Generation Neural Architecture Search

Authors: Tianyi Chen, Luming Liang, Tianyu Ding, Ilya Zharkov

Abstract: To search for an optimal sub-network within a general deep neural network (DNN), existing neural architecture search (NAS) methods typically rely on handcrafting a search space beforehand. Such requirements make it challenging to extend them to general scenarios without significant human expertise and manual intervention. To overcome these limitations, we propose Automated Search-Space Generation Neural Architecture Search (ASGNAS), perhaps the first automated system to train general DNNs that cover all candidate connections and operations and produce high-performing sub-networks in a one-shot manner. Technologically, ASGNAS delivers three notable contributions to minimize human efforts: (i) automated search space generation for general DNNs; (ii) a Hierarchical Half-Space Projected Gradient (H2SPG) that leverages the hierarchy and dependency within the generated search space to ensure network validity during optimization, and reliably produces a solution with both high performance and hierarchical group sparsity; and (iii) automated sub-network construction upon the H2SPG solution. Numerically, we demonstrate the effectiveness of ASGNAS on a variety of general DNNs, including RegNet, StackedUnets, SuperResNet, and DARTS, over benchmark datasets such as CIFAR10, Fashion-MNIST, ImageNet, STL-10, and SVHN. The sub-networks computed by ASGNAS achieve competitive or even superior performance compared to the starting full DNNs and other state-of-the-art methods. The library will be released at https://github.com/tianyic/only_train_once.

Submitted 5 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Graph visualizations for DARTS and SuperResNet are omitted from the arXiv version because they exceed the page-dimension limit. Please refer to the OpenReview version for the visualizations.

arXiv:2303.06862

OTOV2: Automatic, Generic, User-Friendly

Authors: Tianyi Chen, Luming Liang, Tianyu Ding, Zhihui Zhu, Ilya Zharkov

Abstract: Existing model compression methods via structured pruning typically require complicated multi-stage procedures. Each individual stage necessitates substantial engineering effort and domain knowledge from end users, which prevents their wider application to broader scenarios. We propose the second generation of Only-Train-Once (OTOv2), which first automatically trains and compresses a general DNN only once from scratch to produce a more compact model with competitive performance without fine-tuning. OTOv2 is automatic and pluggable into various deep learning applications, and requires minimal engineering effort from users. Methodologically, OTOv2 proposes two major improvements: (i) Autonomy: it automatically exploits the dependency of general DNNs, partitions the trainable variables into Zero-Invariant Groups (ZIGs), and constructs the compressed model; and (ii) Dual Half-Space Projected Gradient (DHSPG): a novel optimizer to more reliably solve structured-sparsity problems. Numerically, we demonstrate the generality and autonomy of OTOv2 on a variety of model architectures such as VGG, ResNet, CARN, ConvNeXt, DenseNet and StackedUnets, the majority of which cannot be handled by other methods without extensive handcrafting efforts. Together with benchmark datasets including CIFAR10/100, DIV2K, Fashion-MNIST, SVHN and ImageNet, its effectiveness is validated by performing competitively with or even better than the state of the art. The source code is available at https://github.com/tianyic/only_train_once.

Submitted 23 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Published at ICLR 2023. Note that a few images of dependency graphs cannot be included in arXiv due to the size limit.
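The idea behind Zero-Invariant Groups can be checked on a toy two-layer network. The grouping below (one hidden neuron's incoming weight row plus its bias) is a simplified, hypothetical example, and so is the ReLU activation. Neither reflects OTOv2's automatic dependency analysis. When such a group is all zero, the neuron outputs zero for every input. Removing it together with the matching column of the next layer therefore leaves the network's output unchanged.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector, x: Vector) -> Vector:
    hidden = [max(0.0, z + b) for z, b in zip(matvec(w1, x), b1)]  # ReLU
    return [z + b for z, b in zip(matvec(w2, hidden), b2)]


def remove_zero_groups(w1: Matrix, b1: Vector, w2: Matrix):
    """Drop hidden neurons whose group (row of w1 plus its bias) is entirely zero."""
    keep = [i for i, row in enumerate(w1) if any(v != 0.0 for v in row) or b1[i] != 0.0]
    return [w1[i] for i in keep], [b1[i] for i in keep], [[row[i] for i in keep] for row in w2]


w1 = [[1.0, -1.0], [0.0, 0.0], [0.5, 2.0]]
b1 = [0.1, 0.0, -0.2]  # hidden neuron 1 forms an all-zero group
w2 = [[1.0, 3.0, -1.0]]
b2 = [0.05]
small_w1, small_b1, small_w2 = remove_zero_groups(w1, b1, w2)
x = [2.0, 1.0]
print(forward(w1, b1, w2, b2, x), forward(small_w1, small_b1, small_w2, b2, x))  # identical outputs
```

The argument relies on the activation mapping zero to zero, as ReLU does. An activation such as the sigmoid, which maps zero to 0.5, would not satisfy it.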

arXiv:2210.02391

Geometry Driven Progressive Warping for One-Shot Face Animation

Authors: Yatao Zhong, Faezeh Amjadi, Ilya Zharkov

Abstract: Face animation aims at creating photo-realistic portrait videos with animated poses and expressions. A common practice is to generate displacement fields that are used to warp pixels and features from source to target. However, prior attempts often produce sub-optimal displacements. In this work, we present a geometry driven model and propose two geometric patterns as guidance: 3D face rendered displacement maps and posed neural codes. The model can optionally use one of the patterns as guidance for displacement estimation. To model displacements at locations not covered by the face model (e.g., hair), we resort to source image features for contextual information and propose a progressive warping module that alternates between feature warping and displacement estimation at increasing resolutions. We show that the proposed model can synthesize portrait videos with high fidelity and achieves new state-of-the-art results on the VoxCeleb1 and VoxCeleb2 datasets for both cross-identity and same-identity reconstruction.

Submitted 5 October, 2022; originally announced October 2022.
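Warping pixels or features with a displacement field is the basic operation behind the progressive warping module. It can be sketched as a backward warp with bilinear sampling on a single-channel grid. The sketch assumes the backward-warping convention, in which each output pixel samples the source at its own position plus its displacement. It also assumes clamping at the image border; the helper names are hypothetical. The paper's module additionally alternates warping with displacement estimation at increasing resolutions, which is not shown here.

```python
import math
from typing import List

Grid = List[List[float]]  # single-channel image or feature map, [row][col]


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample `img` at a fractional position, clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(img: Grid, dy: Grid, dx: Grid) -> Grid:
    """Backward warp: out[y][x] samples img at (y + dy[y][x], x + dx[y][x])."""
    return [
        [bilinear(img, y + dy[y][x], x + dx[y][x]) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


src = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
zeros = [[0.0] * 4 for _ in range(2)]
halves = [[0.5] * 4 for _ in range(2)]
print(warp(src, zeros, halves))  # each pixel samples half a pixel to its right
```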

arXiv:2209.04551

Sparsity-guided Network Design for Frame Interpolation

Authors: Tianyu Ding, Luming Liang, Zhihui Zhu, Tianyi Chen, Ilya Zharkov

Abstract: DNN-based frame interpolation, which generates intermediate frames from two consecutive frames, often depends on model architectures with a large number of features, preventing their deployment on systems with limited resources, such as mobile devices. We present a compression-driven network design for frame interpolation that leverages model pruning through sparsity-inducing optimization to greatly reduce the model size while attaining higher performance. Concretely, we begin by compressing the recently proposed AdaCoF model and demonstrating that a 10 times compressed AdaCoF performs similarly to its original counterpart, where different strategies for using layerwise sparsity information as a guide are comprehensively investigated under a variety of hyperparameter settings. We then enhance this compressed model by introducing a multi-resolution warping module, which improves visual consistency with multi-level details. As a result, we achieve a considerable performance gain with a quarter of the size of the original AdaCoF. In addition, our model performs favorably against other state-of-the-art approaches on a wide variety of datasets. We note that the suggested compression-driven framework is generic and can be easily transferred to other DNN-based frame interpolation algorithms. The source code is available at https://github.com/tding1/CDFI.

Submitted 9 September, 2022; originally announced September 2022.

Comments: Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence. The corresponding CVPR paper can be found at arXiv:2103.10559
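One simple way to use layerwise sparsity as a guide is to give each layer of the redesigned network a width proportional to the fraction of its channels that stay nonzero after sparsity-inducing training. The rule, the `scale` knob and the channel norms below are hypothetical. The paper investigates several strategies, and this sketch is not claimed to be any of them.

```python
import math
from typing import Dict, List


def channel_density(channel_norms: List[float], tol: float = 1e-8) -> float:
    """Fraction of a layer's output channels whose weight norm survived sparse training."""
    return sum(1 for n in channel_norms if n > tol) / len(channel_norms)


def guided_widths(channel_norms: Dict[str, List[float]], scale: float = 1.0) -> Dict[str, int]:
    """Hypothetical rule: new width = ceil(scale * density * current width), at least 1."""
    return {
        name: max(1, math.ceil(scale * channel_density(norms) * len(norms)))
        for name, norms in channel_norms.items()
    }


# Made-up per-channel weight norms of two 8-channel layers after sparse training.
norms = {
    "conv1": [0.9, 0.0, 0.4, 0.0, 0.2, 0.0, 0.0, 0.3],
    "conv2": [0.0] * 6 + [0.7, 0.1],
}
print(guided_widths(norms))  # {'conv1': 4, 'conv2': 2}
```

A `scale` above 1 lets the compact design keep the relative layer proportions learned from sparsity while reallocating some capacity. Whether that helps is an empirical question the sketch does not answer.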

arXiv:2203.14186

RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

Authors: Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov

Abstract: Space-time video super-resolution (STVSR) is the task of interpolating videos with both Low Frame Rate (LFR) and Low Resolution (LR) to produce High Frame Rate (HFR) and High Resolution (HR) counterparts. Existing methods based on Convolutional Neural Networks (CNNs) achieve visually satisfying results but suffer from slow inference speed due to their heavy architectures. We propose to resolve this issue by using a spatial-temporal transformer that naturally incorporates the spatial and temporal super-resolution modules into a single model. Unlike CNN-based methods, we do not explicitly use separate building blocks for temporal interpolation and spatial super-resolution; instead, we use only a single end-to-end transformer architecture. Specifically, a reusable dictionary is built by encoders based on the input LFR and LR frames, which is then utilized in the decoder part to synthesize the HFR and HR frames. Compared with the state-of-the-art TMNet \cite{xu2021temporal}, our network is $60\%$ smaller (4.5M vs 12.3M parameters) and $80\%$ faster (26.2fps vs 14.3fps on $720\times576$ frames) without sacrificing much performance. The source code is available at https://github.com/llmpass/RSTT.

Submitted 26 March, 2022; originally announced March 2022.

arXiv:2103.10559

CDFI: Compression-Driven Network Design for Frame Interpolation

Authors: Tianyu Ding, Luming Liang, Zhihui Zhu, Ilya Zharkov

Abstract: DNN-based frame interpolation, which generates the intermediate frames given two consecutive frames, typically relies on heavy model architectures with a huge number of features, preventing them from being deployed on systems with limited resources, e.g., mobile devices. We propose a compression-driven network design for frame interpolation (CDFI) that leverages model pruning through sparsity-inducing optimization to significantly reduce the model size while achieving superior performance. Concretely, we first compress the recently proposed AdaCoF model and show that a 10X compressed AdaCoF performs similarly to its original counterpart; then we further improve this compressed model by introducing a multi-resolution warping module, which boosts visual consistency with multi-level details. As a consequence, we achieve a significant performance gain with only a quarter of the size of the original AdaCoF. Moreover, our model performs favorably against other state-of-the-art methods on a broad range of datasets. Finally, the proposed compression-driven framework is generic and can be easily transferred to other DNN-based frame interpolation algorithms. Our source code is available at https://github.com/tding1/CDFI.

Submitted 27 March, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: To appear in the proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2005.07492

Tailoring a pair of pants: the phase tropical version

Authors: Ilia Zharkov

Abstract: We show that the phase tropical pair-of-pants is (ambient) isotopic to the complex pair-of-pants. This paper can serve as an addendum to the author's joint paper with Ruddat arXiv:2001.08267 where an isotopy between complex and ober-tropical pairs-of-pants was shown. Thus all three versions are isotopic.

Submitted 13 May, 2020; originally announced May 2020.

Comments: 10 pages, 8 figures. arXiv admin note: text overlap with arXiv:2001.08267

arXiv:2004.08513 [pdf, other]

ImagePairs: Realistic Super Resolution Dataset via Beam Splitter Camera Rig

Authors: Hamid Reza Vaezi Joze, Ilya Zharkov, Karlton Powell, Carl Ringler, Luming Liang, Andy Roulston, Moshe Lutz, Vivek Pradeep

Abstract: Super Resolution is the problem of recovering a high-resolution image from a single or multiple low-resolution images of the same scene. It is an ill-posed problem since high frequency visual details of the scene are completely lost in low-resolution images. To overcome this, many machine learning approaches have been proposed aiming at training a model to recover the lost details in the new scenes. Such approaches include the recent successful effort in utilizing deep learning techniques to solve super resolution problem. As proven, data itself plays a significant role in the machine learning process especially deep learning approaches which are data hungry. Therefore, to solve the problem, the process of gathering data and its formation could be equally as vital as the machine learning technique used. Herein, we are proposing a new data acquisition technique for gathering real image data set which could be used as an input for super resolution, noise cancellation and quality enhancement techniques. We use a beam-splitter to capture the same scene by a low resolution camera and a high resolution camera. Since we also release the raw images, this large-scale dataset could be used for other tasks such as ISP generation. Unlike current small-scale dataset used for these tasks, our proposed dataset includes 11,421 pairs of low-resolution high-resolution images of diverse scenes. To our knowledge this is the most complete dataset for super resolution, ISP and image quality enhancement.
The benchmarking result shows how the new dataset can be successfully used to significantly improve the quality of real-world image super resolution.

Submitted 17 April, 2020; originally announced April 2020.

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2020

arXiv:2003.08521 [pdf, other]

Compactifying torus fibrations over integral affine manifolds with singularities

Authors: Helge Ruddat, Ilia Zharkov

Abstract: This is an announcement of the following construction: given an integral affine manifold $B$ with singularities, we build a topological space $X$ which is a torus fibration over $B$. The main new feature of the fibration $X\to B$ is that it has the discriminant in codimension 2.

Submitted 18 March, 2020; originally announced March 2020.

Comments: 13 pages, 4 figures, questions and suggestions are welcome!

MSC Class: 14J32; 53A15; 14T05

arXiv:2002.02347 [pdf, ps, other]

Tropical Abelian varieties, Weil classes and the Hodge Conjecture

Authors: Ilia Zharkov

Abstract: We describe in some details an idea of M. Kontsevich how one can try to find a counterexample to the Hodge conjecture using tropical geometry.

Submitted 6 February, 2020; originally announced February 2020.

Comments: 4 pages

arXiv:2001.08267 [pdf, other]

Tailoring a pair of pants

Authors: Helge Ruddat, Ilia Zharkov

Abstract: We show how to deform the map $\operatorname{Log}\colon (\mathbb{C}^*)^n \to \mathbb{R}^n$ such that the image of the complex pair of pants $P^\circ \subset {(\mathbb{C}^*)^n}$ is the tropical hyperplane by showing an (ambient) isotopy between $P^\circ \subset {(\mathbb{C}^*)^n}$ and a natural polyhedral subcomplex of the product of the two skeleta $S\times Σ\subset \mathcal{A} \times \mathcal{C}$ of the amoeba $\mathcal{A}$ and the coamoeba $\mathcal{C}$ of $P^\circ$. This lays the groundwork for having the discriminant to be of codimension 2 in topological Strominger-Yau-Zaslow torus fibrations.

Submitted 17 January, 2021; v1 submitted 22 January, 2020; originally announced January 2020.

Comments: final version, to appear in Adv. Math

MSC Class: 51D20; 57Q37; 57Q45; 14T05; 14J80; 57R52

arXiv:1803.11264 [pdf, other]

DIY Human Action Data Set Generation

Authors: Mehran Khodabandeh, Hamid Reza Vaezi Joze, Ilya Zharkov, Vivek Pradeep

Abstract: The recent successes in applying deep learning techniques to solve standard computer vision problems has aspired researchers to propose new computer vision problems in different domains. As previously established in the field, training data itself plays a significant role in the machine learning process, especially deep learning approaches which are data hungry. In order to solve each new problem and get a decent performance, a large amount of data needs to be captured which may in many cases pose logistical difficulties. Therefore, the ability to generate de novo data or expand an existing data set, however small, in order to satisfy data requirement of current networks may be invaluable. Herein, we introduce a novel way to partition an action video clip into action, subject and context. Each part is manipulated separately and reassembled with our proposed video generation technique. Furthermore, our novel human skeleton trajectory generation along with our proposed video generation technique, enables us to generate unlimited action recognition training data. These techniques enables us to generate video action clips from an small set without costly and time-consuming data acquisition. Lastly, we prove through extensive set of experiments on two small human action recognition data sets, that this new data generation technique can improve the performance of current action recognition neural nets.

Submitted 29 March, 2018; originally announced March 2018.

Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2018

arXiv:1705.04482 [pdf, other]

Reinforcing the double dynamo model with solar-terrestrial activity in the past three millennia

Authors: V. V. Zharkova, S. J. Shepherd, E. Popova, S. I Zharkov

Abstract: Using a summary curve of two eigen vectors of solar magnetic field oscillations derived with Principal Components Analysis (PCA) from synoptic maps for solar cycles 21-24 as a proxy of solar activity, we extrapolate this curve backwards three millennia revealing 9 grand cycles lasting 350-400 years each. The summary curve shows a remarkable resemblance to the past sunspot and terrestrial activity: grand minima - Maunder Minimum (1645-1715 AD), Wolf minimum (1280-1350 AD), Oort minimum (1010-1050 AD) and Homer minimum (800-900 BC); grand maxima - modern warm period (1990-2015), medieval warm period (900-1200 AD), Roman warm period (400-10 BC) and others. We verify the extrapolated activity curve by the pre-telescope observations of large sunspots with naked eye, by comparing the observed and simulated butterfly diagrams for Maunder Minimum (MM), by a maximum of the terrestrial temperature and extremely intense terrestrial auroras seen in the past grand cycle occurred in 14-16 centuries. We confirm the occurrence of upcoming Modern grand minimum in 2020-2053, which will have a shorter duration (3 cycles) and, thus, higher solar activity compared to MM. We argue that Sporer minimum (1450-1550) derived from the increased abundances of isotopes 14C and 10Be is likely produced by a strong increase of the terrestrial background radiation caused by the galactic cosmic rays of powerful supernovae.

Submitted 26 May, 2017; v1 submitted 12 May, 2017; originally announced May 2017.

Comments: 35 pages, 5 figures

arXiv:1610.05290 [pdf, other]

doi 10.2140/gt.2018.22.3287

Phase tropical hypersurfaces

Authors: Gabriel Kerr, Ilia Zharkov

Abstract: We prove that a generic smooth complex hypersurface in the complex torus is homeomorphic to the corresponding phase tropical hypersurface.

Submitted 16 October, 2017; v1 submitted 17 October, 2016; originally announced October 2016.

Comments: 30 pages, 12 figures

MSC Class: 14T05; 14M25

Journal ref: Geom. Topol. 22 (2018) 3287-3320

arXiv:1604.01838 [pdf, other]

Tropical Homology

Authors: Ilia Itenberg, Ludmil Katzarkov, Grigory Mikhalkin, Ilia Zharkov

Abstract: Given a tropical variety X and two non-negative integers p and q we define homology group $H_{p,q}(X)$. We show that if X is a smooth tropical variety that can be represented as the tropical limit of a 1-parameter family of complex projective varieties, then $\dim H_{p,q}(X)$ coincides with the Hodge number $h^{p,q}$ of a general member of the family.

Submitted 7 December, 2020; v1 submitted 6 April, 2016; originally announced April 2016.

Comments: 42 pages, 1 figure, introduction expanded and references added, publication status added

Journal ref: Mathematische Annalen, 374 (1-2), pp. 963-1006 (2019)

arXiv:1305.4595 [pdf, other]

$C$ is not equivalent to $C^-$ in its Jacobian: a tropical point of view

Authors: Ilia Zharkov

Abstract: We show that the Abel-Jacobi image of a tropical curve $C$ in its Jacobian $J(C)$ is not algebraically equivalent to its reflection by using a simple calculation in tropical homology.

Submitted 8 October, 2013; v1 submitted 20 May, 2013; originally announced May 2013.

Comments: 12 pages, 8 figures, journal version

arXiv:1302.0252 [pdf, other]

Tropical eigenwave and intermediate Jacobians

Authors: Grigory Mikhalkin, Ilia Zharkov

Abstract: Tropical manifolds are polyhedral complexes enhanced with certain kind of affine structure. This structure manifests itself through a particular cohomology class which we call the eigenwave of a tropical manifold. Other wave classes of similar type are responsible for deformations of the tropical structure. If a tropical manifold is approximable by a 1-parametric family of complex manifolds then the eigenwave records the monodromy of the family around the tropical limit. With the help of tropical homology and the eigenwave we define tropical intermediate Jacobians which can be viewed as tropical analogs of classical intermediate Jacobians.

Submitted 7 October, 2013; v1 submitted 1 February, 2013; originally announced February 2013.

Comments: 38 pages, 8 figures

arXiv:1209.1651 [pdf, other]

The Orlik-Solomon Algebra and the Bergman Fan of a Matroid

Authors: Ilia Zharkov

Abstract: Given a matroid $M$ one can define its Orlik-Solomon algebra $OS(M)$ and the Bergman fan $Σ_0(M)$. On the other hand to any rational polyhedral fan $Σ$ one can associate its tropical homology and cohomology groups $\F_\bullet(Σ)$, $\F^\bullet (Σ)$. We will show that the projective Orlik-Solomon algebra $OS_0(M)$ is canonically isomorphic to $\F^\bullet (Σ_0(M))$.

Submitted 8 October, 2013; v1 submitted 7 September, 2012; originally announced September 2012.

Comments: 7 pages, two figures, journal version

arXiv:0802.2787 [pdf, ps, other]

doi 10.1086/518731

On the origin of 3 seismic sources in the proton-rich flare of October 28, 2003

Authors: V. V. Zharkova, S. I. Zharkov

Abstract: The 3 seismic sources S1, S2 and S3 detected from MDI dopplergrams using the time-distance diagram technique are presented with the locations, areas and vertical and horizontal velocities of the visible wave displacements. Within the datacube of 120 Mm the horizontal velocities and the wave propagation times slightly vary from source to source. The momenta and start times measured from the TD diagrams in the sources S1-S3 are compared with those delivered to the photosphere by different kinds of high energy particles with the parameters deduced from hard X-ray and $γ$-ray emission as well as by the hydrodynamic shocks caused by these particles. The energetic protons (power laws combined with quasi-thermal ones, or jets) are shown to deliver momentum high enough and to form the hydrodynamic shocks deeply in a flaring atmosphere that allows them to be delivered to the photosphere through much shorter distances and times. Then the seismic waves observed in the sources S2 and S3 can be explained by the momenta produced by hydrodynamic shocks which are caused by mixed proton beams and jets occurring nearly simultaneously with the third burst of hard X-ray (HXR) and $γ$-ray emission in the loops with footpoints in the locations of these sources.
The seismic wave in the source S1, delayed by 4 and 2 minutes from the first and second HXR bursts, respectively, is likely to be associated with a hydrodynamic shock occurring in this loop from precipitation of a very powerful and hard electron beam with higher energy cutoff mixed with quasi-thermal protons generated by either of these 2 bursts.

Submitted 20 February, 2008; originally announced February 2008.

Comments: 36 pages, 10 figures

Journal ref: Astrophys.J.664:573-585,2007

arXiv:0712.3205 [pdf, ps, other]

Tropical theta characteristics

Authors: Ilia Zharkov

Abstract: This note is a follow up of math.AG/0612267v2 and it is largely inspired by a beautiful description of Baker-Norine of non-effective degree (g-1) divisors via chip-firing game. We consider the set of all theta characteristics on a tropical curve and identify the Riemann constant as a unique non-effective one among them.

Submitted 18 February, 2009; v1 submitted 19 December, 2007; originally announced December 2007.

Comments: 4 pages, still an addendum to math.AG/0612267v2, exposition improved, description of positive theta added

arXiv:math/0612267 [pdf, ps, other]

Tropical curves, their Jacobians and Theta functions

Authors: Grigory Mikhalkin, Ilia Zharkov

Abstract: We study Jacobian varieties for tropical curves. These are real tori equipped with integral affine structure and symmetric bilinear form. We define tropical counterpart of the theta function and establish tropical versions of the Abel-Jacobi, Riemann-Roch and Riemann theta divisor theorems.

Submitted 30 November, 2007; v1 submitted 11 December, 2006; originally announced December 2006.

Comments: Journal version. Errors corrected, proofs improved, references added. 32 pages, 11 figures

arXiv:math/0504181 [pdf, ps, other]

Integral affine structures on spheres III: complete intersections

Authors: Christian Haase, Ilia Zharkov

Abstract: We extend our model for affine structures on toric Calabi-Yau hypersurfaces math.AG/0205321 to the case of complete intersections.

Submitted 9 April, 2005; originally announced April 2005.

Comments: 11 pages, 4 figures. Uses color. See http://www.math.duke.edu/~haase/ and http://abel.math.harvard.edu/~zharkov/

Report number: DUKE-CGTP-05-03 MSC Class: 14J32 (Primary) 14M25 (Secondary)

arXiv:math/0304116 [pdf, ps, other]

Limiting behavior of local Calabi-Yau metrics

Authors: Ilia Zharkov

Abstract: We use a generalization of the Gibbons-Hawking ansatz to study the behavior of certain non-compact Calabi-Yau manifolds in the large complex structure limit. This analysis provides an intermediate step toward proving the metric collapse conjecture for toric hypersurfaces and complete intersections.

Submitted 28 June, 2004; v1 submitted 8 April, 2003; originally announced April 2003.

Comments: 20 pages, journal version: exposition improved, several errors and typos corrected; one more error corrected

Report number: DUKE-CGTP-03-02

arXiv:math/0301222 [pdf, ps, other]

Integral affine structures on spheres and torus fibrations of Calabi-Yau toric hypersurfaces II

Authors: Christian Haase, Ilia Zharkov

Abstract: This paper is a continuation of our paper math.AG/0205321 where we have built a combinatorial model for the torus fibrations of Calabi-Yau toric hypersurfaces. This part addresses the connection between the model torus fibration and the complex and Kähler geometry of the hypersurfaces.

Submitted 21 January, 2003; originally announced January 2003.

Comments: 20 pages, 3 figures. Comments will be greatly appreciated

Report number: DUKE-CGTP-03-01 MSC Class: 14J32 (Primary) 14M25 (Secondary)

arXiv:math/0205321 [pdf, ps, other]

Integral affine structures on spheres and torus fibrations of Calabi-Yau toric hypersurfaces I

Authors: Christian Haase, Ilia Zharkov

Abstract: We describe in purely combinatorial terms dual pairs of integral affine structures on spheres which come from the conjectural metric collapse of mirror families of Calabi-Yau toric hypersurfaces. The same structures arise on the base of a special Lagrangian torus fibration in the Strominger-Yau-Zaslow conjecture. We study the topological torus fibration in the large complex structure limit and show that it coincides with our combinatorial model.

Submitted 30 May, 2002; originally announced May 2002.

Comments: 26 pages, 16 figures, see also http://www.duke.edu/~haase, or http://www.duke.edu/~zharkov

Report number: DUKE-CGTP-02-05 MSC Class: 14J32 (Primary) 14M25 (Secondary)

arXiv:math/0011112 [pdf, ps, other]

Theta-functions for indefinite polarizations

Authors: Ilia Zharkov

Abstract: We propose a generalization of the classical theta function to higher cohomology of the polarization line bundle on a family of complex tori with positive index. The constructed cocycles vary horizontally with respect to the (projective) flat connection on this family coming from a heat operator. They also possess modular properties similar to the classical ones.

Submitted 10 July, 2003; v1 submitted 16 November, 2000; originally announced November 2000.

Comments: 19 pages, 1 figure; Version for publication

arXiv:math/9806091 [pdf, ps, other]

Torus Fibrations of Calabi-Yau Hypersurfaces in Toric Varieties and Mirror Symmetry

Authors: Ilia Zharkov

Abstract: We consider regular Calabi-Yau hypersurfaces in $N$-dimensional smooth toric varieties. On such a hypersurface in the neighborhood of the large complex structure limit point we construct a fibration over a sphere $S^{N-1}$ whose generic fibers are tori $T^{N-1}$. Also for certain one-parameter families of such hypersurfaces we show that the monodromy transformation is induced by a translation of the $T^{N-1}$ fibration by a section. Finally we construct a dual fibration and provide some evidence that it describes the mirror family.

Submitted 17 June, 1998; originally announced June 1998.

Comments: LaTeX2e, 21 pages, 6 figures

Report number: UNW-97-12, HJ-97-35

Showing 1–37 of 37 results for author: Zharkov, I