Showing 1–35 of 35 results for author: Kun, J

Searching in archive cs.
  1. arXiv:2406.09750  [pdf, other]

    cs.CV cs.AI

    ControlVAR: Exploring Controllable Visual Autoregressive Modeling

    Authors: Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, Bhiksha Raj

    Abstract: Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 19 figures, 4 tables

  2. arXiv:2404.12386  [pdf, other]

    cs.CV cs.LG

    SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

    Authors: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

    Abstract: Open-world entity segmentation, as an emerging computer vision task, aims at segmenting entities in images without being restricted by pre-defined classes, offering impressive generalization capabilities on unseen images and concepts. Despite its promise, existing entity segmentation methods like Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  3. arXiv:2311.03355  [pdf, other]

    cs.CV cs.AI cs.LG

    SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis

    Authors: Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu

    Abstract: We propose SegGen, a highly-effective training data generation method for image segmentation, which pushes the performance limits of state-of-the-art segmentation models to a significant extent. SegGen designs and integrates two data generation strategies: MaskSyn and ImgSyn. (i) MaskSyn synthesizes new mask-image pairs via our proposed text-to-mask generation model and mask-to-image generation mo…

    Submitted 4 July, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  4. arXiv:2305.17768  [pdf, other]

    cs.CV

    AIMS: All-Inclusive Multi-Level Segmentation

    Authors: Lu Qi, Jason Kuen, Weidong Guo, Jiuxiang Gu, Zhe Lin, Bo Du, Yu Xu, Ming-Hsuan Yang

    Abstract: Despite the progress of image segmentation for accurate visual entity segmentation, completing the diverse requirements of image editing applications for different-level region-of-interest selections remains unsolved. In this paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation (two entities with so…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: Technical Report

  5. arXiv:2304.03372  [pdf, other]

    cs.CV

    TopNet: Transformer-based Object Placement Network for Image Compositing

    Authors: Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

    Abstract: We investigate the problem of automatically placing an object into a background image for image compositing. Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing. The quality of the composite image highly depends on the predicted location/scale. Existing works either generate candidate bound…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: CVPR

  6. arXiv:2211.11742  [pdf, other]

    cs.CV

    SceneComposer: Any-Level Semantic Image Synthesis

    Authors: Yu Zeng, Zhe Lin, Jianming Zhang, Qing Liu, John Collomosse, Jason Kuen, Vishal M. Patel

    Abstract: We propose a new framework for conditional image synthesis from semantic layouts of any precision levels, ranging from pure text to a 2D semantic canvas with precise shapes. More specifically, the input layout consists of one or more semantic regions with free-form text descriptions and adjustable precision levels, which can be set based on the desired controllability. The framework naturally redu…

    Submitted 21 November, 2022; originally announced November 2022.

  7. arXiv:2211.05776  [pdf, other]

    cs.CV

    High-Quality Entity Segmentation

    Authors: Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

    Abstract: Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting where there are unrestricted image domains, classes, and image resolution and quality variations. Motivated by these observations, we construct a new entity segmentation dataset, with a strong focus on high-quality dense segmentation in th…

    Submitted 2 April, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: The project website: http://luqi.info/entityv2.github.io/

  8. arXiv:2210.06776  [pdf, other]

    cs.CV

    Improving the Reliability for Confidence Estimation

    Authors: Haoxuan Qu, Yanchao Li, Lin Geng Foo, Jason Kuen, Jiuxiang Gu, Jun Liu

    Abstract: Confidence estimation, a task that aims to evaluate the trustworthiness of the model's prediction output during deployment, has received lots of research attention recently, due to its importance for the safe deployment of deep models. Previous works have outlined two important qualities that a reliable confidence estimation model should possess, i.e., the ability to perform well under label imbal…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted by ECCV 2022

  9. arXiv:2208.08493  [pdf, other]

    cs.CV

    Text-to-Image Generation via Implicit Visual Guidance and Hypernetwork

    Authors: Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, John Collomosse

    Abstract: We develop an approach for text-to-image generation that embraces additional retrieval images, driven by a combination of implicit visual guidance loss and generative objectives. Unlike most existing text-to-image generation methods which merely take the text as input, our method dynamically feeds cross-modal search results into a unified training stage, hence improving the quality, controllabilit…

    Submitted 17 August, 2022; originally announced August 2022.

  10. arXiv:2207.11441  [pdf, other]

    cs.CV

    Meta Spatio-Temporal Debiasing for Video Scene Graph Generation

    Authors: Li Xu, Haoxuan Qu, Jason Kuen, Jiuxiang Gu, Jun Liu

    Abstract: Video scene graph generation (VidSGG) aims to parse the video content into scene graphs, which involves modeling the spatio-temporal contextual information in the video. However, due to the long-tailed training data in datasets, the generalization performance of existing VidSGG models can be affected by the spatio-temporal conditional bias problem. In this work, from the perspective of meta-learni…

    Submitted 30 July, 2022; v1 submitted 23 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV 2022

  11. arXiv:2204.10939  [pdf, other]

    cs.CL cs.CV

    Unified Pretraining Framework for Document Understanding

    Authors: Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun

    Abstract: Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions towards reducing annotation efforts by training models with self-supervised objectives. However, most of the existing document pretraining methods are still langua…

    Submitted 28 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

    Comments: 12 pages, 4 figures, NeurIPS 2021 (Updated Camera Ready)

  12. arXiv:2204.00125  [pdf, other]

    cs.CV

    GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing

    Authors: Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

    Abstract: Compositing-aware object search aims to find the most compatible objects for compositing given a background image and a query bounding box. Previous works focus on learning compatibility between the foreground object and background, but fail to learn other important factors from large-scale data, i.e. geometry and lighting. To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aw…

    Submitted 31 March, 2022; originally announced April 2022.

  13. arXiv:2112.04966  [pdf, other]

    cs.CV

    CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

    Authors: Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

    Abstract: To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data. We show that these two approaches, at the two extreme ends of the task-specificity spectrum, are suboptimal for the task performance. Utilizing too little task-specific training signals causes underfi…

    Submitted 19 July, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: Appeared in ECCV2022

  14. arXiv:2111.14482  [pdf, other]

    cs.CV

    High Quality Segmentation for Ultra High-resolution Images

    Authors: Tiancheng Shen, Yuechen Zhang, Lu Qi, Jason Kuen, Xingyu Xie, Jianlong Wu, Zhe Lin, Jiaya Jia

    Abstract: To segment 4K or 6K ultra high-resolution images needs extra computation consideration in image segmentation. Common strategies, such as down-sampling, patch cropping, and cascade model, cannot address well the balance issue between accuracy and computation cost. Motivated by the fact that humans distinguish among objects continuously from coarse to precise levels, we propose the Continuous Refine…

    Submitted 26 December, 2021; v1 submitted 29 November, 2021; originally announced November 2021.

  15. arXiv:2111.12698  [pdf, other]

    cs.CV

    Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling

    Authors: Dat Huynh, Jason Kuen, Zhe Lin, Jiuxiang Gu, Ehsan Elhamifar

    Abstract: Open-vocabulary instance segmentation aims at segmenting novel classes without mask annotations. It is an important step toward reducing laborious human supervision. Most existing works first pretrain a model on captioned images covering many novel classes and then finetune it on limited base classes with mask annotations. However, the high-level textual information learned from caption pretrainin…

    Submitted 19 April, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

  16. arXiv:2109.06875  [pdf, other]

    cs.CV

    Multi-Scale Aligned Distillation for Low-Resolution Detection

    Authors: Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia

    Abstract: In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. However, this option traditionally hurts the detection performance much. This paper focuses on boosting the performance of low-resolution models by distilling knowledge from a high- or multi-resolution model. We first identify the challenge of applying knowledge di…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: In CVPR 2021

  17. arXiv:2107.14228  [pdf, other]

    cs.CV cs.LG

    Open-World Entity Segmentation

    Authors: Lu Qi, Jason Kuen, Yi Wang, Jiuxiang Gu, Hengshuang Zhao, Zhe Lin, Philip Torr, Jiaya Jia

    Abstract: We introduce a new image segmentation task, called Entity Segmentation (ES), which aims to segment all visual entities (objects and stuffs) in an image without predicting their semantic labels. By removing the need of class label prediction, the models trained for such task can focus more on improving segmentation quality. It has many practical applications such as image manipulation and editing w…

    Submitted 19 December, 2022; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: Project page: http://luqi.info/Entity_Web

  18. arXiv:2106.03331  [pdf, other]

    cs.CV cs.CL

    SelfDoc: Self-Supervised Document Representation Learning

    Authors: Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

    Abstract: We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of every semantically meaningful component in a document, and it models the contextualization between each block of content. Unlike existing document pre-training…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: To appear in CVPR'2021

  19. arXiv:2104.12836  [pdf, other]

    cs.CV

    Multimodal Contrastive Training for Visual Representation Learning

    Authors: Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

    Abstract: We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy prediction task in a single domain, our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation sim…

    Submitted 26 April, 2021; originally announced April 2021.

  20. arXiv:1909.06804  [pdf, other]

    cs.CV cs.LG

    Scaling Object Detection by Transferring Classification Weights

    Authors: Jason Kuen, Federico Perazzi, Zhe Lin, Jianming Zhang, Yap-Peng Tan

    Abstract: Large scale object detection datasets are constantly increasing their size in terms of the number of classes and annotations count. Yet, the number of object-level categories annotated in detection datasets is an order of magnitude smaller than image-level classification labels. State-of-the-art object detection models are trained in a supervised fashion and this limits the number of object classe…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  21. arXiv:1803.09937  [pdf, other]

    cs.CV

    Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

    Authors: Jianlou Si, Honggang Zhang, Chun-Guang Li, Jason Kuen, Xiangfei Kong, Alex C. Kot, Gang Wang

    Abstract: Typical person re-identification (ReID) methods usually describe each pedestrian with a single feature vector and match them in a task-specific metric space. However, the methods based on a single feature vector are not sufficient enough to overcome visual ambiguity, which frequently occurs in real scenario. In this paper, we propose a novel end-to-end trainable framework, called Dual ATtention Ma…

    Submitted 27 March, 2018; originally announced March 2018.

    Comments: 10 pages, 8 figures, 7 tables, accepted by CVPR 2018

  22. arXiv:1802.09665  [pdf, other]

    cs.DS

    Polynomial Treedepth Bounds in Linear Colorings

    Authors: Jeremy Kun, Michael P. O'Brien, Marcin Pilipczuk, Blair D. Sullivan

    Abstract: Low-treedepth colorings are an important tool for algorithms that exploit structure in classes of bounded expansion; they guarantee subgraphs that use few colors have bounded treedepth. These colorings have an implicit tradeoff between the total number of colors used and the treedepth bound, and prior empirical work suggests that the former dominates the run time of existing algorithms in practice…

    Submitted 24 July, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

  23. arXiv:1801.09335  [pdf, other]

    cs.LG cs.CV cs.NE

    Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

    Authors: Jason Kuen, Xiangfei Kong, Zhe Lin, Gang Wang, Jianxiong Yin, Simon See, Yap-Peng Tan

    Abstract: It is desirable to train convolutional networks (CNNs) to run more efficiently during inference. In many cases however, the computational budget that the system has for inference cannot be known beforehand during training, or the inference budget is dependent on the changing real-time resource availability. Thus, it is inadequate to train just inference-efficient CNNs, whose inference costs are no…

    Submitted 28 January, 2018; originally announced January 2018.

  24. arXiv:1611.05552  [pdf, other]

    cs.CV cs.LG cs.NE

    DelugeNets: Deep Networks with Efficient and Flexible Cross-layer Information Inflows

    Authors: Jason Kuen, Xiangfei Kong, Gang Wang, Yap-Peng Tan

    Abstract: Deluge Networks (DelugeNets) are deep neural networks which efficiently facilitate massive cross-layer information inflows from preceding layers to succeeding layers. The connections between layers in DelugeNets are established through cross-layer depthwise convolutional layers with learnable filters, acting as a flexible yet efficient selection mechanism. DelugeNets can propagate information acro…

    Submitted 23 August, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

    Comments: Code: https://github.com/xternalz/DelugeNets

  25. Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle

    Authors: Jason Kuen, Kian Ming Lim, Chin Poo Lee

    Abstract: Visual representation is crucial for a visual tracking method's performances. Conventionally, visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant representations from tracked sequential image patc…

    Submitted 14 April, 2016; originally announced April 2016.

    Comments: Pattern Recognition (Elsevier), 2015

  26. arXiv:1604.03227  [pdf, ps, other]

    cs.CV cs.LG stat.ML

    Recurrent Attentional Networks for Saliency Detection

    Authors: Jason Kuen, Zhenhua Wang, Gang Wang

    Abstract: Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection. But, they do not work well with objects of multiple scales. To overcome such a limitation, in this work, we propose a recurrent attentional convolutional-deconvolution network (RACDNN). Using spatial transformer and recurrent network units, RACDNN is able to iteratively attend to selected image sub-region…

    Submitted 11 April, 2016; originally announced April 2016.

    Comments: CVPR 2016

  27. arXiv:1601.05764  [pdf, other]

    cs.LG cs.CY

    A Confidence-Based Approach for Balancing Fairness and Accuracy

    Authors: Benjamin Fish, Jeremy Kun, Ádám D. Lelkes

    Abstract: We study three classical machine learning algorithms in the context of algorithmic fairness: adaptive boosting, support vector machines, and logistic regression. Our goal is to maintain the high accuracy of these learning algorithms while reducing the degree to which they discriminate against individuals because of their membership in a protected group. Our first contribution is a method for ach…

    Submitted 21 January, 2016; originally announced January 2016.

  28. arXiv:1512.07108  [pdf, other]

    cs.CV cs.LG cs.NE

    Recent Advances in Convolutional Neural Networks

    Authors: Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

    Abstract: In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Leveraging on the rapid growth in the amount of the annotated data and the great improvements in the strengths…

    Submitted 19 October, 2017; v1 submitted 22 December, 2015; originally announced December 2015.

    Comments: Pattern Recognition, Elsevier

  29. arXiv:1507.05206  [pdf, other]

    cs.CR cs.DM cs.NI

    Interception in Distance-Vector Routing Networks

    Authors: David Burstein, Franklin Kenter, Jeremy Kun, Feng Shi

    Abstract: Despite the large effort devoted to cybersecurity research over the last decades, cyber intrusions and attacks are still increasing. With respect to routing networks, route hijacking has highlighted the need to reexamine the existing protocols that govern traffic routing. In particular, our primary question is how the topology of a network affects the susceptibility of a routing protocol to endo…

    Submitted 30 March, 2016; v1 submitted 18 July, 2015; originally announced July 2015.

  30. arXiv:1411.3640  [pdf, other]

    cs.DM

    Network installation and recovery: approximation lower bounds and faster exact formulations

    Authors: Alexander Gutfraind, Jeremy Kun, Ádám D. Lelkes, Lev Reyzin

    Abstract: We study the Neighbor Aided Network Installation Problem (NANIP) introduced previously which asks for a minimal cost ordering of the vertices of a graph, where the cost of visiting a node is a function of the number of neighbors that have already been visited. This problem has applications in resource management and disaster recovery. In this paper we analyze the computational hardness of NANIP. I…

    Submitted 13 November, 2014; originally announced November 2014.

  31. arXiv:1410.0245  [pdf, other]

    cs.CC cs.DC

    On the Computational Complexity of MapReduce

    Authors: Benjamin Fish, Jeremy Kun, Ádám Dániel Lelkes, Lev Reyzin, György Turán

    Abstract: In this paper we study MapReduce computations from a complexity-theoretic perspective. First, we formulate a uniform version of the MRC model of Karloff et al. (2010). We then show that the class of regular languages, and moreover all of sublogarithmic space, lies in constant round MRC. This result also applies to the MPC model of Andoni et al. (2014). In addition, we prove that, conditioned on a…

    Submitted 6 October, 2015; v1 submitted 1 October, 2014; originally announced October 2014.

  32. arXiv:1405.3210  [pdf, other]

    cs.LG cs.SI physics.soc-ph

    Locally Boosted Graph Aggregation for Community Detection

    Authors: Jeremy Kun, Rajmonda Caceres, Kevin Carter

    Abstract: Learning the right graph representation from noisy, multi-source data has garnered significant interest in recent years. A central tenet of this problem is relational learning. Here the objective is to incorporate the partial information each data source gives us in a way that captures the true underlying relationships. To address this challenge, we present a general, boosting-inspired framework f…

    Submitted 13 May, 2014; originally announced May 2014.

    Comments: arXiv admin note: substantial text overlap with arXiv:1401.3258

  33. arXiv:1402.4376  [pdf, other]

    cs.CC cs.DS

    On Coloring Resilient Graphs

    Authors: Jeremy Kun, Lev Reyzin

    Abstract: We introduce a new notion of resilience for constraint satisfaction problems, with the goal of more precisely determining the boundary between NP-hardness and the existence of efficient algorithms for resilient instances. In particular, we study $r$-resiliently $k$-colorable graphs, which are those $k$-colorable graphs that remain $k$-colorable even after the addition of any $r$ new edges. We prov…

    Submitted 11 June, 2014; v1 submitted 18 February, 2014; originally announced February 2014.

    Comments: Appearing in MFCS 2014

  34. arXiv:1401.3258  [pdf, other]

    cs.LG cs.SI stat.ML

    A Boosting Approach to Learning Graph Representations

    Authors: Rajmonda Caceres, Kevin Carter, Jeremy Kun

    Abstract: Learning the right graph representation from noisy, multisource data has garnered significant interest in recent years. A central tenet of this problem is relational learning. Here the objective is to incorporate the partial information each data source gives us in a way that captures the true underlying relationships. To address this challenge, we present a general, boosting-inspired framework fo…

    Submitted 14 January, 2014; originally announced January 2014.

  35. arXiv:1308.3258  [pdf, other]

    cs.GT cs.DM

    Anti-Coordination Games and Stable Graph Colorings

    Authors: Jeremy Kun, Brian Powers, Lev Reyzin

    Abstract: Motivated by understanding non-strict and strict pure strategy equilibria in network anti-coordination games, we define notions of stable and, respectively, strictly stable colorings in graphs. We characterize the cases when such colorings exist and when the decision problem is NP-hard. These correspond to finding pure strategy equilibria in the anti-coordination games, whose price of anarchy we a…

    Submitted 14 August, 2013; originally announced August 2013.

    Comments: Appearing in SAGT 2013