Search | arXiv e-print repository

arXiv:2406.19922 [pdf, other]

Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography War**

Authors: Tianli Liao, Ce Wang, Lei Li, Guangen Liu, Nan Li

Abstract: Large parallax between images is an intractable issue in image stitching. Various war**-based methods are proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography war** guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partit… ▽ More Large parallax between images is an intractable issue in image stitching. Various war**-based methods are proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography war** guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partition the feature points into multiple subsets via the energy-based multi-homography fitting algorithm. The multiple subsets of feature points are used to calculate the corresponding multiple homographies. For each segmented content in the overlap** region, we select its best-fitting homography with the lowest photometric error. For each segmented content in the non-overlap** region, we calculate a weighted combination of the linearized homographies. Finally, the target image is warped via the best-fitting homographies to align with the reference image, and the final panorama is generated via linear blending. Comprehensive experimental results on the public datasets demonstrate that our method provides the best alignment accuracy by a large margin, compared with the state-of-the-art methods. The source code is available at https://github.com/tlliao/multi-homo-warp. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: 11 pages, 9 figures

arXiv:2406.07814 [pdf, other]

doi 10.1145/3630106.3658979

Collective Constitutional AI: Aligning a Language Model with Public Input

Authors: Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli

Abstract: There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs-from identifying a t… ▽ More There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs-from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from a LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons of the models suggest that the models differ on the basis of their respective constitutions, e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively instead of a refusal. These results demonstrate a promising, tractable pathway toward publicly informed development of language models. △ Less

Submitted 11 June, 2024; originally announced June 2024.

ACM Class: I.2.7; K.4.2

Journal ref: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency. 1395-1417

arXiv:2406.05773 [pdf, other]

CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder

Authors: Tangfei Liao, Xiaoqin Zhang, Guobao Xiao, Min Li, Tao Wang, Mang Ye

Abstract: Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked corres… ▽ More Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked correspondences, providing a strong initial representation for downstream tasks. Toward this objective, a modicum of true correspondences naturally serve as input, thus significantly reducing pre-training overhead. In practice, we introduce CorrMAE, an extension of the mask autoencoder framework tailored for the pre-training of correspondence pruning. CorrMAE involves two main phases, \ie correspondence learning and matching point reconstruction, guiding the reconstruction of masked correspondences through learning visible correspondence consistency. Herein, we employ a dual-branch structure with an ingenious positional encoding to reconstruct unordered and irregular correspondences. Also, a bi-level designed encoder is proposed for correspondence learning, which offers enhanced consistency learning capability and transferability. Extensive experiments have shown that the model pre-trained with our CorrMAE outperforms prior work on multiple challenging benchmarks. Meanwhile, our CorrMAE is primarily a task-driven pre-training method, and can achieve notable improvements for downstream tasks by pre-training on the targeted dataset. We hope this work can provide a starting point for correspondence pruning pre-training. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.20334 [pdf, other]

VividDream: Generating 3D Scene with Ambient Dynamics

Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of… ▽ More We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts. △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: Project page: https://vivid-dream-4d.github.io

arXiv:2405.09839 [pdf, other]

Advances in Robust Federated Learning: Heterogeneity Considerations

Authors: Chuan Chen, Tianchi Liao, Xiaojun Deng, Zihou Wu, Sheng Huang, Zibin Zheng

Abstract: In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first… ▽ More In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first outline the basic concepts of heterogeneous federated learning and summarize the research challenges in federated learning in terms of five aspects: data, model, task, device, and communication. In addition, we explore how existing state-of-the-art approaches cope with the heterogeneity of federated learning, and categorize and review these approaches at three different levels: data-level, model-level, and architecture-level. Subsequently, the paper extensively discusses privacy-preserving strategies in heterogeneous federated learning environments. Finally, the paper discusses current open issues and directions for future research, aiming to promote the further development of heterogeneous federated learning. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09024 [pdf, other]

Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels

Authors: Guozhang Liu, Ting Liu, Mengke Yuan, Tao Pang, Guangxing Yang, Hao Fu, Tao Wang, Tongkui Liao

Abstract: The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to the noisy annotations in category labels of detection dataset. However, the effects and treatments of the label noises are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method t… ▽ More The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to the noisy annotations in category labels of detection dataset. However, the effects and treatments of the label noises are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method through dynamic loss decay (DLD) mechanism, inspired by the two phase ``early-learning'' and ``memorization'' learning dynamics of deep neural networks on clean and noisy samples. To be specific, we first observe the end point of early learning phase termed as EL, after which the models begin to memorize the false labels that significantly degrade the detection accuracy. Secondly, under the guidance of the training indicator, the losses of each sample are ranked in descending order, and we adaptively decay the losses of the top K largest ones (bad samples) in the following epochs. Because these large losses are of high confidence to be calculated with wrong labels. Experimental results show that the method achieves excellent noise resistance performance tested on multiple public datasets such as HRSC2016 and DOTA-v1.0/v2.0 with synthetic category label noise. Our solution also has won the 2st place in the "fine-grained object detection based on sub-meter remote sensing imagery" track with noisy labels of 2023 National Big Data and Computing Intelligence Challenge. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2403.12502 [pdf, other]

Under-actuated Robotic Gripper with Multiple Gras** Modes Inspired by Human Finger

Authors: Jihao Li, Tingbo Liao, Hassen Nigatu, Haotian Guo, Guodong Lu, Huixu Dong

Abstract: Under-actuated robot grippers as a pervasive tool of robots have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, making widespread applications limited. To better relieve relevant research gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigur… ▽ More Under-actuated robot grippers as a pervasive tool of robots have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, making widespread applications limited. To better relieve relevant research gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigurable multi-mode grasps driven by a single motor. Firstly, inspired by the changes that occurred in the contact surface with a human finger moving, we artfully design a slider-slide rail mechanism as the phalanx to achieve retraction of each finger, allowing for better performance in the envelo** gras** mode. Secondly, a reconfigurable structure is constructed to broaden the gras** range of objects' dimensions for the proposed gripper. By adjusting the configuration and gesture of each finger, the gripper can achieve five gras** modes. Thirdly, the proposed gripper is just actuated by a single motor, yet it can be capable of gras** and reconfiguring simultaneously. Finally, various experiments on grasps of slender, thin, and large-volume objects are implemented to evaluate the performance of the proposed gripper in practical scenarios, which demonstrates the excellent gras** capabilities of the gripper. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2402.18013 [pdf, other]

A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems

Authors: Zihao Yi, Jiarui Ouyang, Yuwen Liu, Tianhao Liao, Zhe Xu, Ying Shen

Abstract: This survey provides a comprehensive review of research on multi-turn dialogue systems, with a particular focus on multi-turn dialogue systems based on large language models (LLMs). This paper aims to (a) give a summary of existing LLMs and approaches for adapting LLMs to downstream tasks; (b) elaborate recent advances in multi-turn dialogue systems, covering both LLM-based open-domain dialogue (O… ▽ More This survey provides a comprehensive review of research on multi-turn dialogue systems, with a particular focus on multi-turn dialogue systems based on large language models (LLMs). This paper aims to (a) give a summary of existing LLMs and approaches for adapting LLMs to downstream tasks; (b) elaborate recent advances in multi-turn dialogue systems, covering both LLM-based open-domain dialogue (ODD) and task-oriented dialogue (TOD) systems, along with datasets and evaluation metrics; (c) discuss some future emphasis and recent research problems arising from the development of LLMs and the increasing demands on multi-turn dialogue systems. △ Less

Submitted 27 February, 2024; originally announced February 2024.

Comments: 35 pages, 10 figures, ACM Computing Surveys

arXiv:2402.17202 [pdf, other]

FedBRB: An Effective Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning

Authors: Ziyue Xu, Mingfeng Xu, Tianchi Liao, Zibin Zheng, Chuan Chen

Abstract: Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in exploring collaborative training of large-scale models from federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an i… ▽ More Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in exploring collaborative training of large-scale models from federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an important scenario (i.e., the \textbf{small-to-large scenario}). Although recent device-heterogeneity federated learning approaches have started to explore this area, they face limitations in fully covering the parameter space of the global model. In this paper, we propose a method called \textbf{FedBRB} (\underline{B}lock-wise \underline{R}olling and weighted \underline{B}roadcast) based on the block concept. FedBRB can uses small local models to train all blocks of the large global model, and broadcasts the trained parameters to the entire space for faster information interaction. Experiments demonstrate FedBRB yields substantial performance gains, achieving state-of-the-art results in this scenario. Moreover, FedBRB using only minimal local models can even surpass baselines using larger local models. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.08774 [pdf, other]

VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning

Authors: Tangfei Liao, Xiaoqin Zhang, Li Zhao, Tao Wang, Guobao Xiao

Abstract: Correspondence pruning aims to find correct matches (inliers) from an initial set of putative correspondences, which is a fundamental task for many applications. The process of finding is challenging, given the varying inlier ratios between scenes/image pairs due to significant visual differences. However, the performance of the existing methods is usually limited by the problem of lacking visual… ▽ More Correspondence pruning aims to find correct matches (inliers) from an initial set of putative correspondences, which is a fundamental task for many applications. The process of finding is challenging, given the varying inlier ratios between scenes/image pairs due to significant visual differences. However, the performance of the existing methods is usually limited by the problem of lacking visual cues (\eg texture, illumination, structure) of scenes. In this paper, we propose a Visual-Spatial Fusion Transformer (VSFormer) to identify inliers and recover camera poses accurately. Firstly, we obtain highly abstract visual cues of a scene with the cross attention between local features of two-view images. Then, we model these visual cues and correspondences by a joint visual-spatial fusion module, simultaneously embedding visual cues into correspondences for pruning. Additionally, to mine the consistency of correspondences, we also design a novel module that combines the KNN-based graph and the transformer, effectively capturing both local and global contexts. Extensive experiments have demonstrated that the proposed VSFormer outperforms state-of-the-art methods on outdoor and indoor benchmarks. Our code is provided at the following repository: https://github.com/sugar-fly/VSFormer. △ Less

Submitted 4 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI2024

arXiv:2312.00048 [pdf, other]

Tokenized Model: A Blockchain-Empowered Decentralized Model Ownership Verification Platform

Authors: Yihao Li, Yanyi Lai, Tianchi Liao, Chuan Chen, Zibin Zheng

Abstract: With the development of practical deep learning models like generative AI, their excellent performance has brought huge economic value. For instance, ChatGPT has attracted more than 100 million users in three months. Since the model training requires a lot of data and computing power, a well-performing deep learning model is behind a huge effort and cost. Facing various model attacks, unauthorized… ▽ More With the development of practical deep learning models like generative AI, their excellent performance has brought huge economic value. For instance, ChatGPT has attracted more than 100 million users in three months. Since the model training requires a lot of data and computing power, a well-performing deep learning model is behind a huge effort and cost. Facing various model attacks, unauthorized use and abuse from the network that threaten the interests of model owners, in addition to considering legal and other administrative measures, it is equally important to protect the model's copyright from the technical means. By using the model watermarking technology, we point out the possibility of building a unified platform for model ownership verification. Given the application history of blockchain in copyright verification and the drawbacks of a centralized third-party, this paper considers combining model watermarking technology and blockchain to build a unified model copyright protection platform. By a new solution we called Tokenized Model, it protects the model's copyright by reliable ownership record and verification mechanism. It also promotes the financial value of model by constructing the model's transaction process and contribution shares of a model. In the typical case study, we also study the various performance under usual scenario to verify the effectiveness of this platform. △ Less

Submitted 27 November, 2023; originally announced December 2023.

arXiv:2311.18564 [pdf, other]

Seam-guided local alignment and stitching for large parallax images

Authors: Tianli Liao, Chenyang Zhao, Lei Li, Heling Cao

Abstract: Seam-cutting methods have been proven effective in the composition step of image stitching, especially for images with parallax. However, the effectiveness of seam-cutting usually depends on that images can be roughly aligned such that there exists a local region where a plausible seam can be found. For images with large parallax, current alignment methods often fall short of expectations. In this… ▽ More Seam-cutting methods have been proven effective in the composition step of image stitching, especially for images with parallax. However, the effectiveness of seam-cutting usually depends on that images can be roughly aligned such that there exists a local region where a plausible seam can be found. For images with large parallax, current alignment methods often fall short of expectations. In this paper, we propose a local alignment and stitching method guided by seam quality evaluation. First, we use existing image alignment and seam-cutting methods to calculate an initial seam and evaluate the quality of pixels along the seam. Then, for pixels with low qualities, we separate their enclosing patches in the aligned images and locally align them by extracting modified dense correspondences via SIFT flow. Finally, we composite the aligned patches via seam-cutting and merge them into the original aligned result to generate the final mosaic. Experiments show that compared with the state-of-the-art seam-cutting methods, our result is more plausible and with fewer artifacts. The code will be available at https://github.com/tlliao/Seam-guided-local-alignment. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: 13 pages, 12 figures, in peer review

arXiv:2310.13798 [pdf, other]

Specific versus General Principles for Constitutional AI

Authors: Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson , et al. (11 additional authors not shown)

Abstract: Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expressi… ▽ More Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from only a single written principle? To test this, we run experiments using a principle roughly stated as "do what's best for humanity". We find that the largest dialogue models can generalize from this short constitution, resulting in harmless assistants with no stated interest in specific motivations like power. A general principle may thus partially avoid the need for a long list of constitutions targeting potentially harmful behaviors. However, more detailed constitutions still improve fine-grained control over specific types of harms. This suggests both general and specific principles have value for steering AI safely. △ Less

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2308.10899 [pdf, other]

TADA! Text to Animatable Digital Avatars

Authors: Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxaing Tang, Yangyi Huang, Justus Thies, Michael J. Black

Abstract: We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent a… ▽ More We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent alignment between the geometry and the texture, particularly in the face region. To overcome these limitations, TADA leverages the synergy of a 2D diffusion model and an animatable parametric body model. Specifically, we derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map, and use hierarchical rendering with score distillation sampling (SDS) to create high-quality, detailed, holistic 3D avatars from text. To ensure alignment between the geometry and texture, we render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process. We further introduce various expression parameters to deform the generated character during training, ensuring that the semantics of our generated character remain consistent with the original SMPL-X model, resulting in an animatable character. Comprehensive evaluations demonstrate that TADA significantly surpasses existing approaches on both qualitative and quantitative measures. TADA enables creation of large-scale digital character assets that are ready for animation and rendering, while also being easily editable through natural language. The code will be public for research purposes. △ Less

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.08545 [pdf, other]

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Authors: Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies

Abstract: Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are su… ▽ More Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are sufficient to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles) which are automatically generated via a garment parsing model and Visual Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion model (T2I) which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts + personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent & delicate texture, and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms the state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/TeCH △ Less

Submitted 19 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: Project: https://huangyangyi.github.io/TeCH, Code: https://github.com/huangyangyi/TeCH

arXiv:2307.03823 [pdf, other]

Linguistic representations for fewer-shot relation extraction across domains

Authors: Sireesh Gururaja, Ritam Dutt, Tinglong Liao, Carolyn Rose

Abstract: Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability… ▽ More Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability by providing features that function as cross-domain pivots. We focus on the task of relation extraction on three datasets of procedural text in two domains, cooking and materials science. Our approach augments a popular transformer-based architecture by alternately incorporating syntactic and semantic graphs constructed by freely available off-the-shelf tools. We examine their utility for enhancing generalization, and investigate whether earlier findings, e.g. that semantic representations can be more helpful than syntactic ones, extend to relation extraction in multiple domains. We find that while the inclusion of these graphs results in significantly higher performance in few-shot transfer, both types of graph exhibit roughly equivalent utility. △ Less

Submitted 7 July, 2023; originally announced July 2023.

Comments: ACL 2023

arXiv:2306.16388 [pdf, other]

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Authors: Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli

Abstract: Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across dif… ▽ More Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com. △ Less

Submitted 11 April, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.04212 [pdf, other]

Migrate Demographic Group For Fair GNNs

Authors: YanMing Hu, TianChi Liao, JiaLong Chen, **g Bian, ZiBin Zheng, Chuan Chen

Abstract: Graph Neural networks (GNNs) have been applied in many scenarios due to the superior performance of graph learning. However, fairness is always ignored when designing GNNs. As a consequence, biased information in training data can easily affect vanilla GNNs, causing biased results toward particular demographic groups (divided by sensitive attributes, such as race and age). There have been efforts… ▽ More Graph Neural networks (GNNs) have been applied in many scenarios due to the superior performance of graph learning. However, fairness is always ignored when designing GNNs. As a consequence, biased information in training data can easily affect vanilla GNNs, causing biased results toward particular demographic groups (divided by sensitive attributes, such as race and age). There have been efforts to address the fairness issue. However, existing fair techniques generally divide the demographic groups by raw sensitive attributes and assume that are fixed. The biased information correlated with raw sensitive attributes will run through the training process regardless of the implemented fair techniques. It is urgent to resolve this problem for training fair GNNs. To tackle this problem, we propose a brand new framework, FairMigration, which can dynamically migrate the demographic groups instead of kee** that fixed with raw sensitive attributes. FairMigration is composed of two training stages. In the first stage, the GNNs are initially optimized by personalized self-supervised learning, and the demographic groups are adjusted dynamically. In the second stage, the new demographic groups are frozen and supervised learning is carried out under the constraints of new demographic groups and adversarial training. Extensive experiments reveal that FairMigration balances model performance and fairness well. △ Less

Submitted 23 March, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.00419 [pdf, other]

Challenges and Remedies to Privacy and Security in AIGC: Exploring the Potential of Privacy Computing, Blockchain, and Beyond

Authors: Chuan Chen, Zhenpeng Wu, Yanyi Lai, Wenlin Ou, Tianchi Liao, Zibin Zheng

Abstract: Artificial Intelligence Generated Content (AIGC) is one of the latest achievements in AI development. The content generated by related applications, such as text, images and audio, has sparked a heated discussion. Various derived AIGC applications are also gradually entering all walks of life, bringing unimaginable impact to people's daily lives. However, the rapid development of such generative t… ▽ More Artificial Intelligence Generated Content (AIGC) is one of the latest achievements in AI development. The content generated by related applications, such as text, images and audio, has sparked a heated discussion. Various derived AIGC applications are also gradually entering all walks of life, bringing unimaginable impact to people's daily lives. However, the rapid development of such generative tools has also raised concerns about privacy and security issues, and even copyright issues in AIGC. We note that advanced technologies such as blockchain and privacy computing can be combined with AIGC tools, but no work has yet been done to investigate their relevance and prospect in a systematic and detailed way. Therefore it is necessary to investigate how they can be used to protect the privacy and security of data in AIGC by fully exploring the aforementioned technologies. In this paper, we first systematically review the concept, classification and underlying technologies of AIGC. Then, we discuss the privacy and security challenges faced by AIGC from multiple perspectives and purposefully list the countermeasures that currently exist. We hope our survey will help researchers and industry to build a more secure and robust AIGC system. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 43 pages, 10 figures

arXiv:2305.12329 [pdf]

doi 10.1007/978-3-030-30149-1_24

Anomaly Detection Using One-Class SVM for Logs of Juniper Router Devices

Authors: Tat-Bao-Thien Nguyen, Teh-Lu Liao, Tuan-Anh Vu

Abstract: The article deals with anomaly detection of Juniper router logs. Abnormal Juniper router logs include logs that are usually different from the normal operation, and they often reflect the abnormal operation of router devices. To prevent router devices from being damaged and help administrator to grasp the situation of error quickly, detecting abnormal operation soon is very important. In this work… ▽ More The article deals with anomaly detection of Juniper router logs. Abnormal Juniper router logs include logs that are usually different from the normal operation, and they often reflect the abnormal operation of router devices. To prevent router devices from being damaged and help administrator to grasp the situation of error quickly, detecting abnormal operation soon is very important. In this work, we present a new way to get important features from log data of Juniper router devices and use machine learning method (basing on One-Class SVM model) for anomaly detection. One-Class SVM model requires some knowledge and comprehension about logs of Juniper router devices so that it can analyze, interpret, and test the knowledge ac-quired. We collect log data from a lot of real Juniper router devices and clas-sify them based on our knowledge. Before these logs are used for training and testing the One-Class SVM model, the feature extraction phase for these data was carried out. Finally, with the proposed method, the system errors of the routers were dectected quickly and accurately. This may help our com-pany to reduce the operation cost for the router systems. △ Less

Submitted 20 May, 2023; originally announced May 2023.

Journal ref: In: Duong, T., Vo, NS., Nguyen, L., Vien, QT., Nguyen, VD. (eds) Industrial Networks and Intelligent Systems. INISCOM 2019

arXiv:2304.03903 [pdf, other]

High-Fidelity Clothed Avatar Reconstruction from a Single Image

Authors: Tingting Liao, Xiaomei Zhang, Yuliang Xiu, Hongwei Yi, Xudong Liu, Guo-Jun Qi, Yong Zhang, Xuan Wang, Xiangyu Zhu, Zhen Lei

Abstract: This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the… ▽ More This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage, we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized to generate a good initialization so that the convergence o f the optimization process is greatly accelerated. Extensive experiments on various datasets show that the proposed CAR successfully produces high-fidelity avatars for arbitrarily clothed humans in real scenes. △ Less

Submitted 8 April, 2023; originally announced April 2023.

arXiv:2303.15772 [pdf, other]

Ecosystem Graphs: The Social Footprint of Foundation Models

Authors: Rishi Bommasani, Dilara Soylu, Thomas I. Liao, Kathleen A. Creel, Percy Liang

Abstract: Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is… ▽ More Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata (e.g. the license or training emissions). We document the ecosystem extensively at https://crfm.stanford.edu/ecosystem-graphs/. As of March 16, 2023, we annotate 262 assets (64 datasets, 128 models, 70 applications) from 63 organizations linked by 356 dependencies. We show Ecosystem Graphs functions as a powerful abstraction and interface for achieving the minimum transparency required to address myriad use cases. Therefore, we envision Ecosystem Graphs will be a community-maintained resource that provides value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors and policymakers. △ Less

Submitted 28 March, 2023; originally announced March 2023.

Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Ecosystem Graphs available at https://crfm.stanford.edu/ecosystem-graphs/

arXiv:2302.08510 [pdf, other]

Text-driven Visual Synthesis with Latent Diffusion Prior

Authors: Ting-Hsuan Liao, Songwei Ge, Yiran Xu, Yao-Chih Lee, Badour AlBahar, Jia-Bin Huang

Abstract: There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use… ▽ More There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three different applications, text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show our method compares favorably against baselines. △ Less

Submitted 3 April, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

Comments: Project website: https://latent-diffusion-prior.github.io/

arXiv:2302.07459 [pdf, other]

The Capacity for Moral Self-Correction in Large Language Models

Authors: Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, Anna Chen, Anna Goldie, Azalia Mirhoseini, Catherine Olsson, Danny Hernandez, Dawn Drain, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jackson Kernion, Jamie Kerr, Jared Mueller, Joshua Landau, Kamal Ndousse, Karina Nguyen, Liane Lovitt, Michael Sellitto, Nelson Elhage, Noemi Mercado, Nova DasSarma , et al. (24 additional authors not shown)

Abstract: We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability… ▽ More We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereoty**, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles. △ Less

Submitted 18 February, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.05242 [pdf, other]

Hierarchical Motion Planning under Probabilistic Temporal Tasks and Safe-Return Constraints

Authors: Meng Guo, Tianjun Liao, Junjie Wang, Zhongkui Li

Abstract: Safety is crucial for robotic missions within an uncertain environment. Common safety requirements such as collision avoidance are only state-dependent, which can be restrictive for complex missions. In this work, we address a more general formulation as safe-return constraints, which require the existence of a return-policy to drive the system back to a set of safe states with high probability. T… ▽ More Safety is crucial for robotic missions within an uncertain environment. Common safety requirements such as collision avoidance are only state-dependent, which can be restrictive for complex missions. In this work, we address a more general formulation as safe-return constraints, which require the existence of a return-policy to drive the system back to a set of safe states with high probability. The robot motion is modeled as a Markov Decision Process (MDP) with probabilistic labels, which can be highly non-ergodic. The robotic task is specified as Linear Temporal Logic (LTL) formulas over these labels, such as surveillance and transportation. We first provide theoretical guarantees on the re-formulation of such safe-return constraints, and a baseline solution based on computing two complete product automata. Furthermore, to tackle the computational complexity, we propose a hierarchical planning algorithm that combines the feature-based symbolic and temporal abstraction with constrained optimization. It synthesizes simultaneously two dependent motion policies: the outbound policy minimizes the overall cost of satisfying the task with a high probability, while the return policy ensures the safe-return constraints. The problem formulation is versatile regarding the robot model, task specifications and safety constraints. The proposed hierarchical algorithm is more efficient and can solve much larger problems than the baseline solution, with only a slight loss of optimality. Numerical validations include simulations and hardware experiments of a search-and-rescue mission and a planetary exploration mission over various system sizes. △ Less

Submitted 10 February, 2023; originally announced February 2023.

Comments: 16 pages, 11 figures

arXiv:2211.08888 [pdf, other]

ELDA: Using Edges to Have an Edge on Semantic Segmentation Based UDA

Authors: Ting-Hsuan Liao, Huang-Ru Liao, Shan-Ya Yang, Jie-En Yao, Li-Yuan Tsao, Hsu-Shen Liu, Bo-Wun Cheng, Chen-Hao Chao, Chia-Che Chang, Yi-Chen Lo, Chun-Yi Lee

Abstract: Many unsupervised domain adaptation (UDA) methods have been proposed to bridge the domain gap by utilizing domain invariant information. Most approaches have chosen depth as such information and achieved remarkable success. Despite their effectiveness, using depth as domain invariant information in UDA tasks may lead to multiple issues, such as excessively high extraction costs and difficulties in… ▽ More Many unsupervised domain adaptation (UDA) methods have been proposed to bridge the domain gap by utilizing domain invariant information. Most approaches have chosen depth as such information and achieved remarkable success. Despite their effectiveness, using depth as domain invariant information in UDA tasks may lead to multiple issues, such as excessively high extraction costs and difficulties in achieving a reliable prediction quality. As a result, we introduce Edge Learning based Domain Adaptation (ELDA), a framework which incorporates edge information into its training process to serve as a type of domain invariant information. In our experiments, we quantitatively and qualitatively demonstrate that the incorporation of edge information is indeed beneficial and effective and enables ELDA to outperform the contemporary state-of-the-art methods on two commonly adopted benchmarks for semantic segmentation based UDA tasks. In addition, we show that ELDA is able to better separate the feature distributions of different classes. We further provide an ablation analysis to justify our design decisions. △ Less

Submitted 16 November, 2022; originally announced November 2022.

Comments: Accepted by BMVC2022. Ting-Hsuan Liao and Huang-Ru Liao contributed equally to this work

arXiv:2211.03990 [pdf, other]

doi 10.1109/ICASSP43922.2022.9746741

Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors

Authors: Yik-Cheung Tam, Jiacheng Xu, Jiakai Zou, Zecheng Wang, Tinglong Liao, Shuhan Yuan

Abstract: Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. In the proposed error simulator, we leverage confusion… ▽ More Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. In the proposed error simulator, we leverage confusion networks generated from an ASR decoder without human transcriptions to generate a variety of error patterns for model training. We evaluate our approach on the DSTC10 challenge targeted for knowledge-grounded task-oriented conversational dialogues with ASR errors. Experimental results show the effectiveness of our proposed approach, boosting the knowledge-seeking turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge document re-ranking, our approach shows significant improvement in all knowledge selection metrics, from 0.7358 to 0.7806 in Recall@1, from 0.8301 to 0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5 on the test set. In the recent DSTC10 evaluation, our approach demonstrates significant improvement in knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the official baseline. Our source code is released in GitHub https://github.com/yctam/dstc10_track2_task2.git. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: 7 pages, 2 figures. Accepted at ICASSP 2022

Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6702-6706

arXiv:2210.08993 [pdf, other]

When Digital Economy Meets Web3.0: Applications and Challenges

Authors: Chuan Chen, Lei Zhang, Yihao Li, Tianchi Liao, Siran Zhao, Zibin Zheng, Huawei Huang, Jia**g Wu

Abstract: With the continuous development of web technology, Web3.0 has attracted a considerable amount of attention due to its unique decentralized characteristics. The digital economy is an important driver of high-quality economic development and is currently in a rapid development stage. In the digital economy scenario, the centralized nature of the Internet and other characteristics usually bring about… ▽ More With the continuous development of web technology, Web3.0 has attracted a considerable amount of attention due to its unique decentralized characteristics. The digital economy is an important driver of high-quality economic development and is currently in a rapid development stage. In the digital economy scenario, the centralized nature of the Internet and other characteristics usually bring about security issues such as infringement and privacy leakage. Therefore, it is necessary to investigate how to use Web3.0 technologies to solve the pain points encountered in the development of the digital economy by fully exploring the critical technologies of digital economy and Web3.0. In this paper, we discuss the aspects of Web3.0 that should be integrated with the digital economy to better find the entry point to solve the problems by examining the latest advances of Web3.0 in machine learning, finance, and data management. We hope this research will inspire those who are involved in both academia and industry, and finally help to build a favourable ecology for the digital economy. △ Less

Submitted 29 October, 2022; v1 submitted 26 September, 2022; originally announced October 2022.

Comments: 14 pages, 5 figures

arXiv:2209.07313 [pdf, other]

HarDNet-DFUS: An Enhanced Harmonically-Connected Network for Diabetic Foot Ulcer Image Segmentation and Colonoscopy Polyp Segmentation

Authors: Ting-Yu Liao, Ching-Hui Yang, Yu-Wen Lo, Kuan-Ying Lai, Po-Huai Shen, Youn-Long Lin

Abstract: We present a neural network architecture for medical image segmentation of diabetic foot ulcers and colonoscopy polyps. Diabetic foot ulcers are caused by neuropathic and vascular complications of diabetes mellitus. In order to provide a proper diagnosis and treatment, wound care professionals need to extract accurate morphological features from the foot wounds. Using computer-aided systems is a p… ▽ More We present a neural network architecture for medical image segmentation of diabetic foot ulcers and colonoscopy polyps. Diabetic foot ulcers are caused by neuropathic and vascular complications of diabetes mellitus. In order to provide a proper diagnosis and treatment, wound care professionals need to extract accurate morphological features from the foot wounds. Using computer-aided systems is a promising approach to extract related morphological features and segment the lesions. We propose a convolution neural network called HarDNet-DFUS by enhancing the backbone and replacing the decoder of HarDNet-MSEG, which was SOTA for colonoscopy polyp segmentation in 2021. For the MICCAI 2022 Diabetic Foot Ulcer Segmentation Challenge (DFUC2022), we train HarDNet-DFUS using the DFUC2022 dataset and increase its robustness by means of five-fold cross validation, Test Time Augmentation, etc. In the validation phase of DFUC2022, HarDNet-DFUS achieved 0.7063 mean dice and was ranked third among all participants. In the final testing phase of DFUC2022, it achieved 0.7287 mean dice and was the first place winner. HarDNet-DFUS also deliver excellent performance for the colonoscopy polyp segmentation task. It achieves 0.924 mean Dice on the famous Kvasir dataset, an improvement of 1.2\% over the original HarDNet-MSEG. The codes are available on https://github.com/kytimmylai/DFUC2022 (for Diabetic Foot Ulcers Segmentation) and https://github.com/YuWenLo/HarDNet-DFUS (for Colonoscopy Polyp Segmentation). △ Less

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2208.08892 [pdf, other]

Pixel-Wise Prediction based Visual Odometry via Uncertainty Estimation

Authors: Hao-Wei Chen, Ting-Hsuan Liao, Hsuan-Kung Yang, Chun-Yi Lee

Abstract: This paper introduces pixel-wise prediction based visual odometry (PWVO), which is a dense prediction task that evaluates the values of translation and rotation for every pixel in its input observations. PWVO employs uncertainty estimation to identify the noisy regions in the input observations, and adopts a selection mechanism to integrate pixel-wise predictions based on the estimated uncertainty… ▽ More This paper introduces pixel-wise prediction based visual odometry (PWVO), which is a dense prediction task that evaluates the values of translation and rotation for every pixel in its input observations. PWVO employs uncertainty estimation to identify the noisy regions in the input observations, and adopts a selection mechanism to integrate pixel-wise predictions based on the estimated uncertainty maps to derive the final translation and rotation. In order to train PWVO in a comprehensive fashion, we further develop a data generation workflow for generating synthetic training data. The experimental results show that PWVO is able to deliver favorable results. In addition, our analyses validate the effectiveness of the designs adopted in PWVO, and demonstrate that the uncertainty maps estimated by PWVO is capable of capturing the noises in its input observations. △ Less

Submitted 18 August, 2022; originally announced August 2022.

arXiv:2207.05621 [pdf, other]

MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing

Authors: Sixiang Chen, Tian Ye, Yun Liu, Taodong Liao, **gxia Jiang, Erkang Chen, Peng Chen

Abstract: Snow removal causes challenges due to its characteristic of complex degradations. To this end, targeted treatment of multi-scale snow degradations is critical for the network to learn effective snow removal. In order to handle the diverse scenes, we propose a multi-scale projection transformer (MSP-Former), which understands and covers a variety of snow degradation features in a multi-path manner,… ▽ More Snow removal causes challenges due to its characteristic of complex degradations. To this end, targeted treatment of multi-scale snow degradations is critical for the network to learn effective snow removal. In order to handle the diverse scenes, we propose a multi-scale projection transformer (MSP-Former), which understands and covers a variety of snow degradation features in a multi-path manner, and integrates comprehensive scene context information for clean reconstruction via self-attention operation. For the local details of various snow degradations, the local capture module is introduced in parallel to assist in the rebuilding of a clean image. Such design achieves the SOTA performance on three desnowing benchmark datasets while costing the low parameters and computational complexity, providing a guarantee of practicality. △ Less

Submitted 11 March, 2023; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: Accepted to ICASSP'2023

arXiv:2204.11184 [pdf, other]

MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained Frames

Authors: Xiangyu Zhu, Tingting Liao, Jiang**g Lyu, Xiang Yan, Yunfeng Wang, Kan Guo, Qiong Cao, Stan Z. Li, Zhen Lei

Abstract: In this paper, we consider a novel problem of reconstructing a 3D human avatar from multiple unconstrained frames, independent of assumptions on camera calibration, capture space, and constrained actions. The problem should be addressed by a framework that takes multiple unconstrained images as inputs, and generates a shape-with-skinning avatar in the canonical space, finished in one feed-forward… ▽ More In this paper, we consider a novel problem of reconstructing a 3D human avatar from multiple unconstrained frames, independent of assumptions on camera calibration, capture space, and constrained actions. The problem should be addressed by a framework that takes multiple unconstrained images as inputs, and generates a shape-with-skinning avatar in the canonical space, finished in one feed-forward pass. To this end, we present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner, by which the image features from multiple images are aligned and integrated to estimate a pixel-aligned implicit function that represents the clothed shape. To enable the training and testing of the new framework, we contribute a large-scale dataset, MVP-Human (Multi-View and multi-Pose 3D Human), which contains 400 subjects, each of which has 15 scans in different poses and 8-view images for each pose, providing 6,000 3D scans and 48,000 images in total. Overall, benefits from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames and achieves state-of-the-art performance. △ Less

Submitted 17 May, 2023; v1 submitted 23 April, 2022; originally announced April 2022.

Comments: Accepted by IEEE Transactions on Biometrics, Behavior, and Identity Science (TBIOM)

arXiv:2203.07910 [pdf, other]

doi 10.1109/BIBM55620.2022.9995660

Deep Transfer Learning with Graph Neural Network for Sensor-Based Human Activity Recognition

Authors: Yan Yan, Tianzheng Liao, **** Zhao, Jiahong Wang, Liang Ma, Wei Lv, **g Xiong, Lei Wang

Abstract: The sensor-based human activity recognition (HAR) in mobile application scenarios is often confronted with sensor modalities variation and annotated data deficiency. Given this observation, we devised a graph-inspired deep learning approach toward the sensor-based HAR tasks, which was further used to build a deep transfer learning model toward giving a tentative solution for these two challenging… ▽ More The sensor-based human activity recognition (HAR) in mobile application scenarios is often confronted with sensor modalities variation and annotated data deficiency. Given this observation, we devised a graph-inspired deep learning approach toward the sensor-based HAR tasks, which was further used to build a deep transfer learning model toward giving a tentative solution for these two challenging problems. Specifically, we present a multi-layer residual structure involved graph convolutional neural network (ResGCNN) toward the sensor-based HAR tasks, namely the HAR-ResGCNN approach. Experimental results on the PAMAP2 and mHealth data sets demonstrate that our ResGCNN is effective at capturing the characteristics of actions with comparable results compared to other sensor-based HAR models (with an average accuracy of 98.18% and 99.07%, respectively). More importantly, the deep transfer learning experiments using the ResGCNN model show excellent transferability and few-shot learning performance. The graph-based framework shows good meta-learning ability and is supposed to be a promising solution in sensor-based HAR tasks. △ Less

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.04927 [pdf, other]

doi 10.1109/IROS47612.2022.9981638

Investigation of Factorized Optical Flows as Mid-Level Representations

Authors: Hsuan-Kung Yang, Tsu-Ching Hsiao, Ting-Hsuan Liao, Hsu-Shen Liu, Li-Yuan Tsao, Tzu-Wen Wang, Shan-Ya Yang, Yu-Wen Chen, Huang-Ru Liao, Chun-Yi Lee

Abstract: In this paper, we introduce a new concept of incorporating factorized flow maps as mid-level representations, for bridging the perception and the control modules in modular learning based robotic frameworks. To investigate the advantages of factorized flow maps and examine their interplay with the other types of mid-level representations, we further develop a configurable framework, along with fou… ▽ More In this paper, we introduce a new concept of incorporating factorized flow maps as mid-level representations, for bridging the perception and the control modules in modular learning based robotic frameworks. To investigate the advantages of factorized flow maps and examine their interplay with the other types of mid-level representations, we further develop a configurable framework, along with four different environments that contain both static and dynamic objects, for analyzing the impacts of factorized optical flow maps on the performance of deep reinforcement learning agents. Based on this framework, we report our experimental results on various scenarios, and offer a set of analyses to justify our hypothesis. Finally, we validate flow factorization in real world scenarios. △ Less

Submitted 10 March, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: Ting-Hsuan Liao, Hsu-Shen Liu, Li-Yuan Tsao, Tzu-Wen Wang, and Shan-Ya Yang contributed equally to this work, names listed in alphabetical order; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Journal ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2202.06276 [pdf, other]

Natural Image Stitching Using Depth Maps

Authors: Tianli Liao, Nan Li

Abstract: Natural image stitching (NIS) aims to create one natural-looking mosaic from two overlap** images that capture the same 3D scene from different viewing positions. Challenges inevitably arise when the scene is non-planar and the camera baseline is wide, since parallax becomes not negligible in such cases. In this paper, we propose a novel NIS method using depth maps, which generates natural-looki… ▽ More Natural image stitching (NIS) aims to create one natural-looking mosaic from two overlap** images that capture the same 3D scene from different viewing positions. Challenges inevitably arise when the scene is non-planar and the camera baseline is wide, since parallax becomes not negligible in such cases. In this paper, we propose a novel NIS method using depth maps, which generates natural-looking mosaics against parallax in both overlap** and non-overlap** regions. Firstly, we construct a robust fitting method to filter out the outliers in feature matches and estimate the epipolar geometry between input images. Then, we draw a triangulation of the target image and estimate multiple local homographies, one per triangle, based on the locations of their vertices, the rectified depth values and the epipolar geometry. Finally, the war** image is rendered by the backward map** of piece-wise homographies. Panorama is then produced via average blending and image inpainting. Experimental results demonstrate that the proposed method not only provides accurate alignment in the overlap** regions but also virtual naturalness in the non-overlap** region. △ Less

Submitted 22 February, 2023; v1 submitted 13 February, 2022; originally announced February 2022.

Comments: 10 pages, 8 figures, under review

arXiv:2201.02312 [pdf, other]

A Transfer Learning Pipeline for Educational Resource Discovery with Application in Leading Paragraph Generation

Authors: Irene Li, Thomas George, Alexander Fabbri, Tammy Liao, Benjamin Chen, Rina Kawamura, Richard Zhou, Vanessa Yan, Swapnil Hingmire, Dragomir Radev

Abstract: Effective human learning depends on a wide selection of educational materials that align with the learner's current understanding of the topic. While the Internet has revolutionized human learning or education, a substantial resource accessibility barrier still exists. Namely, the excess of online information can make it challenging to navigate and discover high-quality learning materials. In this… ▽ More Effective human learning depends on a wide selection of educational materials that align with the learner's current understanding of the topic. While the Internet has revolutionized human learning or education, a substantial resource accessibility barrier still exists. Namely, the excess of online information can make it challenging to navigate and discover high-quality learning materials. In this paper, we propose the educational resource discovery (ERD) pipeline that automates web resource discovery for novel domains. The pipeline consists of three main steps: data collection, feature extraction, and resource classification. We start with a known source domain and conduct resource discovery on two unseen target domains via transfer learning. We first collect frequent queries from a set of seed documents and search on the web to obtain candidate resources, such as lecture slides and introductory blog posts. Then we introduce a novel pretrained information retrieval deep neural network model, query-document masked language modeling (QD-MLM), to extract deep features of these candidate resources. We apply a tree-based classifier to decide whether the candidate is a positive learning resource. The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel target domains. Finally, we demonstrate how this pipeline can benefit an application: leading paragraph generation for surveys. This is the first study that considers various web resources for survey generation, to the best of our knowledge. We also release a corpus of 39,728 manually labeled web resources and 659 queries from NLP, Computer Vision (CV), and Statistics (STATS). △ Less

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2112.08578 [pdf, other]

CLICKER: A Computational LInguistics Classification Scheme for Educational Resources

Authors: Swapnil Hingmire, Irene Li, Rena Kawamura, Benjamin Chen, Alexander Fabbri, Xiangru Tang, Yixin Liu, Thomas George, Tammy Liao, Wai Pan Wong, Vanessa Yan, Richard Zhou, Girish K. Palshikar, Dragomir Radev

Abstract: A classification scheme of a scientific subject gives an overview of its body of knowledge. It can also be used to facilitate access to research articles and other materials related to the subject. For example, the ACM Computing Classification System (CCS) is used in the ACM Digital Library search interface and also for indexing computer science papers. We observed that a comprehensive classificat… ▽ More A classification scheme of a scientific subject gives an overview of its body of knowledge. It can also be used to facilitate access to research articles and other materials related to the subject. For example, the ACM Computing Classification System (CCS) is used in the ACM Digital Library search interface and also for indexing computer science papers. We observed that a comprehensive classification system like CCS or Mathematics Subject Classification (MSC) does not exist for Computational Linguistics (CL) and Natural Language Processing (NLP). We propose a classification scheme -- CLICKER for CL/NLP based on the analysis of online lectures from 77 university courses on this subject. The currently proposed taxonomy includes 334 topics and focuses on educational aspects of CL/NLP; it is based primarily, but not exclusively, on lecture notes from NLP courses. We discuss how such a taxonomy can help in various real-world applications, including tutoring platforms, resource retrieval, resource recommendation, prerequisite chain learning, and survey generation. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: 7 pages, 5 figures, 4 tables

arXiv:2109.08628 [pdf, other]

Autonomous Vision-based UAV Landing with Collision Avoidance using Deep Learning

Authors: Tianpei Liao, Amal Haridevan, Yibo Liu, **jun Shan

Abstract: There is a risk of collision when multiple UAVs land simultaneously without communication on the same platform. This work accomplishes vision-based autonomous landing and uses a deep-learning-based method to realize collision avoidance during the landing process. There is a risk of collision when multiple UAVs land simultaneously without communication on the same platform. This work accomplishes vision-based autonomous landing and uses a deep-learning-based method to realize collision avoidance during the landing process. △ Less

Submitted 17 September, 2021; originally announced September 2021.

arXiv:2012.00944 [pdf, other]

Tensor Completion via Convolutional Sparse Coding Regularization

Authors: Zhebin Wu, Tianchi Liao, Chuan Chen, Cong Liu, Zibin Zheng, Xiongjun Zhang

Abstract: Tensor data often suffer from missing value problem due to the complex high-dimensional structure while acquiring them. To complete the missing information, lots of Low-Rank Tensor Completion (LRTC) methods have been proposed, most of which depend on the low-rank property of tensor data. In this way, the low-rank component of the original data could be recovered roughly. However, the shortcoming i… ▽ More Tensor data often suffer from missing value problem due to the complex high-dimensional structure while acquiring them. To complete the missing information, lots of Low-Rank Tensor Completion (LRTC) methods have been proposed, most of which depend on the low-rank property of tensor data. In this way, the low-rank component of the original data could be recovered roughly. However, the shortcoming is that the detail information can not be fully restored, no matter the Sum of the Nuclear Norm (SNN) nor the Tensor Nuclear Norm (TNN) based methods. On the contrary, in the field of signal processing, Convolutional Sparse Coding (CSC) can provide a good representation of the high-frequency component of the image, which is generally associated with the detail component of the data. Nevertheless, CSC can not handle the low-frequency component well. To this end, we propose two novel methods, LRTC-CSC-I and LRTC-CSC-II, which adopt CSC as a supplementary regularization for LRTC to capture the high-frequency components. Therefore, the LRTC-CSC methods can not only solve the missing value problem but also recover the details. Moreover, the regularizer CSC can be trained with small samples due to the sparsity characteristic. Extensive experiments show the effectiveness of LRTC-CSC methods, and quantitative evaluation indicates that the performance of our models are superior to state-of-the-art methods. △ Less

Submitted 6 May, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

arXiv:2010.11658 [pdf, other]

On the Compressed-Oracle Technique, and Post-Quantum Security of Proofs of Sequential Work

Authors: Kai-Min Chung, Serge Fehr, Yu-Hsuan Huang, Tai-Ning Liao

Abstract: We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows… ▽ More We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows for a more fine-grained query-complexity analysis. Our main technical contribution is a framework that simplifies the use of (the parallel-query generalization of) the compressed oracle technique for proving query complexity results. With our framework in place, whenever applicable, it is possible to prove quantum query complexity lower bounds by means of purely classical reasoning. More than that, for typical examples the crucial classical observations that give rise to the classical bounds are sufficient to conclude the corresponding quantum bounds. We demonstrate this on a few examples, recovering known results (like the optimality of parallel Grover), but also obtaining new results (like the optimality of parallel BHT collision search). Our main target is the hardness of finding a $q$-chain with fewer than $q$ parallel queries, i.e., a sequence $x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$. The above problem of finding a hash chain is of fundamental importance in the context of proofs of sequential work. Indeed, as a concrete cryptographic application of our techniques, we prove that the "Simple Proofs of Sequential Work" proposed by Cohen and Pietrzak remains secure against quantum attacks. Such an analysis is not simply a matter of plugging in our new bound; the entire protocol needs to be analyzed in the light of a quantum attack. Thanks to our framework, this can now be done with purely classical reasoning. △ Less

Submitted 9 July, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

arXiv:2005.03457 [pdf, other]

NTIRE 2020 Challenge on NonHomogeneous Dehazing

Authors: Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, **g Liu, Haiyan Wu, Yuan Xie, Yanyun Qu, Lizhuang Ma, Ziling Huang, Qili Deng, Ju-Chin Chao, Tsung-Shan Yang, Peng-Wen Chen, Po-Min Hsu, Tzu-Yi Liao, Chung-En Sun, Pei-Yuan Wu, Jeonghyeok Do, Jongmin Park, Munchurl Kim, Kareem Metwaly, Xuelu Li, Tiantong Guo, Vishal Monga , et al. (27 additional authors not shown)

Abstract: This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy image). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images.… ▽ More This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy image). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. 168 participants registered in the challenge and 27 teams competed in the final testing phase. The proposed solutions gauge the state-of-the-art in image dehazing. △ Less

Submitted 7 May, 2020; originally announced May 2020.

Comments: CVPR Workshops Proceedings 2020

arXiv:1911.09176 [pdf, other]

Lower Bounds for Function Inversion with Quantum Advice

Authors: Kai-Min Chung, Tai-Ning Liao, Luowen Qian

Abstract: Function inversion is the problem that given a random function $f: [M] \to [N]$, we want to find pre-image of any image $f^{-1}(y)$ in time $T$. In this work, we revisit this problem under the preprocessing model where we can compute some auxiliary information or advice of size $S$ that only depends on $f$ but not on $y$. It is a well-studied problem in the classical settings, however, it is not c… ▽ More Function inversion is the problem that given a random function $f: [M] \to [N]$, we want to find pre-image of any image $f^{-1}(y)$ in time $T$. In this work, we revisit this problem under the preprocessing model where we can compute some auxiliary information or advice of size $S$ that only depends on $f$ but not on $y$. It is a well-studied problem in the classical settings, however, it is not clear how quantum algorithms can solve this task any better besides invoking Grover's algorithm, which does not leverage the power of preprocessing. Nayebi et al. proved a lower bound $ST^2 \ge \tildeΩ(N)$ for quantum algorithms inverting permutations, however, they only consider algorithms with classical advice. Hhan et al. subsequently extended this lower bound to fully quantum algorithms for inverting permutations. In this work, we give the same asymptotic lower bound to fully quantum algorithms for inverting functions for fully quantum algorithms under the regime where $M = O(N)$. In order to prove these bounds, we generalize the notion of quantum random access code, originally introduced by Ambainis et al., to the setting where we are given a list of (not necessarily independent) random variables, and we wish to compress them into a variable-length encoding such that we can retrieve a random element just using the encoding with high probability. As our main technical contribution, we give a nearly tight lower bound (for a wide parameter range) for this generalized notion of quantum random access codes, which may be of independent interest. △ Less

Submitted 8 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: ITC full version

arXiv:1905.01334 [pdf, other]

Data-efficient Learning of Morphology and Controller for a Microrobot

Authors: Thomas Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, Roberto Calandra

Abstract: Robot design is often a slow and difficult process requiring the iterative construction and testing of prototypes, with the goal of sequentially optimizing the design. For most robots, this process is further complicated by the need, when validating the capabilities of the hardware to solve the desired task, to already have an appropriate controller, which is in turn designed and tuned for the spe… ▽ More Robot design is often a slow and difficult process requiring the iterative construction and testing of prototypes, with the goal of sequentially optimizing the design. For most robots, this process is further complicated by the need, when validating the capabilities of the hardware to solve the desired task, to already have an appropriate controller, which is in turn designed and tuned for the specific hardware. In this paper, we propose a novel approach, HPC-BBO, to efficiently and automatically design hardware configurations, and evaluate them by also automatically tuning the corresponding controller. HPC-BBO is based on a hierarchical Bayesian optimization process which iteratively optimizes morphology configurations (based on the performance of the previous designs during the controller learning process) and subsequently learns the corresponding controllers (exploiting the knowledge collected from optimizing for previous morphologies). Moreover, HPC-BBO can select a "batch" of multiple morphology designs at once, thus parallelizing hardware validation and reducing the number of time-consuming production cycles. We validate HPC-BBO on the design of the morphology and controller for a simulated 6-legged microrobot. Experimental results show that HPC-BBO outperforms multiple competitive baselines, and yields a $360\%$ reduction in production cycles over standard Bayesian optimization, thus reducing the hypothetical manufacturing time of our microrobot from 21 to 4 months. △ Less

Submitted 3 May, 2019; originally announced May 2019.

Comments: Accepted at ICRA-2019. 6 pages

arXiv:1809.08753 [pdf, other]

An Iterative Refinement Approach for Social Media Headline Prediction

Authors: Chih-Chung Hsu, Chia-Yen Lee, Ting-Xuan Liao, Jun-Yi Lee, Tsai-Yne Hou, Ying-Chu Kuo, **g-Wen Lin, Ching-Yi Hsueh, Zhong-Xuan Zhan, Hsiang-Chin Chien

Abstract: In this study, we propose a novel iterative refinement approach to predict the popularity score of the social media meta-data effectively. With the rapid growth of the social media on the Internet, how to adequately forecast the view count or popularity becomes more important. Conventionally, the ensemble approach such as random forest regression achieves high and stable performance on various pre… ▽ More In this study, we propose a novel iterative refinement approach to predict the popularity score of the social media meta-data effectively. With the rapid growth of the social media on the Internet, how to adequately forecast the view count or popularity becomes more important. Conventionally, the ensemble approach such as random forest regression achieves high and stable performance on various prediction tasks. However, most of the regression methods may not precisely predict the extreme high or low values. To address this issue, we first predict the initial popularity score and retrieve their residues. In order to correctly compensate those extreme values, we adopt an ensemble regressor to compensate the residues to further improve the prediction performance. Comprehensive experiments are conducted to demonstrate the proposed iterative refinement approach outperforms the state-of-the-art regression approach. △ Less

Submitted 24 September, 2018; originally announced September 2018.

Comments: 5 pages, ACM Multimedia Conference 2018

arXiv:1808.01620 [pdf, other]

Schema Integration on Massive Data Sources

Authors: Tianbao Lia, Hongzhi Wang, Jianzhong Li, Hong Gao

Abstract: As the fundamental phrase of collecting and analyzing data, data integration is used in many applications, such as data cleaning, bioinformatics and pattern recognition. In big data era, one of the major problems of data integration is to obtain the global schema of data sources since the global schema could be hardly derived from massive data sources directly. In this paper, we attempt to solve s… ▽ More As the fundamental phrase of collecting and analyzing data, data integration is used in many applications, such as data cleaning, bioinformatics and pattern recognition. In big data era, one of the major problems of data integration is to obtain the global schema of data sources since the global schema could be hardly derived from massive data sources directly. In this paper, we attempt to solve such schema integration problem. For different scenarios, we develop batch and incremental schema integration algorithms. We consider the representation difference of attribute names in various data sources and propose ED Join and Semantic Join algorithms to integrate attributes with different representations. Extensive experimental results demonstrate that the proposed algorithms could integrate schemas efficiently and effectively. △ Less

Submitted 5 August, 2018; originally announced August 2018.

arXiv:1807.05119 [pdf, other]

Learning-based Natural Geometric Matching with Homography Prior

Authors: Yifang Xu, Tianli Liao, **g Chen

Abstract: Geometric matching is a key step in computer vision tasks. Previous learning-based methods for geometric matching concentrate more on improving alignment quality, while we argue the importance of naturalness issue simultaneously. To deal with this, firstly, Pearson correlation is applied to handle large intra-class variations of features in feature matching stage. Then, we parametrize homography t… ▽ More Geometric matching is a key step in computer vision tasks. Previous learning-based methods for geometric matching concentrate more on improving alignment quality, while we argue the importance of naturalness issue simultaneously. To deal with this, firstly, Pearson correlation is applied to handle large intra-class variations of features in feature matching stage. Then, we parametrize homography transformation with 9 parameters in full connected layer of our network, to better characterize large viewpoint variations compared with affine transformation. Furthermore, a novel loss function with Gaussian weights guarantees the model accuracy and efficiency in training procedure. Finally, we provide two choices for different purposes in geometric matching. When compositing homography with affine transformation, the alignment accuracy improves and all lines are preserved, which results in a more natural transformed image. When compositing homography with non-rigid thin-plate-spline transformation, the alignment accuracy further improves. Experimental results on Proposal Flow dataset show that our method outperforms state-of-the-art methods, both in terms of alignment accuracy and naturalness. △ Less

Submitted 13 July, 2018; originally announced July 2018.

Comments: 13 pages,4 figures

arXiv:1805.09578 [pdf, other]

Coarse-to-fine Seam Estimation for Image Stitching

Authors: Tianli Liao, **g Chen, Yifang Xu

Abstract: Seam-cutting and seam-driven techniques have been proven effective for handling imperfect image series in image stitching. Generally, seam-driven is to utilize seam-cutting to find a best seam from one or finite alignment hypotheses based on a predefined seam quality metric. However, the quality metrics in most methods are defined to measure the average performance of the pixels on the seam withou… ▽ More Seam-cutting and seam-driven techniques have been proven effective for handling imperfect image series in image stitching. Generally, seam-driven is to utilize seam-cutting to find a best seam from one or finite alignment hypotheses based on a predefined seam quality metric. However, the quality metrics in most methods are defined to measure the average performance of the pixels on the seam without considering the relevance and variance among them. This may cause that the seam with the minimal measure is not optimal (perception-inconsistent) in human perception. In this paper, we propose a novel coarse-to-fine seam estimation method which applies the evaluation in a different way. For pixels on the seam, we develop a patch-point evaluation algorithm concentrating more on the correlation and variation of them. The evaluations are then used to recalculate the difference map of the overlap** region and reestimate a stitching seam. This evaluation-reestimation procedure iterates until the current seam changes negligibly comparing with the previous seams. Experiments show that our proposed method can finally find a nearly perception-consistent seam after several iterations, which outperforms the conventional seam-cutting and other seam-driven methods. △ Less

Submitted 24 May, 2018; originally announced May 2018.

Comments: 5 pages, 4 figures

arXiv:1804.07492 [pdf, other]

Graph-based Hypothesis Generation for Parallax-tolerant Image Stitching

Authors: **g Chen, Nan Li, Tianli Liao

Abstract: The seam-driven approach has been proven fairly effective for parallax-tolerant image stitching, whose strategy is to search for an invisible seam from finite representative hypotheses of local alignment. In this paper, we propose a graph-based hypothesis generation and a seam-guided local alignment for improving the effectiveness and the efficiency of the seam-driven approach. The experiment demo… ▽ More The seam-driven approach has been proven fairly effective for parallax-tolerant image stitching, whose strategy is to search for an invisible seam from finite representative hypotheses of local alignment. In this paper, we propose a graph-based hypothesis generation and a seam-guided local alignment for improving the effectiveness and the efficiency of the seam-driven approach. The experiment demonstrates the significant reduction of number of hypotheses and the improved quality of naturalness of final stitching results, comparing to the state-of-the-art method SEAGULL. △ Less

Submitted 20 April, 2018; originally announced April 2018.

Comments: 3 pages, 3 figures, 2 tables

arXiv:1803.06655 [pdf, other]

Ratio-Preserving Half-Cylindrical Warps for Natural Image Stitching

Authors: Yifang Xu, **g Chen, Tianli Liao

Abstract: A novel warp for natural image stitching is proposed that utilizes the property of cylindrical warp and a horizontal pixel selection strategy. The proposed ratio-preserving half-cylindrical warp is a combination of homography and cylindrical warps which guarantees alignment by homography and possesses less projective distortion by cylindrical warp. Unlike previous approaches applying cylindrical w… ▽ More A novel warp for natural image stitching is proposed that utilizes the property of cylindrical warp and a horizontal pixel selection strategy. The proposed ratio-preserving half-cylindrical warp is a combination of homography and cylindrical warps which guarantees alignment by homography and possesses less projective distortion by cylindrical warp. Unlike previous approaches applying cylindrical warp before homography, we use partition lines to divide the image into different parts and apply homography in the overlap** region while a composition of homography and cylindrical warps in the non-overlap** region. The pixel selection strategy then samples the points in horizontal and reconstructs the image via interpolation to further reduce horizontal distortion by maintaining the ratio as similarity. With applying half-cylindrical warp and horizontal pixel selection, the projective distortion in vertical and horizontal is mitigated simultaneously. Experiments show that our warp is efficient and produces a more natural-looking stitched result than previous methods. △ Less

Submitted 18 March, 2018; originally announced March 2018.

Comments: 3 pages, 5 figures

arXiv:1802.04645 [pdf, other]

doi 10.1109/TIP.2019.2934344

Single-Perspective Warps in Natural Image Stitching

Authors: Tianli Liao, Nan Li

Abstract: Results of image stitching can be perceptually divided into single-perspective and multiple-perspective. Compared to the multiple-perspective result, the single-perspective result excels in perspective consistency but suffers from projective distortion. In this paper, we propose two single-perspective warps for natural image stitching. The first one is a parametric warp, which is a combination of… ▽ More Results of image stitching can be perceptually divided into single-perspective and multiple-perspective. Compared to the multiple-perspective result, the single-perspective result excels in perspective consistency but suffers from projective distortion. In this paper, we propose two single-perspective warps for natural image stitching. The first one is a parametric warp, which is a combination of the as-projective-as-possible warp and the quasi-homography warp via dual-feature. The second one is a mesh-based warp, which is determined by optimizing a total energy function that simultaneously emphasizes different characteristics of the single-perspective warp, including alignment, naturalness, distortion and saliency. A comprehensive evaluation demonstrates that the proposed warp outperforms some state-of-the-art warps, including homography, APAP, AutoStitch, SPHP and GSP. △ Less

Submitted 7 March, 2018; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: 10 pages, 10 figures

Showing 1–50 of 51 results for author: Liao, T