Search | arXiv e-print repository

Refining Self-Supervised Learnt Speech Representation using Brain Activations

Authors: Hengyu Li, Kangdi Mei, Zhaoci Liu, Yang Ai, Liping Chen, Jie Zhang, Zhenhua Ling

Abstract: It has been shown in the literature that speech representations extracted by self-supervised pre-trained models exhibit similarities with human brain activations during speech perception, and that fine-tuning speech representation models on downstream tasks can further improve the similarity. However, it still remains unclear if this similarity can be used to optimize the pre-trained speech models. In this work, we therefore propose to use the brain activations recorded by fMRI to refine the often-used wav2vec2.0 model by aligning model representations toward human neural responses. Experimental results on SUPERB reveal that this operation is beneficial for several downstream tasks, e.g., speaker verification, automatic speech recognition, and intent classification. One can then consider the proposed method as a new alternative to improve self-supervised speech models.

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: accepted by Interspeech 2024

arXiv:2405.06907 [pdf, other]

AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents

Authors: Shuyuan Xu, Zelong Li, Kai Mei, Yongfeng Zhang

Abstract: Since their inception, programming languages have trended towards greater readability and lower barriers for programmers. Following this trend, natural language can be a promising type of programming language that provides great flexibility and usability and helps towards the democracy of programming. However, the inherent vagueness, ambiguity, and verbosity of natural language pose significant challenges in developing an interpreter that can accurately understand the programming logic and execute instructions written in natural language. Fortunately, recent advancements in Large Language Models (LLMs) have demonstrated remarkable proficiency in interpreting complex natural language. Inspired by this, we develop a novel system for Code Representation and Execution (CoRE), which employs LLM as interpreter to interpret and execute natural language instructions. The proposed system unifies natural language programming, pseudo-code programming, and flow programming under the same representation for constructing language agents, while LLM serves as the interpreter to interpret and execute the agent programs. In this paper, we begin with defining the programming syntax that structures natural language instructions logically. During the execution, we incorporate external memory to minimize redundancy. Furthermore, we equip the designed interpreter with the capability to invoke external tools, compensating for the limitations of LLM in specialized domains or when accessing real-time information.
This work is open-source at https://github.com/agiresearch/CoRE, https://github.com/agiresearch/OpenAGI, and https://github.com/agiresearch/AIOS.

Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

Comments: 12 pages, 6 figures, comments and suggestions are welcome

arXiv:2404.07066 [pdf, other]

Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?

Authors: Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

Abstract: Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers. Specifically, we categorize concepts based on their level of abstraction, defining them in the order of increasing complexity within factual, emotional, and inferential tasks. We conduct extensive probing experiments using layer-wise representations across various LLM families (Gemma, LLaMA, QWen) on various datasets spanning the three domains of tasks. Our findings reveal that models could efficiently conduct probing for simpler tasks in shallow layers, and more complex tasks typically necessitate deeper layers for accurate understanding. Additionally, we examine how external factors, such as adding noise to the input and quantizing the model weights, might affect layer-wise representations. Our findings suggest that these factors can impede the development of a conceptual understanding of LLMs until deeper layers are explored. We hope that our proposed concept and experimental insights will enhance the understanding of the mechanisms underlying LLMs. Our codes are available at https://github.com/Luckfort/CD.

Submitted 30 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 12 pages

arXiv:2404.01367 [pdf, other]

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

Authors: Kangfu Mei, Zhengzhong Tu, Mauricio Delbracio, Hossein Talebi, Vishal M. Patel, Peyman Milanfar

Abstract: We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency. While improved network architecture and inference algorithms have been shown to effectively boost sampling efficiency of diffusion models, the role of model size -- a critical determinant of sampling efficiency -- has not been thoroughly examined. Through empirical analysis of established text-to-image diffusion models, we conduct an in-depth investigation into how model size influences sampling efficiency across varying sampling steps. Our findings unveil a surprising trend: when operating under a given inference budget, smaller models frequently outperform their larger equivalents in generating high-quality results. Moreover, we extend our study to demonstrate the generalizability of these findings by applying various diffusion samplers, exploring diverse downstream tasks, evaluating post-distilled models, as well as comparing performance relative to training compute. These findings open up new pathways for the development of LDM scaling strategies which can be employed to enhance generative capabilities within limited inference budgets.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.16971 [pdf, other]

AIOS: LLM Agent Operating System

Authors: Kai Mei, Zelong Li, Shuyuan Xu, Ruosong Ye, Yingqiang Ge, Yongfeng Zhang

Abstract: The integration and deployment of large language model (LLM)-based intelligent agents have been fraught with challenges that compromise their efficiency and efficacy. Among these issues are sub-optimal scheduling and resource allocation of agent requests over the LLM, the difficulties in maintaining context during interactions between agent and LLM, and the complexities inherent in integrating heterogeneous agents with different capabilities and specializations. The rapid increase of agent quantity and complexity further exacerbates these issues, often leading to bottlenecks and sub-optimal utilization of resources. Inspired by these challenges, this paper presents AIOS, an LLM agent operating system, which embeds large language model into operating systems (OS) as the brain of the OS, enabling an operating system "with soul" -- an important step towards AGI. Specifically, AIOS is designed to optimize resource allocation, facilitate context switch across agents, enable concurrent execution of agents, provide tool service for agents, and maintain access control for agents. We present the architecture of such an operating system, outline the core challenges it aims to resolve, and provide the basic design and implementation of the AIOS. Our experiments on concurrent execution of multiple agents demonstrate the reliability and efficiency of our AIOS modules. Through this, we aim to not only improve the performance and efficiency of LLM agents but also to pioneer for better development and deployment of the AIOS ecosystem in the future.
The project is open-source at https://github.com/agiresearch/AIOS.

Submitted 25 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: 14 pages, 5 figures, 5 tables; comments and suggestions are appreciated

arXiv:2402.13184 [pdf, other]

What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents

Authors: Mingyu Jin, Beichen Wang, Zhaoqian Xue, Suiyuan Zhu, Wenyue Hua, Hua Tang, Kai Mei, Mengnan Du, Yongfeng Zhang

Abstract: In this study, we introduce "CosmoAgent," an innovative artificial intelligence framework utilizing Large Language Models (LLMs) to simulate complex interactions between human and extraterrestrial civilizations, with a special emphasis on Stephen Hawking's cautionary advice about not sending radio signals haphazardly into the universe. The goal is to assess the feasibility of peaceful coexistence while considering potential risks that could threaten well-intentioned civilizations. Employing mathematical models and state transition matrices, our approach quantitatively evaluates the development trajectories of civilizations, offering insights into future decision-making at critical points of growth and saturation. Furthermore, the paper acknowledges the vast diversity in potential living conditions across the universe, which could foster unique cosmologies, ethical codes, and worldviews among various civilizations. Recognizing the Earth-centric bias inherent in current LLM designs, we propose the novel concept of using LLMs with diverse ethical paradigms and simulating interactions between entities with distinct moral principles. This innovative research provides a new way to understand complex inter-civilizational dynamics, expanding our perspective while pioneering novel strategies for conflict resolution, crucial for preventing interstellar conflicts. We have also released the code and datasets to enable further academic investigation into this interesting area of research. The code is available at https://github.com/agiresearch/AlienAgent.

Submitted 20 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.08021 [pdf, other]

doi 10.1145/3630106.3658996

Careless Whisper: Speech-to-Text Hallucination Harms

Authors: Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, Mona Sloane

Abstract: Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI's Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper's transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations -- a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.

Submitted 2 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2312.02156 [pdf, other]

Latent Feature-Guided Diffusion Models for Shadow Removal

Authors: Kangfu Mei, Luis Figueroa, Zhe Lin, Zhihong Ding, Scott Cohen, Vishal M. Patel

Abstract: Recovering textures under shadows has remained a challenging problem due to the difficulty of inferring shadow-free scenes from shadow images. In this paper, we propose the use of diffusion models as they offer a promising approach to gradually refine the details of shadow regions during the diffusion process. Our method improves this process by conditioning on a learned latent feature space that inherits the characteristics of shadow-free images, thus avoiding the limitation of conventional methods that condition on degraded images only. Additionally, we propose to alleviate potential local optima during training by fusing noise features with the diffusion network. We demonstrate the effectiveness of our approach which outperforms the previous best method by 13% in terms of RMSE on the AISTD dataset. Further, we explore instance-level shadow removal, where our model outperforms the previous best method by 82% in terms of RMSE on the DESOBA dataset.

Submitted 4 December, 2023; originally announced December 2023.

Comments: project page see https://kfmei.page/shadow-diffusion/index.html

arXiv:2311.17227 [pdf, other]

War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars

Authors: Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang

Abstract: Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including World War I (WWI), World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems' abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at https://github.com/agiresearch/WarAgent.

Submitted 30 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: 47 pages, 9 figures, 5 tables

arXiv:2310.17488 [pdf, other]

LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation

Authors: Kai Mei, Yongfeng Zhang

Abstract: This paper presents LightLM, a lightweight Transformer-based language model for generative recommendation. While Transformer-based generative modeling has gained importance in various AI sub-fields such as NLP and vision, generative recommendation is still in its infancy due to its unique demand on personalized generative modeling. Existing works on generative recommendation often use NLP-oriented Transformer architectures such as T5, GPT, LLaMA and M6, which are heavy-weight and are not specifically designed for recommendation tasks. LightLM tackles the issue by introducing a light-weight deep and narrow Transformer architecture, which is specifically tailored for direct generation of recommendation items. This structure is especially apt for straightforward generative recommendation and stems from the observation that a language model does not have to be too wide for this task, as the input predominantly consists of short tokens that are well-suited for the model's capacity. We also show that our devised user and item ID indexing methods, i.e., Spectral Collaborative Indexing (SCI) and Graph Collaborative Indexing (GCI), enable the deep and narrow Transformer architecture to outperform large-scale language models for recommendation. Besides, to address the hallucination problem of generating items as output, we propose the constrained generation process for generative recommenders. Experiments on real-world datasets show that LightLM outperforms various competitive baselines in terms of both recommendation accuracy and efficiency.
The code can be found at https://github.com/dongyuanjushi/LightLM.

Submitted 29 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.01407 [pdf, other]

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

Authors: Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar

Abstract: Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks such as image enhancement, restoration, editing, and compositing. However, their widespread adoption is hindered by the high computational cost, which limits their real-time application. To address this challenge, we introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs while significantly reducing the sampling steps required to achieve high-quality results. Our method can leverage architectures such as ControlNet to incorporate conditioning inputs without compromising the model's prior knowledge gained during large scale pre-training. Additionally, a conditional consistency loss enforces consistent predictions across diffusion steps, effectively compelling the model to generate high-quality images with conditions in a few steps. Our conditional-task learning and distillation approach outperforms previous distillation methods, achieving a new state-of-the-art in producing high-quality images with very few steps (e.g., 1-4) across multiple tasks, including super-resolution, text-guided image editing, and depth-to-image generation.

Submitted 17 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2306.05550 [pdf, other]

doi 10.1145/3593013.3594109

Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks

Authors: Katelyn X. Mei, Sonia Fereidooni, Aylin Caliskan

Abstract: The rapid deployment of artificial intelligence (AI) models demands a thorough investigation of biases and risks inherent in these models to understand their impact on individuals and society. This study extends the focus of bias evaluation in extant work by examining bias against social stigmas on a large scale. It focuses on 93 stigmatized groups in the United States, including a wide range of conditions related to disease, disability, drug use, mental illness, religion, sexuality, socioeconomic status, and other relevant factors. We investigate bias against these groups in English pre-trained Masked Language Models (MLMs) and their downstream sentiment classification tasks. To evaluate the presence of bias against 93 stigmatized conditions, we identify 29 non-stigmatized conditions to conduct a comparative analysis. Building upon a psychology scale of social rejection, the Social Distance Scale, we prompt six MLMs: RoBERTa-base, RoBERTa-large, XLNet-large, BERTweet-base, BERTweet-large, and DistilBERT. We use human annotations to analyze the predicted words from these models, with which we measure the extent of bias against stigmatized groups. When prompts include stigmatized conditions, the probability of MLMs predicting negative words is approximately 20 percent higher than when prompts have non-stigmatized conditions. In the sentiment classification tasks, when sentences include stigmatized conditions related to diseases, disability, education, and mental illness, they are more likely to be classified as negative.
We also observe a strong correlation between bias in MLMs and their downstream sentiment classifiers (r = 0.79). The evidence indicates that MLMs and their downstream sentiment classification tasks exhibit biases against socially stigmatized groups.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 20 pages, 12 figures, 2 tables; ACM FAccT 2023

ACM Class: K.4; I.2.7; I.2.0

arXiv:2305.17826 [pdf, other]

NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

Authors: Kai Mei, Zheng Li, Zhenting Wang, Yang Zhang, Shiqing Ma

Abstract: Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models consider injecting backdoors into the entire embedding layers or word embedding vectors. Such attacks can be easily affected by retraining on downstream tasks and with different prompting strategies, limiting the transferability of backdoor attacks. In this work, we propose transferable backdoor attacks against prompt-based models, called NOTABLE, which is independent of downstream tasks and prompting strategies. Specifically, NOTABLE injects backdoors into the encoders of PLMs by utilizing an adaptive verbalizer to bind triggers to specific words (i.e., anchors). It activates the backdoor by pasting input with triggers to reach adversary-desired anchors, achieving independence from downstream tasks and prompting strategies. We conduct experiments on six NLP tasks, three popular models, and three prompting strategies. Empirical results show that NOTABLE achieves superior attack performance (i.e., attack success rate over 90% on all the datasets), and outperforms two state-of-the-art baselines. Evaluations on three defenses show the robustness of NOTABLE. Our code can be found at https://github.com/RU-System-Software-and-Security/Notable.

Submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.14674 [pdf, other]

T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities

Authors: Kangfu Mei, Mo Zhou, Vishal M. Patel

Abstract: Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces. While DPF shows great potential for unifying data generation of various modalities including images, videos, and 3D geometry, it does not scale to a higher data resolution. This can be attributed to the "scaling property", where it is difficult for the model to capture local structures through uniform sampling. To this end, we propose a new model comprising a view-wise sampling algorithm to focus on local structure learning, and incorporating additional guidance, e.g., text description, to complement the global geometry. The model can be scaled to generate high-resolution data while unifying multiple modalities. Experimental results on data generation in various modalities demonstrate the effectiveness of our model, as well as its potential as a foundation framework for scalable modality-unified visual content generation.

Submitted 23 May, 2023; originally announced May 2023.

Comments: for project page, see https://t1-diffusion-model.github.io

arXiv:2304.05959 [pdf, other]

UAV Obstacle Avoidance by Human-in-the-Loop Reinforcement in Arbitrary 3D Environment

Authors: Xuyang Li, Jianwu Fang, Kai Du, Kuizhi Mei, Jianru Xue

Abstract: This paper focuses on the continuous control of the unmanned aerial vehicle (UAV) based on a deep reinforcement learning method for a large-scale 3D complex environment. The purpose is to make the UAV reach any target point from a certain starting point, and the flying height and speed are variable during navigation. In this work, we propose a deep reinforcement learning (DRL)-based method combined with human-in-the-loop, which allows the UAV to avoid obstacles automatically during flying. We design multiple reward functions based on the relevant domain knowledge to guide UAV navigation. The role of human-in-the-loop is to dynamically change the reward function of the UAV in different situations to suit the obstacle avoidance of the UAV better. We verify the success rate and average step size on urban, rural, and forest scenarios, and the experimental results show that the proposed method can reduce the training convergence time and improve the efficiency and accuracy of navigation tasks. The code is available on the website https://github.com/Monnalo/UAV_navigation.

Submitted 6 April, 2023; originally announced April 2023.

Comments: accepted in CCC2023

arXiv:2304.04370 [pdf, other]

OpenAGI: When LLM Meets Domain Experts

Authors: Yingqiang Ge, Wenyue Hua, Kai Mei, Jianchao Ji, Juntao Tan, Shuyuan Xu, Zelong Li, Yongfeng Zhang

Abstract: Human Intelligence (HI) excels at combining basic skills to solve complex tasks. This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents, enabling them to harness expert models for complex task-solving towards Artificial General Intelligence (AGI). Large Language Models (LLMs) show promising learning and reasoning abilities, and can effectively use external models, tools, plugins, or APIs to tackle complex problems. In this work, we introduce OpenAGI, an open-source AGI research and development platform designed for solving multi-step, real-world tasks. Specifically, OpenAGI uses a dual strategy, integrating standard benchmark tasks for benchmarking and evaluation, and open-ended tasks including more expandable models, tools, plugins, or APIs for creative problem-solving. Tasks are presented as natural language queries to the LLM, which then selects and executes appropriate models. We also propose a Reinforcement Learning from Task Feedback (RLTF) mechanism that uses task results to improve the LLM's task-solving ability, which creates a self-improving AI feedback loop. While we acknowledge that AGI is a broad and multifaceted research challenge with no singularly defined solution path, the integration of LLMs with domain-specific expert models, inspired by mirroring the blend of general and specialized intelligence in humans, offers a promising approach towards AGI.
We are open-sourcing the OpenAGI project's code, dataset, benchmarks, evaluation methods, and the UI demo to foster community involvement in AGI advancement: https://github.com/agiresearch/OpenAGI.

Submitted 3 November, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

Comments: In NeurIPS 2023

arXiv:2304.02786 [pdf, other]

UNICORN: A Unified Backdoor Trigger Inversion Framework

Authors: Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma

Abstract: The backdoor attack, where the adversary uses inputs stamped with triggers (e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat to Deep Neural Network (DNN) models. Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors. A challenge of trigger inversion is that there are many ways of constructing the trigger. Existing methods cannot generalize to various types of triggers by making certain assumptions or attack-specific constraints. The fundamental reason is that existing work does not consider the trigger's design space in their formulation of the inversion problem. This work formally defines and analyzes the triggers injected in different spaces and the inversion problem. Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models from our analysis. Our prototype UNICORN is general and effective in inverting backdoor triggers in DNNs. The code can be found at https://github.com/RU-System-Software-and-Security/UNICORN.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2302.06992 [pdf, other]

Hard-aware Instance Adaptive Self-training for Unsupervised Cross-domain Semantic Segmentation

Authors: Chuang Zhu, Kebin Liu, Wenqi Tang, Ke Mei, Jiaqi Zou, Tiejun Huang

Abstract: The divergence between labeled training data and unlabeled testing data is a significant challenge for recent deep learning models. Unsupervised domain adaptation (UDA) attempts to solve such problem. Recent works show that self-training is a powerful approach to UDA. However, existing methods have difficulty in balancing the scalability and performance. In this paper, we propose a hard-aware instance adaptive self-training framework for UDA on the task of semantic segmentation. To effectively improve the quality and diversity of pseudo-labels, we develop a novel pseudo-label generation strategy with an instance adaptive selector. We further enrich the hard class pseudo-labels with inter-image information through a skillfully designed hard-aware pseudo-label augmentation. Besides, we propose the region-adaptive regularization to smooth the pseudo-label region and sharpen the non-pseudo-label region. For the non-pseudo-label region, consistency constraint is also constructed to introduce stronger supervision signals during model optimization. Our method is so concise and efficient that it is easy to be generalized to other UDA methods. Experiments on GTA5 to Cityscapes, SYNTHIA to Cityscapes, and Cityscapes to Oxford RobotCar demonstrate the superior performance of our approach compared with the state-of-the-art methods.

Submitted 14 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2008.12197

arXiv:2212.07352 [pdf, other]

Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors

Authors: Kangfu Mei, Nithin Gopalakrishnan Nair, Vishal M. Patel

Abstract: Conditional diffusion probabilistic models can model the distribution of natural images and can generate diverse and realistic samples based on given conditions. However, oftentimes their results can be unrealistic with observable color shifts and textures. We believe that this issue results from the divergence between the probabilistic distribution learned by the model and the distribution of natural images. The delicate conditions gradually enlarge the divergence during each sampling timestep. To address this issue, we introduce a new method that brings the predicted samples to the training data manifold using a pretrained unconditional diffusion model. The unconditional model acts as a regularizer and reduces the divergence introduced by the conditional model at each sampling step. We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks. The improvements obtained by our method suggest that the priors can be incorporated as a general plugin for improving conditional diffusion models.

Submitted 14 December, 2022; originally announced December 2022.

arXiv:2212.00235 [pdf, other]

VIDM: Video Implicit Diffusion Models

Authors: Kangfu Mei, Vishal M. Patel

Abstract: Diffusion models have emerged as a powerful generative method for synthesizing high-quality and diverse set of images. In this paper, we propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition manner, i.e. one can sample plausible video motions according to the latent feature of frames. We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization. Various experiments are conducted on datasets consisting of videos with different resolutions and different number of frames. Results show that the proposed method outperforms the state-of-the-art generative adversarial network-based methods by a significant margin in terms of FVD scores as well as perceptible visual quality.

Submitted 30 November, 2022; originally announced December 2022.

Comments: AAAI2023 https://kfmei.page/vidm/

arXiv:2210.15127 [pdf, other]

Rethinking the Reverse-engineering of Trojan Triggers

Authors: Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

Abstract: Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space. Expressly, they assume the triggers are static patterns in the input space and fail to detect models with feature space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits the feature space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2208.11284 [pdf, other]

AT-DDPM: Restoring Faces degraded by Atmospheric Turbulence using Denoising Diffusion Probabilistic Models

Authors: Nithin Gopalakrishnan Nair, Kangfu Mei, Vishal M. Patel

Abstract: Although many long-range imaging systems are designed to support extended vision applications, a natural obstacle to their operation is degradation due to atmospheric turbulence. Atmospheric turbulence causes significant degradation to image quality by introducing blur and geometric distortion. In recent years, various deep learning-based single image atmospheric turbulence mitigation methods, including CNN-based and GAN inversion-based, have been proposed in the literature which attempt to remove the distortion in the image. However, some of these methods are difficult to train and often fail to reconstruct facial features and produce unrealistic results especially in the case of high turbulence. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained some traction because of their stable training process and their ability to generate high quality images. In this paper, we propose the first DDPM-based solution for the problem of atmospheric turbulence mitigation. We also propose a fast sampling technique for reducing the inference times for conditional DDPMs. Extensive experiments are conducted on synthetic and real-world data to show the significance of our model. To facilitate further research, all codes and pretrained models are publicly available at http://github.com/Nithin-GK/AT-DDPM

Submitted 20 September, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Accepted to IEEE WACV 2023

arXiv:2207.09302 [pdf, other]

Deep Semantic Statistics Matching (D2SM) Denoising Network

Authors: Kangfu Mei, Vishal M. Patel, Rui Huang

Abstract: The ultimate aim of image restoration like denoising is to find an exact correlation between the noisy and clear image domains. But the optimization of end-to-end denoising learning like pixel-wise losses is performed in a sample-to-sample manner, which ignores the intrinsic correlation of images, especially semantics. In this paper, we introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network. It exploits semantic features of pretrained classification networks, then it implicitly matches the probabilistic distribution of clear images at the semantic feature space. By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks, and the denoised results can be better understood by high-level vision tasks. Comprehensive experiments conducted on the noisy Cityscapes dataset demonstrate the superiority of our method on both the denoising performance and semantic segmentation accuracy. Moreover, the performance improvement observed on our extended tasks including super-resolution and dehazing experiments shows its potentiality as a new general plug-and-play component.

Submitted 19 July, 2022; originally announced July 2022.

Comments: ECCV2022, for Project Page, see https://kfmei.page/d2sm/

arXiv:2204.08974 [pdf, other]

A comparison of different atmospheric turbulence simulation methods for image restoration

Authors: Nithin Gopalakrishnan Nair, Kangfu Mei, Vishal M. Patel

Abstract: Atmospheric turbulence deteriorates the quality of images captured by long-range imaging systems by introducing blur and geometric distortions to the captured scene. This leads to a drastic drop in performance when computer vision algorithms like object/face recognition and detection are performed on these images. In recent years, various deep learning-based atmospheric turbulence mitigation methods have been proposed in the literature. These methods are often trained using synthetically generated images and tested on real-world images. Hence, the performance of these restoration methods depends on the type of simulation used for training the network. In this paper, we systematically evaluate the effectiveness of various turbulence simulation methods on image restoration. In particular, we evaluate the performance of two state-of-the-art restoration networks using six simulation methods on a real-world LRFID dataset consisting of face images degraded by turbulence. This paper will provide guidance to the researchers and practitioners working in this field to choose the suitable data generation models for training deep models for turbulence mitigation. The implementation codes for the simulation methods, source codes for the networks, and the pre-trained models will be publicly made available.

Submitted 19 April, 2022; originally announced April 2022.

arXiv:2204.03057 [pdf, other]

Thermal to Visible Image Synthesis under Atmospheric Turbulence

Authors: Kangfu Mei, Yiqun Mei, Vishal M. Patel

Abstract: In many practical applications of long-range imaging such as biometrics and surveillance, thermal imaging modalities are often used to capture images in low-light and nighttime conditions. However, such imaging systems often suffer from atmospheric turbulence, which introduces severe blur and deformation artifacts to the captured images. Such an issue is unavoidable in long-range imaging and significantly decreases the face verification accuracy. In this paper, we first investigate the problem with a turbulence simulation method on real-world thermal images. An end-to-end reconstruction method is then proposed which can directly transform thermal images into visible-spectrum images by utilizing natural image priors based on a pre-trained StyleGAN2 network. Compared with the existing two-step methods of consecutive turbulence mitigation and thermal to visible image translation, our method is demonstrated to be effective in terms of both the visual quality of the reconstructed results and face verification accuracy. Moreover, to the best of our knowledge, this is the first work that studies the problem of thermal to visible image translation under atmospheric turbulence.

Submitted 6 April, 2022; originally announced April 2022.

Comments: 4 pages, 3 figures

arXiv:2202.09954 [pdf, other]

doi 10.1109/TCOMM.2022.3201931

Theoretical Analysis of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Recently, deep neural network (DNN)-based physical layer communication techniques have attracted considerable interest. Although their potential to enhance communication systems and superb performance have been validated by simulation experiments, little attention has been paid to the theoretical analysis. Specifically, most studies in the physical layer have tended to focus on the application of DNN models to wireless communication problems but not to theoretically understand how does a DNN work in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve comparable performance in the physical layer comparing with traditional techniques, and also drive their cost in terms of computational complexity. To achieve this goal, we first analyze the encoding performance of a DNN-based transmitter and compare it to a traditional one. And then, we theoretically analyze the performance of DNN-based estimator and compare it with traditional estimators. Third, we investigate and validate how information is flown in a DNN-based communication system under the information theoretic concepts. Our analysis develops a concise way to open the "black box" of DNNs in physical layer communication, which can be applied to support the design of DNN-based intelligent communication techniques and help to provide explainable performance assessment.

Submitted 26 August, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: 15 pages, 13 figures, has been accepted for publication in IEEE Transactions on Communications. arXiv admin note: substantial text overlap with arXiv:2106.01124

Journal ref: IEEE Transactions on Communications, 2022

arXiv:2112.02379 [pdf, other]

doi 10.1109/JSTSP.2023.3238552

LTT-GAN: Looking Through Turbulence by Inverting GANs

Authors: Kangfu Mei, Vishal M. Patel

Abstract: In many applications of long-range imaging, we are faced with a scenario where a person appearing in the captured imagery is often degraded by atmospheric turbulence. However, restoring such degraded images for face verification is difficult since the degradation causes images to be geometrically distorted and blurry. To mitigate the turbulence effect, in this paper, we propose the first turbulence mitigation method that makes use of visual priors encapsulated by a well-trained GAN. Based on the visual priors, we propose to learn to preserve the identity of restored images on a spatial periodic contextual distance. Such a distance can keep the realism of restored images from the GAN while considering the identity difference at the network learning. In addition, hierarchical pseudo connections are proposed for facilitating the identity-preserving learning by introducing more appearance variance without identity changing. Extensive experiments show that our method significantly outperforms prior art in both the visual quality and face verification accuracy of restored results.

Submitted 4 December, 2021; originally announced December 2021.

Comments: Project Page: https://kfmei.page/LTT-GAN/

arXiv:2108.07401 [pdf, other]

doi 10.1109/TSE.2023.3285787

Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding

Authors: Shengcheng Yu, Chunrong Fang, Quanjun Zhang, Zhihao Cao, Yexiao Yun, Zhenfei Cao, Kai Mei, Zhenyu Chen

Abstract: Crowdsourced testing, as a distinct testing paradigm, has attracted much attention in software testing, especially in mobile application (app) testing field. Compared with in-house testing, crowdsourced testing shows superiority with the diverse testing environments when faced with the mobile testing fragmentation problem. However, crowdsourced testing also encounters the low-quality test report problem caused by unprofessional crowdworkers involved with different expertise. In order to handle the submitted reports of uneven quality, app developers have to distinguish high-quality reports from low-quality ones to help the bug inspection. One kind of typical low-quality test report is inconsistent test reports, which means the textual descriptions are not focusing on the attached bug-occurring screenshots. According to our empirical survey, only 18.07% crowdsourced test reports are consistent. Inconsistent reports cause waste on mobile app testing. To solve the inconsistency problem, we propose ReCoDe to detect the consistency of crowdsourced test reports via deep image-and-text fusion understanding. ReCoDe is a two-stage approach that first classifies the reports based on textual descriptions into different categories according to the bug feature. In the second stage, ReCoDe has a deep understanding of the GUI image features of the app screenshots and then applies different strategies to handle different types of bugs to detect the consistency of the crowdsourced test reports.
We conduct an experiment on a dataset with over 22k test reports to evaluate ReCoDe, and the results show the effectiveness of ReCoDe in detecting the consistency of crowdsourced test reports. Besides, a user study is conducted to prove the practical value of ReCoDe in effectively helping app developers improve the efficiency of reviewing the crowdsourced test reports.

Submitted 12 June, 2023; v1 submitted 16 August, 2021; originally announced August 2021.

arXiv:2107.06712 [pdf, other]

doi 10.1109/TCOMM.2021.3095198

A Low Complexity Learning-based Channel Estimation for OFDM Systems with Online Training

Authors: Kai Mei, Jun Liu, Xiaoying Zhang, Kuo Cao, Nandana Rajatheva, Jibo Wei

Abstract: In this paper, we devise a highly efficient machine learning-based channel estimation for orthogonal frequency division multiplexing (OFDM) systems, in which the training of the estimator is performed online. A simple learning module is employed for the proposed learning-based estimator. The training process is thus much faster and the required training data is reduced significantly. Besides, a training data construction approach utilizing least square (LS) estimation results is proposed so that the training data can be collected during the data transmission. The feasibility of this novel construction approach is verified by theoretical analysis and simulations. Based on this construction approach, two alternative training data generation schemes are proposed. One scheme transmits additional block pilot symbols to create training data, while the other scheme adopts a decision-directed method and does not require extra pilot overhead. Simulation results show the robustness of the proposed channel estimation method. Furthermore, the proposed method shows better adaptation to practical imperfections compared with the conventional minimum mean-square error (MMSE) channel estimation. It outperforms the existing machine learning-based channel estimation techniques under varying channel conditions.

Submitted 14 July, 2021; originally announced July 2021.

Comments: 12 pages, 12 figures. To appear in IEEE Transactions on Communications

arXiv:2106.01124 [pdf, other]

Opening the Black Box of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Deep Neural Network (DNN)-based physical layer techniques are attracting considerable interest due to their potential to enhance communication systems. However, most studies in the physical layer have tended to focus on the application of DNN models to wireless communication problems but not to theoretically understand how does a DNN work in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve comparable performance in the physical layer comparing with traditional techniques and their cost in terms of computational complexity. We further investigate and also experimentally validate how information is flown in a DNN-based communication system under the information theoretic concepts.

Submitted 18 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: 6 pages, 5 figures, to be presented in the IEEE Wireless Communications and Networking Conference (WCNC) 2022 Workshop on Machine Learning for Communications: Future Large Scale MIMO and AI-Native Air-Interface

arXiv:2104.00848 [pdf, other]

SDAN: Squared Deformable Alignment Network for Learning Misaligned Optical Zoom

Authors: Kangfu Mei, Shenglong Ye, Rui Huang

Abstract: Deep Neural Network (DNN) based super-resolution algorithms have greatly improved the quality of the generated images. However, these algorithms often yield significant artifacts when dealing with real-world super-resolution problems due to the difficulty in learning misaligned optical zoom. In this paper, we introduce a Squared Deformable Alignment Network (SDAN) to address this issue. Our network learns squared per-point offsets for convolutional kernels, and then aligns features in corrected convolutional windows based on the offsets. So the misalignment will be minimized by the extracted aligned features. Different from the per-point offsets used in the vanilla Deformable Convolutional Network (DCN), our proposed squared offsets not only accelerate the offset learning but also improve the generation quality with fewer parameters. Besides, we further propose an efficient cross packing attention layer to boost the accuracy of the learned offsets. It leverages the packing and unpacking operations to enlarge the receptive field of the offset learning and to enhance the ability of extracting the spatial connection between the low-resolution images and the referenced images. Comprehensive experiments show the superiority of our method over other state-of-the-art methods in both computational efficiency and realistic details.

Submitted 25 November, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: ICME21. Code is available at https://github.com/MKFMIKU/SDAN

arXiv:2103.05930 [pdf, other]

AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing

Authors: Qi Song, Kangfu Mei, Rui Huang

Abstract: Two factors have proven to be very important to the performance of semantic segmentation models: global context and multi-level semantics. However, generating features that capture both factors always leads to high computational complexity, which is problematic in real-time scenarios. In this paper, we propose a new model, called Attention-Augmented Network (AttaNet), to capture both global context and multilevel semantics while keeping the efficiency high. AttaNet consists of two primary modules: Strip Attention Module (SAM) and Attention Fusion Module (AFM). Viewing that in challenging images with low segmentation accuracy, there are a significantly larger amount of vertical strip areas than horizontal ones, SAM utilizes a striping operation to reduce the complexity of encoding global context in the vertical direction drastically while keeping most of contextual information, compared to the non-local approaches. Moreover, AFM follows a cross-level aggregation strategy to limit the computation, and adopts an attention strategy to weight the importance of different levels of features at each pixel when fusing them, obtaining an efficient multi-level representation. We have conducted extensive experiments on two semantic segmentation benchmarks, and our network achieves different levels of speed/accuracy trade-offs on Cityscapes, e.g., 71 FPS/79.9% mIoU, 130 FPS/78.5% mIoU, and 180 FPS/70.1% mIoU, and leading performance on ADE20K as well.

Submitted 10 March, 2021; originally announced March 2021.

Comments: AAAI 2021

arXiv:2101.01479 [pdf, other]

Scale-Aware Network with Regional and Semantic Attentions for Crowd Counting under Cluttered Background

Authors: Qiaosi Yi, Yunxing Liu, Aiwen Jiang, Juncheng Li, Kangfu Mei, Mingwen Wang

Abstract: Crowd counting is an important task that has shown great application value in public safety-related fields and has attracted increasing attention in recent years. In current research, the accuracy of counting and of crowd density estimation are the main concerns. Although the emergence of deep learning has greatly promoted the development of this field, crowd counting under cluttered background is still a serious challenge. In order to solve this problem, we propose a Scale-Aware Crowd Counting Network (SACCN) with regional and semantic attentions. The proposed SACCN distinguishes crowd and background by applying regional and semantic self-attention mechanisms on the shallow layers and deep layers, respectively. Moreover, the asymmetric multi-scale module (AMM) is proposed to deal with the problem of scale diversity, and regional attention based dense connections and skip connections are designed to alleviate the variations on crowd scales. Extensive experimental results on multiple public benchmarks demonstrate that our proposed SACCN achieves superior performance and outperforms most state-of-the-art methods. All codes and pretrained models will be released soon.

Submitted 7 January, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

arXiv:2008.13084 [pdf, other]

MDCN: Multi-scale Dense Cross Network for Image Super-Resolution

Authors: Juncheng Li, Faming Fang, Jiaqian Li, Kangfu Mei, Guixu Zhang

Abstract: Convolutional neural networks have been proven to be of great benefit for single-image super-resolution (SISR). However, previous works do not make full use of multi-scale features and ignore the inter-scale correlation between different upsampling factors, resulting in sub-optimal performance. Instead of blindly increasing the depth of the network, we are committed to mining image features and learning the inter-scale correlation between different upsampling factors. To achieve this, we propose a Multi-scale Dense Cross Network (MDCN), which achieves great performance with fewer parameters and less execution time. MDCN consists of multi-scale dense cross blocks (MDCBs), a hierarchical feature distillation block (HFDB), and a dynamic reconstruction block (DRB). Among them, MDCB aims to detect multi-scale features and maximize the use of image feature flow at different scales, HFDB focuses on adaptively recalibrating channel-wise feature responses to achieve feature distillation, and DRB attempts to reconstruct SR images with different upsampling factors in a single model. It is worth noting that all these modules can run independently, which means that they can be selectively plugged into any CNN model to improve model performance. Extensive experiments show that MDCN achieves competitive results in SISR, especially in the reconstruction task with multiple upsampling factors. The code will be provided at https://github.com/MIVRC/MDCN-PyTorch.

Submitted 29 August, 2020; originally announced August 2020.

Comments: 15 pages, 15 figures

arXiv:2008.12197 [pdf, other]

Instance Adaptive Self-Training for Unsupervised Domain Adaptation

Authors: Ke Mei, Chuang Zhu, Jiaqi Zou, Shanghang Zhang

Abstract: The divergence between labeled training data and unlabeled testing data is a significant challenge for recent deep learning models. Unsupervised domain adaptation (UDA) attempts to solve such a problem. Recent works show that self-training is a powerful approach to UDA. However, existing methods have difficulty in balancing scalability and performance. In this paper, we propose an instance adaptive self-training framework for UDA on the task of semantic segmentation. To effectively improve the quality of pseudo-labels, we develop a novel pseudo-label generation strategy with an instance adaptive selector. Besides, we propose the region-guided regularization to smooth the pseudo-label region and sharpen the non-pseudo-label region. Our method is so concise and efficient that it is easy to be generalized to other unsupervised domain adaptation methods. Experiments on 'GTA5 to Cityscapes' and 'SYNTHIA to Cityscapes' demonstrate the superior performance of our approach compared with the state-of-the-art methods.

Submitted 27 August, 2020; originally announced August 2020.

Comments: ECCV 2020

arXiv:2008.10480 [pdf, other]

3rd Place Solution to "Google Landmark Retrieval 2020"

Authors: Ke Mei, Lei li, **chang Xu, Yanhua Cheng, Yugeng Lin

Abstract: Image retrieval is a fundamental problem in computer vision. This paper presents our 3rd place detailed solution to the Google Landmark Retrieval 2020 challenge. We focus on the exploration of data cleaning and models with metric learning. We use a data cleaning strategy based on embedding clustering. Besides, we employ a data augmentation method called Corner-Cutmix, which improves the model's ability to recognize multi-scale and occluded landmark images. We show in detail the ablation experiments and results of our method.

Submitted 24 August, 2020; v1 submitted 24 August, 2020; originally announced August 2020.

arXiv:2007.09248 [pdf, other]

doi 10.1109/TCCN.2021.3118465

Fine Timing and Frequency Synchronization for MIMO-OFDM: An Extreme Learning Approach

Authors: Jun Liu, Kai Mei, Xiaochen Zhang, Des McLernon, Dongtang Ma, Jibo Wei, Syed Ali Raza Zaidi

Abstract: Multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) is a key technology component in the evolution towards cognitive radio (CR) in next-generation communication, in which the accuracy of timing and frequency synchronization significantly impacts the overall system performance. In this paper, we propose a novel scheme leveraging extreme learning machine (ELM) to achieve high-precision synchronization. Specifically, exploiting the preamble signals with synchronization offsets, two ELMs are incorporated into a traditional MIMO-OFDM system to estimate both the residual symbol timing offset (RSTO) and the residual carrier frequency offset (RCFO). The simulation results show that the performance of the proposed ELM-based synchronization scheme is superior to the traditional method under both additive white Gaussian noise (AWGN) and frequency selective fading channels. Furthermore, compared with the existing machine learning based techniques, the proposed method shows outstanding performance without the requirement of perfect channel state information (CSI) and prohibitive computational complexity. Finally, the proposed method is robust in terms of the choice of channel parameters (e.g., number of paths) and also in terms of "generalization ability" from a machine learning standpoint.

Submitted 1 June, 2022; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: 13 pages, 12 figures, has been accepted for publication in IEEE Transactions on Cognitive Communications and Networking

Journal ref: IEEE Transactions on Cognitive Communications and Networking, 2021

arXiv:2006.15954 [pdf, other]

Multi-level colonoscopy malignant tissue detection with adversarial CAC-UNet

Authors: Chuang Zhu, Ke Mei, Ting Peng, Yihao Luo, Jun Liu, Ying Wang, Mulan **

Abstract: An automatic and objective medical diagnostic model can be valuable for achieving early cancer detection and thus reducing the mortality rate. In this paper, we propose a highly efficient multi-level malignant tissue detection through the designed adversarial CAC-UNet. A patch-level model with a pre-prediction strategy and a malignancy area guided label smoothing is adopted to remove the negative WSIs, which lowers the risk of false positive detection. For the key patches selected by multi-model ensemble, an adversarial context-aware and appearance consistency UNet (CAC-UNet) is designed to achieve robust segmentation. In CAC-UNet, mirror designed discriminators are able to seamlessly fuse the whole feature maps of the skillfully designed powerful backbone network without any information loss. Besides, a mask prior is further added to guide the accurate segmentation mask prediction through an extra mask-domain discriminator. The proposed scheme achieves the best results in MICCAI DigestPath2019 challenge on colonoscopy tissue segmentation and classification task. The full implementation details and the trained models are available at https://github.com/Raykoooo/CAC-UNet.

Submitted 30 June, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: accepted by Neurocomputing; winner of the MICCAI DigestPath 2019 challenge on colonoscopy tissue segmentation and classification task

arXiv:2006.13511 [pdf, other]

Disentangle Perceptual Learning through Online Contrastive Learning

Authors: Kangfu Mei, Yao Lu, Qiaosi Yi, Haoyu Wu, Juncheng Li, Rui Huang

Abstract: Pursuing realistic results according to human visual perception is the central concern in image transformation tasks. Perceptual learning approaches like perceptual loss are empirically powerful for such tasks, but they usually rely on a pre-trained classification network to provide features, which are not necessarily optimal in terms of visual perception of image transformation. In this paper, we argue that, among the feature representations from the pre-trained classification network, only limited dimensions are related to human visual perception, while others are irrelevant, although both will affect the final image transformation results. Under such an assumption, we try to disentangle the perception-relevant dimensions from the representation through our proposed online contrastive learning. The resulting network includes the pre-training part and a feature selection layer, followed by the contrastive learning module, which utilizes the transformed results, target images, and task-oriented distorted images as the positive, negative, and anchor samples, respectively. The contrastive learning aims at activating the perception-relevant dimensions and suppressing the irrelevant ones by using the triplet loss, so that the original representation can be disentangled for better perceptual quality. Experiments on various image transformation tasks demonstrate the superiority of our framework, in terms of human visual perception, to the existing approaches using pre-trained networks and empirically designed losses.

Submitted 24 June, 2020; originally announced June 2020.

Comments: 12 pages, 8 figures
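Note on the entry above: the abstract relies on the triplet loss without stating it. A minimal sketch of the standard margin-based form follows, assuming squared Euclidean distance between feature vectors. The function name `triplet_loss`, the `margin` default, and the use of NumPy are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (illustrative, not the paper's code).

    Each argument is an array of shape (batch, dim). The loss is zero once
    the anchor is closer to the positive than to the negative by `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance to negative
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Hypothetical 2-D features: the positive coincides with the anchor,
# the negative is far away, so the hinge is inactive.
a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 0.0]])
n = np.array([[3.0, 4.0]])
loss_easy = triplet_loss(a, p, n)   # hinge inactive
loss_hard = triplet_loss(a, n, p)   # roles swapped, hinge active
```

How the paper assigns images to the three roles, and which feature dimensions enter the distance, is described only in the paper itself; the sketch shows the loss shape alone.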

arXiv:2004.13875 [pdf, other]

6G White Paper on Machine Learning in Wireless Communication Networks

Authors: Samad Ali, Walid Saad, Nandana Rajatheva, Kapseok Chang, Daniel Steinbach, Benjamin Sliwa, Christian Wietfeld, Kai Mei, Hamid Shiri, Hans-Jürgen Zepernick, Thi My Chinh Chu, Ijaz Ahmad, Jyrki Huusko, Jaakko Suutala, Shubhangi Bhadauria, Vimal Bhatia, Rangeet Mitra, Saidhiraj Amuru, Robert Abbas, Baohua Shao, Michele Capobianco, Guanghui Yu, Maelick Claes, Teemu Karvonen, Mingzhe Chen , et al. (2 additional authors not shown)

Abstract: The focus of this white paper is on machine learning (ML) in wireless communications. 6G wireless communication networks will be the backbone of the digital transformation of societies by providing ubiquitous, reliable, and near-instant wireless connectivity for humans and machines. Recent advances in ML research have enabled a wide range of novel technologies such as self-driving vehicles and voice assistants. Such innovation is possible as a result of the availability of advanced ML models, large datasets, and high computational power. On the other hand, the ever-increasing demand for connectivity will require a lot of innovation in 6G wireless networks, and ML tools will play a major role in solving problems in the wireless domain. In this paper, we provide an overview of the vision of how ML will impact wireless communication systems. We first give an overview of the ML methods that have the highest potential to be used in wireless networks. Then, we discuss the problems that can be solved by using ML in various layers of the network such as the physical layer, medium access layer, and application layer. Zero-touch optimization of wireless networks using ML is another interesting aspect that is discussed in this paper. Finally, at the end of each section, important research questions that the section aims to answer are presented.

Submitted 28 April, 2020; originally announced April 2020.

arXiv:2002.08587 [pdf, other]

Cross-stained Segmentation from Renal Biopsy Images Using Multi-level Adversarial Learning

Authors: Ke Mei, Chuang Zhu, Lei Jiang, Jun Liu, Yuanyuan Qiao

Abstract: Segmentation from renal pathological images is a key step in automatically analyzing renal histological characteristics. However, the performance of models varies significantly across different types of stained datasets due to appearance variations. In this paper, we design a robust and flexible model for cross-stained segmentation. It is a novel multi-level deep adversarial network architecture that consists of three sub-networks: (i) a segmentation network; (ii) a pair of multi-level mirrored discriminators for guiding the segmentation network to extract domain-invariant features; (iii) a shape discriminator that is utilized to further distinguish the output of the segmentation network from the ground truth. Experimental results on glomeruli segmentation from renal biopsy images indicate that our network is able to improve segmentation performance on the target type of stained images and use unlabeled data to achieve similar accuracy to labeled data. In addition, this method can be easily applied to other tasks.

Submitted 20 February, 2020; originally announced February 2020.

Comments: Accepted by ICASSP2020

arXiv:1912.08148 [pdf, other]

doi 10.1049/cmu2.12250

Enhanced LMMSE Estimation Capable of Selecting Parameters

Authors: Kai Mei, Jun Liu, Xiaoran Liu, Jun Xiong, Xiaoying Zhang, Jibo Wei

Abstract: In the linear minimum mean square error (LMMSE) estimation for orthogonal frequency division multiplexing (OFDM) systems, the problem of determining the algorithm's parameters, especially those related to channel frequency response (CFR) correlation, has not been readily solved yet. Although many approaches have been proposed to determine the statistical parameters, it is hard to choose the best one among those approaches in the design phase, since every approach has its own most suitable application conditions and the real channel condition is unpredictable. In this paper, we propose an enhanced LMMSE estimation capable of selecting parameters by itself. To this end, sampled noise MSE is first proposed to evaluate the practical performance of interpolation. Based on this evaluation index, a novel parameter comparison scheme is proposed to determine the parameters that give LMMSE estimation the best performance within a parameter set. After that, the structure of the enhanced LMMSE is illustrated, and it is applied in OFDM systems. Besides, the theoretical analysis of the accuracy of the parameter comparison scheme, the parameter set design, and the algorithm complexity are explained in detail. Finally, our analyses and the performance of the proposed estimation method are demonstrated by simulation experiments.

Submitted 17 December, 2019; originally announced December 2019.

arXiv:1911.08098 [pdf, other]

HighEr-Resolution Network for Image Demosaicing and Enhancing

Authors: Kangfu Mei, Juncheng Li, Jiajie Zhang, Haoyu Wu, Jie Li, Rui Huang

Abstract: Neural-network-based image restoration methods tend to use low-resolution image patches for training. Although higher-resolution image patches can provide more global information, state-of-the-art methods cannot utilize them due to their huge GPU memory usage, as well as the unstable training process. However, plenty of studies have shown that global information is crucial for image restoration tasks like image demosaicing and enhancing. In this work, we propose a HighEr-Resolution Network (HERN) to fully learn global information in high-resolution image patches. To achieve this, the HERN employs two parallel paths to learn image features in two different resolutions, respectively. By combining global-aware features and multi-scale features, our HERN is able to learn global information with feasible GPU memory usage. Besides, we introduce a progressive training method to solve the instability issue and accelerate model convergence. On the task of image demosaicing and enhancing, our HERN achieves state-of-the-art performance on the AIM2019 RAW to RGB mapping challenge. The source code of our implementation is available at https://github.com/MKFMIKU/RAW2RGBNet.

Submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted in ICCV 2019 Workshop (AIM2019 Raw to RGB Challenge Winner)

arXiv:1911.03886 [pdf, ps, other]

doi 10.1109/TCOMM.2021.3083597

Performance Analysis on Machine Learning-Based Channel Estimation

Authors: Kai Mei, Jun Liu, Xiaochen Zhang, Nandana Rajatheva, Jibo Wei

Abstract: Recently, machine learning-based channel estimation has attracted much attention. The performance of machine learning-based estimation has been validated by simulation experiments. However, little attention has been paid to the theoretical performance analysis. In this paper, we investigate the mean square error (MSE) performance of machine learning-based estimation. Hypothesis testing is employed to analyze its MSE upper bound. Furthermore, we build a statistical model for hypothesis testing, which holds when the linear learning module with a low input dimension is used in machine learning-based channel estimation, and derive a clear analytical relation between the size of the training data and performance. Then, we simulate the machine learning-based channel estimation in orthogonal frequency division multiplexing (OFDM) systems to verify our analysis results. Finally, the design considerations for the situation where only limited training data is available are discussed. In this situation, our analysis results can be applied to assess the performance and support the design of machine learning-based channel estimation.

Submitted 14 July, 2021; v1 submitted 10 November, 2019; originally announced November 2019.

Comments: 11 pages, 10 figures. To appear in IEEE Transactions on Communications

arXiv:1811.09346 [pdf, other]

Deep Neural Network Aided Scenario Identification in Wireless Multi-path Fading Channels

Authors: Jun Liu, Kai Mei, Dongtang Ma, Jibo Wei

Abstract: This letter illustrates our preliminary work on deep neural networks (DNN) for wireless communication scenario identification in wireless multi-path fading channels. In this letter, six kinds of channel scenarios referring to the COST 207 channel model have been examined. 100% identification accuracy has been observed given a signal-to-noise ratio (SNR) over 20dB, whereas an 88.4% average accuracy has been obtained where SNR ranged from 0dB to 40dB. The proposed method has been tested under fast time-varying conditions, which are similar to real-world wireless multi-path fading channels, enabling it to work feasibly in practical scenario identification.

Submitted 22 November, 2018; originally announced November 2018.

Comments: Draft of a four-page letter with 8 figures

arXiv:1811.07445 [pdf, other]

High-precision timing and frequency synchronization method for MIMO-OFDM systems in double-selective channels

Authors: Jun Liu, Kai Mei, Xiaochen Zhang, Xiaoying Zhang, Dongtang Ma, Jibo Wei

Abstract: In this letter, a novel synchronization method for MIMO-OFDM systems is proposed. The new approach has an accurate estimate of both symbol timing and large frequency offset. Simulation results show the excellent robustness of our method in double-selective channel even if the strongest multipath component arrives behind the first path.

Submitted 18 November, 2018; originally announced November 2018.

Comments: 2 pages letter with 4 figures

arXiv:1810.12538 [pdf, ps, other]

doi 10.1109/TIP.2019.2953361

Phase asymmetry ultrasound despeckling with fractional anisotropic diffusion and total variation

Authors: Kunqiang Mei, Bin Hu, Baowei Fei, Binjie Qin

Abstract: We propose an ultrasound speckle filtering method for not only preserving various edge features but also filtering tissue-dependent complex speckle noises in ultrasound images. The key idea is to detect these various edges using a phase congruence-based edge significance measure called phase asymmetry (PAS), which is invariant to the intensity amplitude of edges and takes 0 in non-edge smooth regions and 1 at the ideal step edge, while also taking intermediate values at slowly varying ramp edges. By leveraging the PAS metric in designing weighting coefficients to maintain a balance between fractional-order anisotropic diffusion and total variation (TV) filters in the TV cost function, we propose a new fractional TV framework to not only achieve the best despeckling performance with ramp edge preservation but also reduce the staircase effect produced by integral-order filters. Then, we exploit the PAS metric in designing a new fractional-order diffusion coefficient to properly preserve low-contrast edges in diffusion filtering. Finally, different from fixed fractional-order diffusion filters, an adaptive fractional order is introduced based on the PAS metric to enhance various weak edges in the spatially transitional areas between objects. The proposed fractional TV model is minimized using the gradient descent method to obtain the final denoised image. The experimental results and real application of ultrasound breast image segmentation show that the proposed method outperforms other state-of-the-art ultrasound despeckling filters for both speckle reduction and feature preservation in terms of visual evaluation and quantitative indices.

Submitted 9 February, 2021; v1 submitted 30 October, 2018; originally announced October 2018.

Comments: 12

Journal ref: IEEE Transactions on Image Processing, 2020

arXiv:1810.02283 [pdf, other]

Progressive Feature Fusion Network for Realistic Image Dehazing

Authors: Kangfu Mei, Aiwen Jiang, Juncheng Li, Mingwen Wang

Abstract: Single image dehazing is a challenging ill-posed restoration problem. Various prior-based and learning-based methods have been proposed. Most of them follow a classic atmospheric scattering model, which is an elegant simplified physical model based on the assumption of single-scattering and a homogeneous atmospheric medium. The formation of haze in realistic environments is more complicated. In this paper, we propose to take its essential mechanism as a "black box", and focus on learning an input-adaptive trainable end-to-end dehazing model. A U-Net-like encoder-decoder deep network via progressive feature fusions has been proposed to directly learn a highly nonlinear transformation function from the observed hazy image to the haze-free ground-truth. The proposed network is evaluated on two public image dehazing benchmarks. The experiments demonstrate that it can achieve superior performance when compared with popular state-of-the-art methods. With efficient GPU memory usage, it can satisfactorily recover ultra high definition hazed images up to 4K resolution, which is unaffordable by many deep learning based dehazing algorithms.

Submitted 4 October, 2018; originally announced October 2018.

Comments: 14 pages, 7 figures, 1 table, accepted by ACCV2018
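Note on the entry above: the classic atmospheric scattering model that the abstract says most prior methods follow is commonly written as I = J * t + A * (1 - t), where I is the hazy image, J the haze-free scene, t the transmission map, and A the global airlight. The sketch below applies and inverts that formula. It illustrates the classic model only, not the paper's network, which deliberately avoids it; the function names and the `t_min` clamp are illustrative assumptions.

```python
import numpy as np

def apply_haze(clear, transmission, airlight):
    """Synthesize a hazy image with I = J * t + A * (1 - t)."""
    return clear * transmission + airlight * (1.0 - transmission)

def invert_haze(hazy, transmission, airlight, t_min=0.1):
    """Recover J = (I - A) / t + A, clamping t to avoid division by ~0."""
    t = np.maximum(transmission, t_min)
    return (hazy - airlight) / t + airlight

# Hypothetical 4x4 grayscale scene with uniform transmission 0.5.
rng = np.random.default_rng(0)
scene = rng.random((4, 4))
t_map = np.full((4, 4), 0.5)
hazy = apply_haze(scene, t_map, airlight=0.9)
restored = invert_haze(hazy, t_map, airlight=0.9)
```

In practice neither t nor A is known, which is why prior-based methods estimate them and why the paper instead learns the hazy-to-clear mapping directly.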

arXiv:1810.01831 [pdf, other]

An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks

Authors: Kangfu Mei, Aiwen Jiang, Juncheng Li, Jihua Ye, Mingwen Wang

Abstract: Recent works on single-image super-resolution concentrate on improving performance by enhancing spatial encoding between convolutional layers. In this paper, we focus on modeling the correlations between channels of convolutional features. We present an effective deep residual network based on squeeze-and-excitation blocks (SEBlock) to reconstruct a high-resolution (HR) image from a low-resolution (LR) image. SEBlock is used to adaptively recalibrate channel-wise feature mappings. Further, short connections between each SEBlock are used to remedy information loss. Extensive experiments show that our model can achieve state-of-the-art performance and recover finer texture details.

Submitted 3 October, 2018; originally announced October 2018.

Comments: 12 pages, accepted by ICONIP2018
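Note on the entry above: channel-wise recalibration in a squeeze-and-excitation block is usually described as global average pooling per channel ("squeeze"), a small two-layer bottleneck ending in a sigmoid ("excitation"), and a per-channel rescaling of the feature map. The sketch below follows that general recipe in NumPy. The function name, the weight shapes, and the omission of bias terms are illustrative assumptions, not the paper's SEBlock implementation.

```python
import numpy as np

def se_recalibrate(features, w1, w2):
    """Generic squeeze-and-excitation gating (illustrative sketch).

    features: array of shape (C, H, W)
    w1:       bottleneck weights of shape (C // r, C)
    w2:       expansion weights of shape (C, C // r)
    """
    squeezed = features.mean(axis=(1, 2))          # squeeze: one value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)        # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1)
    return features * gates[:, None, None]         # rescale each channel

# Hypothetical sizes: 8 channels, reduction ratio r = 4.
rng = np.random.default_rng(0)
x = rng.random((8, 5, 5))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
y = se_recalibrate(x, w1, w2)
```

Because the gates lie strictly between 0 and 1, each output channel is a scaled-down copy of its input channel; in a trained network the weights determine which channels are emphasized.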

arXiv:1608.07538 [pdf, other]

doi 10.1140/epjc/s10052-017-4968-5

Higgs Physics at the CLIC Electron-Positron Linear Collider

Authors: H. Abramowicz, A. Abusleme, K. Afanaciev, N. Alipour Tehrani, C. Balázs, Y. Benhammou, M. Benoit, B. Bilki, J. -J. Blaising, M. J. Boland, M. Boronat, O. Borysov, I. Božović-Jelisavčić, M. Buckland, S. Bugiel, P. N. Burrows, T. K. Charles, W. Daniluk, D. Dannheim, R. Dasgupta, M. Demarteau, M. A. Díaz Gutierrez, G. Eigen, K. Elsener, U. Felzmann , et al. (99 additional authors not shown)

Abstract: The Compact Linear Collider (CLIC) is an option for a future e+e- collider operating at centre-of-mass energies up to 3 TeV, providing sensitivity to a wide range of new physics phenomena and precision physics measurements at the energy frontier. This paper is the first comprehensive presentation of the Higgs physics reach of CLIC operating at three energy stages: sqrt(s) = 350 GeV, 1.4 TeV and 3 TeV. The initial stage of operation allows the study of Higgs boson production in Higgsstrahlung (e+e- -> ZH) and WW-fusion (e+e- -> Hnunu), resulting in precise measurements of the production cross sections, the Higgs total decay width Gamma_H, and model-independent determinations of the Higgs couplings. Operation at sqrt(s) > 1 TeV provides high-statistics samples of Higgs bosons produced through WW-fusion, enabling tight constraints on the Higgs boson couplings. Studies of the rarer processes e+e- -> ttH and e+e- -> HHnunu allow measurements of the top Yukawa coupling and the Higgs boson self-coupling. This paper presents detailed studies of the precision achievable with Higgs measurements at CLIC and describes the interpretation of these measurements in a global fit.

Submitted 5 June, 2017; v1 submitted 26 August, 2016; originally announced August 2016.

Comments: 42 pages, 29 figures, accepted for publication in the European Physical Journal C

Report number: CLICdp-Pub-2016-001

Journal ref: Eur. Phys. J. C 77, 475 (2017)

Showing 1–50 of 50 results for author: Mei, K