Search | arXiv e-print repository

arXiv:2406.19931

Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-Rank Decomposition

Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, Guogang Zhu, Hao Su

Abstract: To address data heterogeneity, the key strategy of Personalized Federated Learning (PFL) is to decouple general knowledge (shared among clients) and client-specific knowledge, as the latter can have a negative impact on collaboration if not removed. Existing PFL methods primarily adopt a parameter partitioning approach, where the parameters of a model are designated as one of two types: parameters shared with other clients to extract general knowledge and parameters retained locally to learn client-specific knowledge. However, as these two types of parameters are put together like a jigsaw puzzle into a single model during the training process, each parameter may simultaneously absorb both general and client-specific knowledge, thus struggling to separate the two types of knowledge effectively. In this paper, we introduce FedDecomp, a simple but effective PFL paradigm that employs parameter additive decomposition to address this issue. Instead of assigning each parameter of a model as either a shared or personalized one, FedDecomp decomposes each parameter into the sum of two parameters: a shared one and a personalized one, thus achieving a more thorough decoupling of shared and personalized knowledge compared to the parameter partitioning method. In addition, as we find that retaining local knowledge of specific clients requires much lower model capacity compared with general knowledge across all clients, we let the matrix containing personalized parameters be low rank during the training process. Moreover, a new alternating training strategy is proposed to further improve the performance. Experimental results across multiple datasets and varying degrees of data heterogeneity demonstrate that FedDecomp outperforms state-of-the-art methods by up to 4.9%.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 12 pages, 8 figures
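
The additive, low-rank decomposition described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each layer weight is written as W = W_shared + B·A with a small rank r, and that the server averages only the shared part. The names `effective_weight`, `average_shared`, `B`, `A` and all shapes are hypothetical, and the alternating training strategy is not modeled.

```python
# Illustrative sketch only: names and shapes are assumptions, not the authors' code.
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    """Random matrix standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def effective_weight(shared, low_b, low_a):
    """W = W_shared + B @ A; the personalized term B @ A has rank at most r."""
    return add(shared, matmul(low_b, low_a))


def average_shared(shared_list):
    """Server step: average only the shared parts; B and A stay on the client."""
    n = len(shared_list)
    rows, cols = len(shared_list[0]), len(shared_list[0][0])
    return [[sum(s[i][j] for s in shared_list) / n for j in range(cols)]
            for i in range(rows)]


rng = random.Random(0)
d_out, d_in, r, n_clients = 4, 6, 2, 3  # hypothetical sizes
clients = [
    {"shared": rand_matrix(d_out, d_in, rng),
     "B": rand_matrix(d_out, r, rng),
     "A": rand_matrix(r, d_in, rng)}
    for _ in range(n_clients)
]

global_shared = average_shared([c["shared"] for c in clients])
for c in clients:
    c["shared"] = [row[:] for row in global_shared]  # broadcast the shared part
    c["W"] = effective_weight(c["shared"], c["B"], c["A"])
```

After one round every client holds the same shared matrix, while the effective weights differ through the client-specific low-rank term.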

arXiv:2406.19466

doi 10.1145/3658644.3670298

Data Poisoning Attacks to Locally Differentially Private Frequent Itemset Mining Protocols

Authors: Wei Tong, Haoyu Chen, Jiacheng Niu, Sheng Zhong

Abstract: Local differential privacy (LDP) provides a way for an untrusted data collector to aggregate users' data without violating their privacy. Various privacy-preserving data analysis tasks have been studied under the protection of LDP, such as frequency estimation, frequent itemset mining, and machine learning. Despite its privacy-preserving properties, recent research has demonstrated the vulnerability of certain LDP protocols to data poisoning attacks. However, existing data poisoning attacks are focused on basic statistics under LDP, such as frequency estimation and mean/variance estimation. As an important data analysis task, the security of LDP frequent itemset mining has yet to be thoroughly examined. In this paper, we aim to address this issue by presenting novel and practical data poisoning attacks against LDP frequent itemset mining protocols. By introducing a unified attack framework with composable attack operations, our data poisoning attack can successfully manipulate the state-of-the-art LDP frequent itemset mining protocols and has the potential to be adapted to other protocols with similar structures. We conduct extensive experiments on three datasets to compare the proposed attack with four baseline attacks. The results demonstrate the severity of the threat and the effectiveness of the proposed attack.

Submitted 27 June, 2024; originally announced June 2024.

Comments: To appear in ACM Conference on Computer and Communications Security (ACM CCS 2024)

arXiv:2405.19789

Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning

Authors: Guogang Zhu, Xuefeng Liu, Xinghao Wu, Shaojie Tang, Chao Tang, Jianwei Niu, Hao Su

Abstract: Federated Semi-Supervised Learning (FSSL) leverages both labeled and unlabeled data on clients to collaboratively train a model. In FSSL, the heterogeneous data can introduce prediction bias into the model, causing the model's prediction to skew towards certain classes. Existing FSSL methods primarily tackle this issue by enhancing consistency in model parameters or outputs. However, as the models themselves are biased, merely constraining their consistency is not sufficient to alleviate prediction bias. In this paper, we explore this bias from a Bayesian perspective and demonstrate that it principally originates from label prior bias within the training data. Building upon this insight, we propose a debiasing method for FSSL named FedDB. FedDB utilizes the Average Prediction Probability of Unlabeled Data (APP-U) to approximate the biased prior. During local training, FedDB employs APP-U to refine pseudo-labeling through Bayes' theorem, thereby significantly reducing the label prior bias. Concurrently, during the model aggregation, FedDB uses APP-U from participating clients to formulate unbiased aggregate weights, thereby effectively diminishing bias in the global model. Experimental results show that FedDB can surpass existing FSSL methods. The code is available at https://github.com/GuogangZhu/FedDB.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024
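
The pseudo-label refinement mentioned in the abstract above can be illustrated with a generic prior correction. This is a sketch under stated assumptions, not the FedDB code: it assumes the biased prior is estimated as the mean predicted probability over unlabeled samples (the role the abstract gives to APP-U) and that debiasing divides each prediction by that prior and renormalizes, as Bayes' theorem suggests when the target prior is uniform. The function names and the toy probabilities are hypothetical, and the aggregation-weight step is not covered.

```python
# Illustrative sketch only: a generic prior correction, not the authors' exact formula.

def average_prediction(probs):
    """Estimate the (biased) class prior as the mean prediction over unlabeled data."""
    n, k = len(probs), len(probs[0])
    return [sum(p[c] for p in probs) / n for c in range(k)]


def debias(probs, prior, eps=1e-12):
    """Divide each prediction by the estimated prior, then renormalize each row."""
    adjusted = []
    for p in probs:
        scaled = [pc / max(q, eps) for pc, q in zip(p, prior)]
        total = sum(scaled)
        adjusted.append([s / total for s in scaled])
    return adjusted


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


# Toy predictions from a model that is skewed toward class 0 (hypothetical values).
probs = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.5, 0.1, 0.4]]

app_u = average_prediction(probs)
adjusted = debias(probs, app_u)
raw_labels = [argmax(p) for p in probs]
pseudo_labels = [argmax(p) for p in adjusted]
```

In this toy case the raw argmax assigns every sample to class 0, while the prior-corrected predictions spread the pseudo-labels across the three classes.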

arXiv:2405.19694

Grade Like a Human: Rethinking Automated Assessment with Large Language Models

Authors: Wenjing Xie, Juxin Niu, Chun Jason Xue, Nan Guan

Abstract: While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps, such as grading rubrics design and post-grading review. There has been a lack of systematic research exploring the potential of LLMs to enhance the entire grading process. In this paper, we propose an LLM-based grading system that addresses the entire grading procedure, including the following key components: 1) Developing grading rubrics that consider not only the questions but also the student answers, which can more accurately reflect students' performance. 2) Under the guidance of grading rubrics, providing accurate and consistent scores for each student, along with customized feedback. 3) Conducting post-grading review to better ensure accuracy and fairness. Additionally, we collected a new dataset named OS from a university operating system course and conducted extensive experiments on both our new dataset and the widely used Mohler dataset. Experiments demonstrate the effectiveness of our proposed approach, providing some new insights for developing automated grading systems based on LLMs.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.04071

IMU-Aided Event-based Stereo Visual Odometry

Authors: Junkai Niu, Sheng Zhong, Yi Zhou

Abstract: Direct methods for event-based visual odometry solve the mapping and camera pose tracking sub-problems by establishing implicit data association in a way that the generative model of events is exploited. The main bottlenecks faced by state-of-the-art work in this field include the high computational complexity of mapping and the limited accuracy of tracking. In this paper, we improve our previous direct pipeline Event-based Stereo Visual Odometry in terms of accuracy and efficiency. To speed up the mapping operation, we propose an efficient strategy of edge-pixel sampling according to the local dynamics of events. The mapping performance in terms of completeness and local smoothness is also improved by combining the temporal stereo results and the static stereo results. To circumvent the degeneracy issue of camera pose tracking in recovering the yaw component of general 6-DoF motion, we introduce as a prior the gyroscope measurements via pre-integration. Experiments on publicly available datasets justify our improvement. We release our pipeline as an open-source software for future research in this field.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures, ICRA

arXiv:2405.02421

What does the Knowledge Neuron Thesis Have to do with Knowledge?

Authors: Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn

Abstract: We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism underlying the ability of large language models to recall facts from a training corpus. This nascent thesis proposes that facts are recalled from the training corpus through the MLP weights in a manner resembling key-value memory, implying in effect that "knowledge" is stored in the network. Furthermore, by modifying the MLP modules, one can control the language model's generation of factual information. The plausibility of the KN thesis has been demonstrated by the success of KN-inspired model editing methods (Dai et al., 2022; Meng et al., 2022). We find that this thesis is, at best, an oversimplification. Not only have we found that we can edit the expression of certain linguistic phenomena using the same model editing methods but, through a more comprehensive evaluation, we have found that the KN thesis does not adequately explain the process of factual expression. While it is possible to argue that the MLP weights store complex patterns that are interpretable both syntactically and semantically, these patterns do not constitute "knowledge." To gain a more comprehensive understanding of the knowledge representation process, we must look beyond the MLP weights and explore recent models' complex layer structures and attention mechanisms.

Submitted 3 May, 2024; originally announced May 2024.

Comments: ICLR 2024 (Spotlight)

arXiv:2404.17808

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while keeping all tokens that have been merged in the vocabulary, it unavoidably holds tokens that primarily represent subwords of complete words and appear infrequently on their own in the text corpus. We term such tokens as Scaffold Tokens. Due to their infrequent appearance in the text corpus, Scaffold Tokens pose a learning imbalance issue for language models. To address that issue, we propose Scaffold-BPE, which incorporates a dynamic scaffold token removal mechanism by parameter-free, computation-light, and easy-to-implement modifications to the original BPE. This novel approach ensures the exclusion of low-frequency Scaffold Tokens from the token representations for the given texts, thereby mitigating the issue of frequency imbalance and facilitating model training. On extensive experiments across language modeling tasks and machine translation tasks, Scaffold-BPE consistently outperforms the original BPE, well demonstrating its effectiveness and superiority.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17785

Temporal Scaling Law for Large Language Models

Authors: Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jianwei Niu, Guiguang Ding

Abstract: Recently, Large Language Models (LLMs) have been widely adopted in a wide range of tasks, leading to increasing attention towards the research on how scaling LLMs affects their performance. Existing works, termed Scaling Laws, have discovered that the final test loss of LLMs scales as power-laws with model size, computational budget, and dataset size. However, the temporal change of the test loss of an LLM throughout its pre-training process remains unexplored, though it is valuable in many aspects, such as selecting better hyperparameters directly on the target LLM. In this paper, we propose the novel concept of Temporal Scaling Law, studying how the test loss of an LLM evolves as the training steps scale up. In contrast to modeling the test loss as a whole in a coarse-grained manner, we break it down and dive into the fine-grained test loss of each token position, and further develop a dynamic hyperbolic-law. Afterwards, we derive the much more precise temporal scaling law by studying the temporal patterns of the parameters in the dynamic hyperbolic-law. Results on both in-distribution (ID) and out-of-distribution (OOD) validation datasets demonstrate that our temporal scaling law accurately predicts the test loss of LLMs across training steps. Our temporal scaling law has broad practical applications. First, it enables direct and efficient hyperparameter selection on the target LLM, such as data mixture proportions. Second, viewing the LLM pre-training dynamics from the token position granularity provides some insights to enhance the understanding of LLM pre-training.

Submitted 16 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures; Under review

arXiv:2403.11506

End-To-End Underwater Video Enhancement: Dataset and Model

Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu

Abstract: Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, which has significant implications for marine research and exploration. However, existing methods primarily focus on developing image enhancement algorithms to enhance each frame independently. There is a lack of supervised datasets and models specifically tailored for UVE tasks. To fill this gap, we construct the Synthetic Underwater Video Enhancement (SUVE) dataset, comprising 840 diverse underwater-style videos paired with ground-truth reference videos. Based on this dataset, we train a novel underwater video enhancement model, UVENet, which utilizes inter-frame relationships to achieve better enhancement performance. Through extensive experiments on both synthetic and real underwater videos, we demonstrate the effectiveness of our approach. This study represents the first comprehensive exploration of UVE to our knowledge. The code is available at https://anonymous.4open.science/r/UVENet.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2402.02797

Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects

Authors: Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, Mingliang Xu

Abstract: Surface defect inspection plays an important role in the process of industrial manufacture and production. Though Convolutional Neural Network (CNN) based defect inspection methods have made huge leaps, they still confront a lot of challenges such as defect scale variation, complex background, low contrast, and so on. To address these issues, we propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects based on the encoder-decoder network. JAFFNet mainly incorporates a joint attention-guided feature fusion (JAFF) module into decoding stages to adaptively fuse low-level and high-level features. The JAFF module learns to emphasize defect features and suppress background noise during feature fusion, which is beneficial for detecting low-contrast defects. In addition, JAFFNet introduces a dense receptive field (DRF) module following the encoder to capture features with rich context information, which helps detect defects of different scales. The JAFF module mainly utilizes a learned joint channel-spatial attention map provided by high-level semantic features to guide feature fusion. The attention map makes the model pay more attention to defect features. The DRF module utilizes a sequence of multi-receptive-field (MRF) units with each taking as inputs all the preceding MRF feature maps and the original input. The obtained DRF features capture rich context information with a large range of receptive fields. Extensive experiments conducted on SD-saliency-900, Magnetic tile, and DAGM 2007 indicate that our method achieves promising performance in comparison with other state-of-the-art methods. Meanwhile, our method reaches a real-time defect detection speed of 66 FPS.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.01189

NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments

Authors: Ziheng Xu, Jianwei Niu, Qingfeng Li, Tao Ren, Chen Chen

Abstract: Neural implicit representations have been explored to enhance visual SLAM algorithms, especially in providing high-fidelity dense maps. Existing methods operate robustly in static scenes but struggle with the disruption caused by moving objects. In this paper we present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments. We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas. Utilizing the geometric information present in depth images, this method enables accurate removal of dynamic objects, thereby reducing the probability of camera drift. Additionally, we introduce a keyframe selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects and improves the efficiency of mapping. Experiments on publicly available RGB-D datasets demonstrate that our method outperforms competitive neural SLAM approaches in tracking accuracy and mapping quality in dynamic environments.

Submitted 16 May, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2312.16470

ReSynthDetect: A Fundus Anomaly Detection Network with Reconstruction and Synthetic Features

Authors: Jingqi Niu, Qinji Yu, Shiwen Dong, Zilong Wang, Kang Dang, Xiaowei Ding

Abstract: Detecting anomalies in fundus images through unsupervised methods is a challenging task due to the similarity between normal and abnormal tissues, as well as their indistinct boundaries. The current methods have limitations in accurately detecting subtle anomalies while avoiding false positives. To address these challenges, we propose the ReSynthDetect network which utilizes a reconstruction network for modeling normal images, and an anomaly generator that produces synthetic anomalies consistent with the appearance of fundus images. By combining the features of consistent anomaly generation and image reconstruction, our method is suited for detecting fundus abnormalities. The proposed approach has been extensively tested on benchmark datasets such as EyeQ and IDRiD, demonstrating state-of-the-art performance in both image-level and pixel-level anomaly detection. Our experiments indicate a substantial 9% improvement in AUROC on EyeQ and a significant 17.1% improvement in AUPR on IDRiD.

Submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted at BMVC2023

arXiv:2312.06240

UIEDP: Underwater Image Enhancement with Diffusion Prior

Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

Abstract: Underwater image enhancement (UIE) aims to generate clear images from low-quality underwater images. Due to the unavailability of clear reference images, researchers often synthesize them to construct paired datasets for training deep models. However, these synthesized images may sometimes lack quality, adversely affecting training outcomes. To address this issue, we propose UIE with Diffusion Prior (UIEDP), a novel framework treating UIE as a posterior distribution sampling process of clear images conditioned on degraded underwater inputs. Specifically, UIEDP combines a pre-trained diffusion model capturing natural image priors with any existing UIE algorithm, leveraging the latter to guide conditional generation. The diffusion prior mitigates the drawbacks of inferior synthetic images, resulting in higher-quality image generation. Extensive experiments have demonstrated that our UIEDP yields significant improvements across various metrics, especially no-reference image quality assessment. The generated enhanced images also exhibit a more natural appearance.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.00741

Crystal: Enhancing Blockchain Mining Transparency with Quorum Certificate

Authors: Jianyu Niu, Fangyu Gai, Runchao Han, Ren Zhang, Yinqian Zhang, Chen Feng

Abstract: Researchers have discovered a series of theoretical attacks against Bitcoin's Nakamoto consensus; the most damaging ones are selfish mining, double-spending, and consistency delay attacks. These attacks have one common cause: block withholding. This paper proposes Crystal, which leverages quorum certificates to resist block withholding misbehavior. Crystal continuously elects committees from miners and requires each block to have a quorum certificate, i.e., a set of signatures issued by members of its committee. Consequently, an attacker has to publish its blocks to obtain quorum certificates, rendering block withholding impossible. To build Crystal, we design a novel two-round committee election in a Sybil-resistant, unpredictable and non-interactive way, and a reward mechanism to incentivize miners to follow the protocol. Our analysis and evaluations show that Crystal can significantly mitigate selfish mining and double-spending attacks. For example, in Bitcoin, an attacker with 30% of the total computation power will succeed in double-spending attacks with a probability of 15.6% to break the 6-confirmation rule; however, in Crystal, the success probability for the same attacker falls to 0.62%. We provide formal end-to-end safety proofs for Crystal, ensuring no unknown attacks will be introduced. To the best of our knowledge, Crystal is the first protocol that prevents selfish mining and double-spending attacks while providing safety proof.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 17 pages, 9 figures
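
The Bitcoin baseline quoted in the abstract above (30% of computation power, 6 confirmations, 15.6%) is consistent with a standard closed form for the double-spending race, commonly associated with Rosenfeld's hashrate-based analysis. The sketch below evaluates that closed form. Whether the paper uses exactly this model is an assumption, and the function name `double_spend_success` is hypothetical. The 0.62% figure for Crystal depends on the protocol itself and is not reproduced here.

```python
# Illustrative sketch only: a standard closed form for the double-spend race,
# assumed (not confirmed) to be the model behind the Bitcoin figure in the abstract.
from math import comb


def double_spend_success(q, n):
    """Probability that an attacker with hashrate share q beats n confirmations."""
    if q >= 0.5:
        return 1.0  # a majority attacker eventually wins the race
    p = 1.0 - q
    honest_wins = sum(
        comb(m + n - 1, m) * (p**n * q**m - p**m * q**n)
        for m in range(n + 1)
    )
    return 1.0 - honest_wins


print(round(double_spend_success(0.30, 6), 3))  # 0.156
```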

arXiv:2311.18189

Event-based Visual Inertial Velometer

Authors: Xiuyuan Lu, Yi Zhou, Junkai Niu, Sheng Zhong, Shaojie Shen

Abstract: Neuromorphic event-based cameras are bio-inspired visual sensors with asynchronous pixels and extremely high temporal resolution. Such favorable properties make them an excellent choice for solving state estimation tasks under aggressive ego motion. However, failures of camera pose tracking are frequently witnessed in state-of-the-art event-based visual odometry systems when the local map cannot be updated in time. One of the biggest roadblocks for this specific field is the absence of efficient and robust methods for data association without imposing any assumption on the environment. This problem seems, however, unlikely to be addressed as in standard vision due to the motion-dependent observability of event data. Therefore, we propose a mapping-free design for event-based visual-inertial state estimation in this paper. Instead of estimating the position of the event camera, we find that recovering the instantaneous linear velocity is more consistent with the differential working principle of event cameras. The proposed event-based visual-inertial velometer leverages a continuous-time formulation that incrementally fuses the heterogeneous measurements from a stereo event camera and an inertial measurement unit. Experiments on the synthetic dataset demonstrate that the proposed method can recover instantaneous linear velocity in metric scale with low latency.

Submitted 30 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.07783

Size-Aware Hypergraph Motifs

Authors: Jason Niu, Ilya D. Amburg, Sinan G. Aksoy, Ahmet Erdem Sarıyüce

Abstract: Complex systems frequently exhibit multi-way, rather than pairwise, interactions. These group interactions cannot be faithfully modeled as collections of pairwise interactions using graphs, and instead require hypergraphs. However, methods that analyze hypergraphs directly, rather than via lossy graph reductions, remain limited. Hypergraph motif mining holds promise in this regard, as motif patterns serve as building blocks for larger group interactions which are inexpressible by graphs. Recent work has focused on categorizing and counting hypergraph motifs based on the existence of nodes in hyperedge intersection regions. Here, we argue that the relative sizes of hyperedge intersections within motifs contain varied and valuable information. We propose a suite of efficient algorithms for finding triplets of hyperedges based on optimizing the sizes of these intersection patterns. This formulation uncovers interesting local patterns of interaction, finding hyperedge triplets that either (1) are the least correlated with each other, (2) have the highest pairwise but not groupwise correlation, or (3) are the most correlated with each other. We formalize this as a combinatorial optimization problem and design efficient algorithms based on filtering hyperedges. Our experimental evaluation shows that the resulting hyperedge triplets yield insightful information on real-world hypergraphs. Our approach is also orders of magnitude faster than a naive baseline implementation.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2309.14755 [pdf, other]

Image Denoising via Style Disentanglement

Authors: Jingwei Niu, Jun Cheng, Shan Tan

Abstract: Image denoising is a fundamental task in low-level computer vision. While recent deep learning-based image denoising methods have achieved impressive performance, they are black-box models and the underlying denoising principle remains unclear. In this paper, we propose a novel approach to image denoising that offers both clear denoising mechanism and good performance. We view noise as a type of image style and remove it by incorporating noise-free styles derived from clean images. To achieve this, we design novel losses and network modules to extract noisy styles from noisy images and noise-free styles from clean images. The noise-free style induces low-response activations for noise features and high-response activations for content features in the feature space. This leads to the separation of clean contents from noise, effectively denoising the image. Unlike disentanglement-based image editing tasks that edit semantic-level attributes using styles, our main contribution lies in editing pixel-level attributes through global noise-free styles. We conduct extensive experiments on synthetic noise removal and real-world image denoising datasets (SIDD and DND), demonstrating the effectiveness of our method in terms of both PSNR and SSIM metrics. Moreover, we experimentally validate that our method offers good interpretability.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.11103 [pdf, other]

Bold but Cautious: Unlocking the Potential of Personalized Federated Learning through Cautiously Aggressive Collaboration

Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Guogang Zhu, Shaojie Tang

Abstract: Personalized federated learning (PFL) reduces the impact of non-independent and identically distributed (non-IID) data among clients by allowing each client to train a personalized model when collaborating with others. A key question in PFL is to decide which parameters of a client should be localized or shared with others. In current mainstream approaches, all layers that are sensitive to non-IID data (such as classifier layers) are generally personalized. The reasoning behind this approach is understandable, as localizing parameters that are easily influenced by non-IID data can prevent the potential negative effect of collaboration. However, we believe that this approach is too conservative for collaboration. For example, for a certain client, even if its parameters are easily influenced by non-IID data, it can still benefit by sharing these parameters with clients having similar data distribution. This observation emphasizes the importance of considering not only the sensitivity to non-IID data but also the similarity of data distribution when determining which parameters should be localized in PFL. This paper introduces a novel guideline for client collaboration in PFL. Unlike existing approaches that prohibit all collaboration of sensitive parameters, our guideline allows clients to share more parameters with others, leading to improved model performance. Additionally, we propose a new PFL method named FedCAC, which employs a quantitative metric to evaluate each parameter's sensitivity to non-IID data and carefully selects collaborators based on this evaluation. Experimental results demonstrate that FedCAC enables clients to share more parameters with others, resulting in superior performance compared to state-of-the-art methods, particularly in scenarios where clients have diverse distributions.

Submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV2023

arXiv:2307.13995 [pdf, other]

Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space

Authors: Guogang Zhu, Xuefeng Liu, Shaojie Tang, Jianwei Niu, Xinghao Wu, Jiaxing Shen

Abstract: Personalized federated learning (PFL) is a popular framework that allows clients to have different models to address application scenarios where clients' data are in different domains. The typical model of a client in PFL features a global encoder trained by all clients to extract universal features from the raw data and personalized layers (e.g., a classifier) trained using the client's local data. Nonetheless, due to the differences between the data distributions of different clients (aka, domain gaps), the universal features produced by the global encoder largely encompass numerous components irrelevant to a certain client's local task. Some recent PFL methods address the above problem by personalizing specific parameters within the encoder. However, these methods encounter substantial challenges attributed to the high dimensionality and non-linearity of neural network parameter space. In contrast, the feature space exhibits a lower dimensionality, providing greater intuitiveness and interpretability as compared to the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL in the low-dimensional feature space by selecting task-relevant features adaptively for each client from the features generated by the global encoder based on its local data distribution. It presents a more accessible and interpretable implementation of PFL compared to those methods working in the parameter space. Extensive experimental results show that FedPick could effectively select task-relevant features for each client and improve model performance in cross-domain FL.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 13 pages, 13 figures

arXiv:2307.09892 [pdf, other]

3Deformer: A Common Framework for Image-Guided Mesh Deformation

Authors: Hao Su, Xuefeng Liu, Jianwei Niu, Ji Wan, Xinghao Wu

Abstract: We propose 3Deformer, a general-purpose framework for interactive 3D shape editing. Given a source 3D mesh with semantic materials, and a user-specified semantic image, 3Deformer can accurately edit the source mesh following the shape guidance of the semantic image, while preserving the source topology as rigid as possible. Recent studies of 3D shape editing mostly focus on learning neural networks to predict 3D shapes, which requires high-cost 3D training datasets and is limited to handling objects involved in the datasets. Unlike these studies, our 3Deformer is a non-training and common framework, which only requires supervision of readily-available semantic images, and is compatible with editing various objects unlimited by datasets. In 3Deformer, the source mesh is deformed utilizing the differentiable renderer technique, according to the correspondences between semantic images and mesh materials. However, guiding complex 3D shapes with a simple 2D image incurs extra challenges, that is, the deform accuracy, surface smoothness, geometric rigidity, and global synchronization of the edited mesh should be guaranteed. To address these challenges, we propose a hierarchical optimization architecture to balance the global and local shape features, and propose further various strategies and losses to improve properties of accuracy, smoothness, rigidity, and so on. Extensive experiments show that our 3Deformer is able to produce impressive results and reaches the state-of-the-art level.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.06123 [pdf, other]

SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark

Authors: Jun Niu, Xiaoyan Zhu, Moxuan Zeng, Ge Zhang, Qingyang Zhao, Chunhui Huang, Yangming Zhang, Suyu An, Yangzhong Wang, Xinghui Yue, Zhipeng He, Weihao Guo, Kuo Shen, Peng Liu, Yulong Shen, Xiaohong Jiang, Jianfeng Ma, Yuqing Zhang

Abstract: Membership inference (MI) attacks threaten user privacy through determining if a given data example has been used to train a target model. However, it has been increasingly recognized that the "comparing different MI attacks" methodology used in the existing works has serious limitations. Due to these limitations, we found (through the experiments in this work) that some comparison results reported in the literature are quite misleading. In this paper, we seek to develop a comprehensive benchmark for comparing different MI attacks, called MIBench, which consists not only the evaluation metrics, but also the evaluation scenarios. And we design the evaluation scenarios from four perspectives: the distance distribution of data samples in the target dataset, the distance between data samples of the target dataset, the differential distance between two datasets (i.e., the target dataset and a generated dataset with only nonmembers), and the ratio of the samples that are made no inferences by an MI attack. The evaluation metrics consist of ten typical evaluation metrics. We have identified three principles for the proposed "comparing different MI attacks" methodology, and we have designed and implemented the MIBench benchmark with 84 evaluation scenarios for each dataset. In total, we have used our benchmark to fairly and systematically compare 15 state-of-the-art MI attack algorithms across 588 evaluation scenarios, and these evaluation scenarios cover 7 widely used datasets and 7 representative types of models. All codes and evaluations of MIBench are publicly available at https://github.com/MIBench/MIBench.github.io/blob/main/README.md.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 21 pages,15 figures

arXiv:2306.02701 [pdf, other]

Unlocking the Potential of Federated Learning for Deeper Models

Authors: Haolin Wang, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Jiaxing Shen

Abstract: Federated learning (FL) is a new paradigm for distributed machine learning that allows a global model to be trained across multiple clients without compromising their privacy. Although FL has demonstrated remarkable success in various scenarios, recent studies mainly utilize shallow and small neural networks. In our research, we discover a significant performance decline when applying the existing FL framework to deeper neural networks, even when client data are independently and identically distributed (i.i.d.). Our further investigation shows that the decline is due to the continuous accumulation of dissimilarities among client models during the layer-by-layer back-propagation process, which we refer to as "divergence accumulation." As deeper models involve a longer chain of divergence accumulation, they tend to manifest greater divergence, subsequently leading to performance decline. Both theoretical derivations and empirical evidence are proposed to support the existence of divergence accumulation and its amplified effects in deeper models. To address this issue, we propose several technical guidelines based on reducing divergence, such as using wider models and reducing the receptive field. These approaches can greatly improve the accuracy of FL on deeper models. For example, the application of these guidelines can boost the ResNet101 model's performance by as much as 43% on the Tiny-ImageNet dataset.

Submitted 5 June, 2023; originally announced June 2023.

Comments: 16 pages, 8 figures

arXiv:2303.03817 [pdf, other]

doi 10.1109/ISBI53787.2023.10230619

Region and Spatial Aware Anomaly Detection for Fundus Images

Authors: Jingqi Niu, Shiwen Dong, Qinji Yu, Kang Dang, Xiaowei Ding

Abstract: Recently anomaly detection has drawn much attention in diagnosing ocular diseases. Most existing anomaly detection research in fundus images has relatively large anomaly scores in the salient retinal structures, such as blood vessels, optical cups and discs. In this paper, we propose a Region and Spatial Aware Anomaly Detection (ReSAD) method for fundus images, which obtains local region and long-range spatial information to reduce the false positives in the normal structure. ReSAD transfers a pre-trained model to extract the features of normal fundus images and applies the Region-and-Spatial-Aware feature Combination module (ReSC) for pixel-level features to build a memory bank. In the testing phase, ReSAD uses the memory bank to determine out-of-distribution samples as abnormalities. Our method significantly outperforms the existing anomaly detection methods for fundus images on two publicly benchmark datasets.

Submitted 7 March, 2023; originally announced March 2023.

Report number: 2303.03817

Journal ref: 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), Cartagena, Colombia, 2023, pp. 1-5

arXiv:2302.03222 [pdf, other]

Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support

Authors: Stephen Obadinma, Faiza Khan Khattak, Shirley Wang, Tania Sidhom, Elaine Lau, Sean Robertson, Jingcheng Niu, Winnie Au, Alif Munim, Karthik Raja K. Bhaskar, Bencheng Wei, Iris Ren, Waqar Muhammad, Erin Li, Bukola Ishola, Michael Wang, Griffin Tanner, Yu-Jia Shiah, Sean X. Zhang, Kwesi P. Apponsah, Kanishk Patel, Jaswinder Narain, Deval Pandya, Xiaodan Zhu, Frank Rudzicz, et al. (1 additional author not shown)

Abstract: Building Agent Assistants that can help improve customer service support requires inputs from industry users and their customers, as well as knowledge about state-of-the-art Natural Language Processing (NLP) technology. We combine expertise from academia and industry to bridge the gap and build task/domain-specific Neural Agent Assistants (NAA) with three high-level components for: (1) Intent Identification, (2) Context Retrieval, and (3) Response Generation. In this paper, we outline the pipeline of the NAA's core system and also present three case studies in which three industry partners successfully adapt the framework to find solutions to their unique challenges. Our findings suggest that a collaborative process is instrumental in spurring the development of emerging NLP models for Conversational AI tasks in industry. The full reference implementation code and results are available at https://github.com/VectorInstitute/NAA

Submitted 6 February, 2023; originally announced February 2023.

Comments: Camera Ready Version of Paper Published in EMNLP 2022 Industry Track

arXiv:2210.07490 [pdf, other]

Exploring Vanilla U-Net for Lesion Segmentation from Whole-body FDG-PET/CT Scans

Authors: Jin Ye, Haoyu Wang, Ziyan Huang, Zhongying Deng, Yanzhou Su, Can Tu, Qian Wu, Yuncheng Yang, Meng Wei, Jingqi Niu, Junjun He

Abstract: Tumor lesion segmentation is one of the most important tasks in medical image analysis. In clinical practice, Fluorodeoxyglucose Positron-Emission Tomography (FDG-PET) is a widely used technique to identify and quantify metabolically active tumors. However, since FDG-PET scans only provide metabolic information, healthy tissue or benign disease with irregular glucose consumption may be mistaken for cancer. To handle this challenge, PET is commonly combined with Computed Tomography (CT), with the CT used to obtain the anatomic structure of the patient. The combination of PET-based metabolic and CT-based anatomic information can contribute to better tumor segmentation results. In this paper, we explore the potential of U-Net for lesion segmentation in whole-body FDG-PET/CT scans from three aspects, including network architecture, data preprocessing, and data augmentation. The experimental results demonstrate that the vanilla U-Net with proper input shape can achieve satisfactory performance. Specifically, our method achieves first place in both preliminary and final leaderboards of the autoPET 2022 challenge. Our code is available at https://github.com/Ye**0111/autoPET2022_Blackbean.

Submitted 13 October, 2022; originally announced October 2022.

Comments: autoPET 2022, MICCAI 2022 challenge, champion

arXiv:2209.08512 [pdf, other]

Phalanx: A Practical Byzantine Ordered Consensus Protocol

Authors: Guangren Wang, Liang Cai, Fangyu Gai, Jianyu Niu

Abstract: Byzantine fault tolerance (BFT) consensus is a fundamental primitive for distributed computation. However, BFT protocols suffer from the ordering manipulation, in which an adversary can make front-running. Several protocols are proposed to resolve the manipulation problem, but there are some limitations for them. The batch-based protocols such as Themis has significant performance loss because of the use of complex algorithms to find strongly connected components (SCCs). The timestamp-based protocols such as Pompe have simplified the ordering phase, but they are limited on fairness that the adversary can manipulate the ordering via timestamps of transactions. In this paper, we propose a Byzantine ordered consensus protocol called Phalanx, in which transactions are committed by anchor-based ordering strategy. The anchor-based strategy makes aggregation of the Lamport logical clock of transactions on each participant and generates the final ordering without complex detection for SCCs. Therefore, Phalanx has achieved satisfying performance and performs better in resisting ordering manipulation than timestamp-based strategy.

Submitted 18 September, 2022; originally announced September 2022.

arXiv:2209.02247 [pdf, ps, other]

An evaluation of U-Net in Renal Structure Segmentation

Authors: Haoyu Wang, Ziyan Huang, Jin Ye, Can Tu, Yuncheng Yang, Shiyi Du, Zhongying Deng, Chenglong Ma, Jingqi Niu, Junjun He

Abstract: Renal structure segmentation from computed tomography angiography (CTA) is essential for many computer-assisted renal cancer treatment applications. Kidney PArsing (KiPA 2022) Challenge aims to build a fine-grained multi-structure dataset and improve the segmentation of multiple renal structures. Recently, U-Net has dominated the medical image segmentation. In the KiPA challenge, we evaluated several U-Net variants and selected the best models for the final submission.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.02858 [pdf, other]

An Empirical Study on Ethereum Private Transactions and the Security Implications

Authors: Xingyu Lyu, Mengya Zhang, Xiaokuan Zhang, Jianyu Niu, Yinqian Zhang, Zhiqiang Lin

Abstract: Recently, Decentralized Finance (DeFi) platforms on Ethereum are booming, and numerous traders are trying to capitalize on the opportunity for maximizing their benefits by launching front-running attacks and extracting Miner Extractable Values (MEVs) based on information in the public mempool. To protect end users from being harmed and hide transactions from the mempool, private transactions, a special type of transactions that are sent directly to miners, were invented. Private transactions have a high probability of being packed to the front positions of a block and being added to the blockchain by the target miner, without going through the public mempool, thus reducing the risk of being attacked by malicious entities. Despite the good intention of inventing private transactions, due to their stealthy nature, private transactions have also been used by attackers to launch attacks, which has a negative impact on the Ethereum ecosystem. However, existing works only touch upon private transactions as by-products when studying MEV, while a systematic study on private transactions is still missing. To fill this gap and paint a complete picture of private transactions, we take the first step towards investigating the private transactions on Ethereum. In particular, we collect large-scale private transaction datasets and perform analysis on their characteristics, transaction costs and miner profits, as well as security impacts. This work provides deep insights on different aspects of private transactions.

Submitted 4 August, 2022; originally announced August 2022.

arXiv:2207.07301 [pdf, other]

Robust Deep Compressive Sensing with Recurrent-Residual Structural Constraints

Authors: Jun Niu

Abstract: Existing deep compressive sensing (CS) methods either ignore adaptive online optimization or depend on costly iterative optimizer during reconstruction. This work explores a novel image CS framework with recurrent-residual structural constraint, termed as R$^2$CS-NET. The R$^2$CS-NET first progressively optimizes the acquired samplings through a novel recurrent neural network. The cascaded residual convolutional network then fully reconstructs the image from optimized latent representation. As the first deep CS framework efficiently bridging adaptive online optimization, the R$^2$CS-NET integrates the robustness of online optimization with the efficiency and nonlinear capacity of deep learning methods. Signal correlation has been addressed through the network architecture. The adaptive sensing nature further makes it an ideal candidate for color image CS via leveraging channel correlation. Numerical experiments verify the proposed recurrent latent optimization design not only fulfills the adaptation motivation, but also outperforms classic long short-term memory (LSTM) architecture in the same scenario. The overall framework demonstrates hardware implementation feasibility, with leading robustness and generalization capability among existing deep CS benchmarks.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.01884 [pdf]

A Superimposed Divide-and-Conquer Image Recognition Method for SEM Images of Nanoparticles on The Surface of Monocrystalline silicon with High Aggregation Degree

Authors: Ruiling Xiao, Jiayang Niu

Abstract: The nanoparticle size and distribution information in the SEM images of silicon crystals are generally counted by manual methods. The realization of automatic machine recognition is significant in materials science. This paper proposed a superposition partitioning image recognition method to realize automatic recognition and information statistics of silicon crystal nanoparticle SEM images. Especially for the complex and highly aggregated characteristics of silicon crystal particle size, an accurate recognition step and contour statistics method based on morphological processing are given. This method has technical reference value for the recognition of Monocrystalline silicon surface nanoparticle images under different SEM shooting conditions. Besides, it outperforms other methods in terms of recognition accuracy and algorithm efficiency.

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2205.00834 [pdf, other]

Convex Augmentation for Total Variation Based Phase Retrieval

Authors: Jianwei Niu, Hok Shing Wong, Tieyong Zeng

Abstract: Phase retrieval is an important problem with significant physical and industrial applications. In this paper, we consider the case where the magnitude of the measurement of an underlying signal is corrupted by Gaussian noise. We introduce a convex augmentation approach for phase retrieval based on total variation regularization. In contrast to popular convex relaxation models like PhaseLift, our model can be efficiently solved by a modified semi-proximal alternating direction method of multipliers (sPADMM). The modified sPADMM is more general and flexible than the standard one, and its convergence is also established in this paper. Extensive numerical experiments are conducted to showcase the effectiveness of the proposed method.

Submitted 21 April, 2022; originally announced May 2022.

arXiv:2203.15455 [pdf, other]

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

Authors: Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, Jianwei Niu

Abstract: Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation ability for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.

Submitted 5 July, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2203.05158 [pdf, other]

Scaling Blockchain Consensus via a Robust Shared Mempool

Authors: Fangyu Gai, Jianyu Niu, Ivan Beschastnikh, Chen Feng, Sheng Wang

Abstract: There is a resurgence of interest in Byzantine fault-tolerant (BFT) systems due to blockchains. However, leader-based BFT consensus protocols used by permissioned blockchains have limited scalability and robustness. To alleviate the leader bottleneck in BFT consensus, we introduce Stratus, a robust shared mempool protocol that decouples transaction distribution from consensus. Our idea is to have replicas disseminate transactions in a distributed manner and have the leader only propose transaction ids. Stratus uses a provably available broadcast (PAB) protocol to ensure the availability of the referenced transactions. We implemented Stratus by integrating it with state-of-the-art BFT-based blockchain protocols and evaluated these protocols in both LAN and WAN settings. Our results show that Stratus-based protocols achieve up to $5\sim20\times$ more throughput than their native counterparts in a network with hundreds of replicas. In addition, the performance of Stratus degrades gracefully in the presence of network asynchrony, Byzantine attackers, and unbalanced workloads. Our design provides easy-to-use APIs so that other BFT systems suffering from leader bottlenecks can use Stratus.

Submitted 25 September, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: This work is to appear in ICDE 2023
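
The Stratus abstract above describes replicas disseminating transactions themselves while the leader proposes only transaction ids. The following is a minimal sketch of that idea, not the Stratus implementation: the `Replica` class, its method names, and the use of SHA-256 as the id function are all assumptions made for illustration, and the provably available broadcast (PAB) protocol is not modeled.

```python
import hashlib


def tx_id(tx):
    """Content-derived transaction id (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(tx).hexdigest()


class Replica:
    """Hypothetical replica holding a local copy of the shared mempool."""

    def __init__(self):
        self.mempool = {}  # maps transaction id -> raw transaction bytes

    def receive_tx(self, tx):
        """Store a disseminated transaction and return its id."""
        ident = tx_id(tx)
        self.mempool[ident] = tx
        return ident

    def make_proposal(self, max_txs):
        """Leader side: propose ids only, never the transaction payloads."""
        return list(self.mempool)[:max_txs]

    def resolve(self, proposal):
        """Follower side: map proposed ids back to locally held transactions.

        Returns (found_transactions, missing_ids). A real protocol would have
        to fetch the missing ones; here they are only reported.
        """
        found = [self.mempool[i] for i in proposal if i in self.mempool]
        missing = [i for i in proposal if i not in self.mempool]
        return found, missing


leader, follower = Replica(), Replica()
id_a = leader.receive_tx(b"pay alice 5")
id_b = leader.receive_tx(b"pay bob 7")
follower.receive_tx(b"pay alice 5")  # the second transaction has not arrived yet

proposal = leader.make_proposal(max_txs=10)
found, missing = follower.resolve(proposal)
print(len(proposal), len(found), len(missing))
```

The proposal stays small regardless of payload size, which is the point of decoupling distribution from consensus; the `missing` list marks the availability problem that the abstract says PAB is meant to solve.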

arXiv:2110.04830 [pdf, other]

MARVEL: Raster Manga Vectorization via Primitive-wise Deep Reinforcement Learning

Authors: Hao Su, Jianwei Niu, Xuefeng Liu, Jiahe Cui, Ji Wan

Abstract: Manga is a fashionable Japanese-style comic form that is composed of black-and-white strokes and is generally displayed as raster images on digital devices. Typical mangas have simple textures, wide lines, and few color gradients, properties that make them suitable for vectorization and let them enjoy the merits of vector graphics, e.g., adaptive resolutions and small file sizes. In this paper, we propose MARVEL (MAnga's Raster to VEctor Learning), a primitive-wise approach for vectorizing raster mangas by Deep Reinforcement Learning (DRL). Unlike previous learning-based methods which predict vector parameters for an entire image, MARVEL introduces a new perspective that regards an entire manga as a collection of basic primitives (stroke lines), and designs a DRL model to decompose the target image into a primitive sequence for achieving accurate vectorization. To improve vectorization accuracies and decrease file sizes, we further propose a stroke accuracy reward to predict accurate stroke lines, and a pruning mechanism to avoid generating erroneous and repeated strokes. Extensive subjective and objective experiments show that our MARVEL can generate impressive results and reaches the state-of-the-art level. Our code is open-source at: https://github.com/SwordHolderSH/Mang2Vec.

Submitted 18 July, 2023; v1 submitted 10 October, 2021; originally announced October 2021.

Comments: The name of the previous version paper was: Mang2Vec: Vectorization of raster manga by deep reinforcement learning

arXiv:2107.04947 [pdf, ps, other]

On the Performance of Pipelined HotStuff

Authors: Jianyu Niu, Fangyu Gai, Mohammad M. Jalalzai, Chen Feng

Abstract: HotStuff is a state-of-the-art Byzantine fault-tolerant consensus protocol. It can be pipelined to build large-scale blockchains. One of its variants called LibraBFT is adopted in Facebook's Libra blockchain. Although it is well known that pipelined HotStuff is secure against up to $1/3$ of Byzantine nodes, its performance in terms of throughput and delay is still under-explored. In this paper, we develop a multi-metric evaluation framework to quantitatively analyze pipelined HotStuff's performance with respect to its chain growth rate, chain quality, and latency. We then propose two attack strategies and evaluate their effects on the performance of pipelined HotStuff. Our analysis shows that the chain growth rate (resp., chain quality) of pipelined HotStuff under our attacks can drop to as low as 4/9 (resp., 12/17) of that without attacks when $1/3$ nodes are Byzantine. As another application, we use our framework to evaluate certain engineering optimizations adopted by LibraBFT. We find that these optimizations make the system more vulnerable to our attacks than the original pipelined HotStuff. Finally, we provide two countermeasures to thwart these attacks. We hope that our studies can shed light on the rigorous understanding of the state-of-the-art pipelined HotStuff protocol as well as its variants.

Submitted 10 July, 2021; originally announced July 2021.

Comments: IEEE International Conference on Computer Communications (INFOCOM' 21)
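
The Pipelined HotStuff abstract above quotes a 1/3 Byzantine bound and degradation factors of 4/9 and 12/17. As a rough illustration of the arithmetic, the sketch below uses the standard BFT relations (at most f faulty nodes with n >= 3f + 1, and quorums large enough that any two overlap in at least f + 1 nodes). The function names are hypothetical, and the degradation helper only rescales a baseline by the factors quoted in the abstract; it does not reproduce the paper's analysis.

```python
from fractions import Fraction


def max_faulty(n):
    """Largest f such that n >= 3f + 1 (standard BFT bound)."""
    return (n - 1) // 3


def quorum_size(n):
    """Smallest quorum q such that any two quorums share at least f + 1 nodes.

    From 2q - n >= f + 1, i.e. q = ceil((n + f + 1) / 2); this equals
    2f + 1 when n = 3f + 1.
    """
    f = max_faulty(n)
    return (n + f + 2) // 2


# Worst-case factors quoted in the abstract for 1/3 Byzantine nodes.
CHAIN_GROWTH_FACTOR = Fraction(4, 9)
CHAIN_QUALITY_FACTOR = Fraction(12, 17)


def degraded(baseline, factor):
    """Scale an attack-free metric by a quoted worst-case factor."""
    return baseline * factor


for n in (4, 7, 100):
    print(n, max_faulty(n), quorum_size(n))
print(float(degraded(1, CHAIN_GROWTH_FACTOR)), float(degraded(1, CHAIN_QUALITY_FACTOR)))
```

Using `Fraction` keeps the quoted ratios exact until the final conversion to a float.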

arXiv:2106.00666 [pdf, other]

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

Authors: Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu

Abstract: Can Transformer perform 2D object- and region-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure? To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the vanilla Vision Transformer with the fewest possible modifications, region priors, as well as inductive biases of the target task. We find that YOLOS pre-trained on the mid-sized ImageNet-1k dataset only can already achieve quite competitive performance on the challenging COCO object detection benchmark, e.g., YOLOS-Base directly adopted from BERT-Base architecture can obtain 42.0 box AP on COCO val. We also discuss the impacts as well as limitations of current pre-train schemes and model scaling strategies for Transformer in vision through YOLOS. Code and pre-trained models are available at https://github.com/hustvl/YOLOS.

Submitted 26 October, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 Camera Ready

arXiv:2104.12533 [pdf, other]

Visformer: The Vision-friendly Transformer

Authors: Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, Qi Tian

Abstract: The past year has witnessed the rapid development of applying the Transformer module to vision problems. While some researchers have demonstrated that Transformer-based models enjoy a favorable ability of fitting data, there is a growing body of evidence showing that these models suffer from over-fitting, especially when the training data is limited. This paper offers an empirical study by performing step-by-step operations to gradually transition a Transformer-based model to a convolution-based model. The results we obtain during the transition process deliver useful messages for improving visual recognition. Based on these observations, we propose a new architecture named Visformer, which is abbreviated from the `Vision-friendly Transformer'. With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy, and the advantage becomes more significant when the model complexity is lower or the training set is smaller. The code is available at https://github.com/danczs/Visformer.

Submitted 18 December, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

arXiv:2103.12542 [pdf, other]

EmgAuth: Unlocking Smartphones with EMG Signals

Authors: Boyu Fan, Xiang Su, Jianwei Niu, Pan Hui

Abstract: Screen lock is a critical security feature for smartphones to prevent unauthorized access. Although various screen unlocking technologies, including fingerprint and facial recognition, have been widely adopted, they still have some limitations. For example, fingerprints can be stolen by special material stickers and facial recognition systems can be cheated by 3D-printed head models. In this paper, we propose EmgAuth, a novel electromyography (EMG)-based smartphone unlocking system based on the Siamese network. EmgAuth enables users to unlock their smartphones by leveraging the EMG data of the smartphone users collected from Myo armbands. When training the Siamese network, we design a special data augmentation technique to make the system resilient to the rotation of the armband, which makes EmgAuth free of calibration. We conduct extensive experiments including 53 participants and the evaluation results verify that EmgAuth can effectively authenticate users with an average true acceptance rate of 91.81% while keeping the average false acceptance rate of 7.43%. In addition, we also demonstrate that EmgAuth can work well for smartphones with different screen sizes and for different scenarios when users are placing smartphones at different locations and with different orientations. EmgAuth shows great promise to serve as a good supplement for existing screen unlocking systems to improve the safety of smartphones.

Submitted 21 March, 2021; originally announced March 2021.

Comments: 13 pages, 16 figures
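
The EmgAuth abstract above reports a true acceptance rate (TAR) and a false acceptance rate (FAR). As a small illustration of how these two metrics are conventionally computed from verification scores, here is a sketch under stated assumptions: the scores are similarity values where higher means "more likely the same user", the threshold and sample data are made up, and nothing here is taken from the EmgAuth evaluation itself.

```python
def acceptance_rates(genuine_scores, impostor_scores, threshold):
    """Return (TAR, FAR) for a similarity threshold.

    TAR: fraction of genuine attempts accepted (score >= threshold).
    FAR: fraction of impostor attempts accepted (score >= threshold).
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    tar = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return tar, far


# Made-up similarity scores, for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.6, 0.2, 0.3]
tar, far = acceptance_rates(genuine, impostor, threshold=0.5)
print(tar, far)
```

Raising the threshold lowers both rates, so a reported TAR/FAR pair always corresponds to one particular operating point.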

arXiv:2103.00777 [pdf, other]

Dissecting the Performance of Chained-BFT

Authors: Fangyu Gai, Ali Farahbakhsh, Jianyu Niu, Chen Feng, Ivan Beschastnikh, Hao Duan

Abstract: Permissioned blockchains employ Byzantine fault-tolerant (BFT) state machine replication (SMR) to reach agreement on an ever-growing, linearly ordered log of transactions. A new paradigm, combined with decades of research in BFT SMR and blockchain (namely chained-BFT, or cBFT), has emerged for directly constructing blockchain protocols. Chained-BFT protocols have a unifying propose-vote scheme instead of multiple different voting phases with a set of voting and commit rules to guarantee safety and liveness. However, distinct voting and commit rules impose varying impacts on performance under different workloads, network conditions, and Byzantine attacks. Therefore, a fair comparison of the proposed protocols poses a challenge that has not yet been addressed by existing work. We fill this gap by studying a family of cBFT protocols with a two-pronged systematic approach. First, we present an evaluation framework, Bamboo, for quick prototyping of cBFT protocols that includes helpful benchmarking facilities. To validate Bamboo, we introduce an analytic model using queuing theory which also offers a back-of-the-envelope guide for dissecting these protocols. We build multiple cBFT protocols using Bamboo and we are the first to fairly compare three representatives (i.e., HotStuff, two-chain HotStuff, and Streamlet). We evaluated these protocols under various parameters and scenarios, including two Byzantine attacks that have not been widely discussed in the literature. Our findings reveal interesting trade-offs (e.g., responsiveness vs. forking-resilience) between different cBFT protocols and their design choices, which provide developers and researchers with insights into the design and implementation of this protocol family.

Submitted 1 March, 2021; originally announced March 2021.

Comments: 12 pages

arXiv:2012.15766 [pdf, other]

SelectScale: Mining More Patterns from Images via Selective and Soft Dropout

Authors: Zhengsu Chen, Jianwei Niu, Xuefeng Liu, Shaojie Tang

Abstract: Convolutional neural networks (CNNs) have achieved remarkable success in image recognition. Although the internal patterns of the input images are effectively learned by the CNNs, these patterns only constitute a small proportion of useful patterns contained in the input images. This can be attributed to the fact that the CNNs will stop learning if the learned patterns are enough to make a correct classification. Network regularization methods like dropout and SpatialDropout can ease this problem. During training, they randomly drop the features. These dropout methods, in essence, change the patterns learned by the networks, and in turn, force the networks to learn other patterns to make the correct classification. However, the above methods have an important drawback. Randomly dropping features is generally inefficient and can introduce unnecessary noise. To tackle this problem, we propose SelectScale. Instead of randomly dropping units, SelectScale selects the important features in networks and adjusts them during training. Using SelectScale, we improve the performance of CNNs on CIFAR and ImageNet.

Submitted 30 November, 2020; originally announced December 2020.

Comments: arXiv admin note: text overlap with arXiv:1810.09849 by other authors
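
The SelectScale abstract above contrasts its approach with ordinary dropout, which randomly drops features during training. For readers unfamiliar with that baseline, the sketch below shows standard "inverted" dropout on a plain list of feature values. It illustrates only the random-drop baseline the abstract criticizes, not SelectScale itself, whose selection and scaling rule is not specified in the abstract; the function name and signature are assumptions.

```python
import random


def inverted_dropout(features, drop_prob, rng):
    """Randomly zero each feature with probability drop_prob (training mode).

    Surviving features are divided by the keep probability so that the
    expected value of each output matches its input.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in features]


rng = random.Random(0)  # fixed seed so repeated runs drop the same features
features = [0.5, 1.0, 1.5, 2.0]
dropped = inverted_dropout(features, drop_prob=0.5, rng=rng)
print(dropped)
```

Because the mask ignores feature importance, informative and uninformative features are equally likely to be zeroed, which is the inefficiency the abstract points to.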

arXiv:2012.14770 [pdf, other]

Hybrid Interest Modeling for Long-tailed Users

Authors: Lifang Deng, ** Niu, Angulia Yang, Qidi Xu, Xiang Fu, Jiandong Zhang, Anxiang Zeng

Abstract: User behavior modeling is a key technique for recommender systems. However, most methods focus on head users with large-scale interactions and hence suffer from data sparsity issues. Several solutions integrate side information such as demographic features and product reviews; another is to transfer knowledge from other rich data sources. We argue that current methods are limited by strict privacy policies and have low scalability in real-world applications, and that few works consider the behavioral characteristics behind long-tailed users. In this work, we propose the Hybrid Interest Modeling (HIM) network, which combines personalized interest and semi-personalized interest when learning long-tailed users' preferences for recommendation. To achieve this, we first design the User Behavior Pyramid (UBP) module to capture the fine-grained, high-confidence personalized interest from sparse and even noisy positive feedback. Moreover, because individual interactions are too sparse to model user interest adequately, we design the User Behavior Clustering (UBC) module, which learns latent user interest groups with a self-supervised learning mechanism and captures coarse-grained semi-personalized interest from group-item interaction data. Extensive experiments on both public and industrial datasets verify the superiority of HIM compared with the state-of-the-art baselines.

Submitted 29 December, 2020; originally announced December 2020.

arXiv:2012.01636 [pdf, other]

EBFT: Simplifying BFT Consensus Through Egalitarianism

Authors: Jianyu Niu, Runchao Han, Shengqi Liu, Fangyu Gai, Ivan Beschastnikh, Yinqian Zhang, Chen Feng

Abstract: We present Egalitarian BFT (EBFT), a simple and high-performance framework of BFT consensus protocols for decentralized systems like blockchains. The key innovation in EBFT is egalitarian block generation: nodes randomly and non-interactively propose blocks containing client transactions, rather than relying on a leader to do so. Apart from deterministic safety and liveness guarantees standard in BFT protocols, the egalitarian design provides two novel features: (i) EBFT is resilient to attacks targeting the leader, such as bribery and targeted DoS attacks, and (ii) EBFT does not require any fail-over protocol to detect and replace the faulty leader. EBFT consists of three protocols: EBFT-Syn for synchronous networks, EBFT-PSyn for partially synchronous networks, and EBFT-Turbo that builds on EBFT for high performance. We implement EBFT and evaluate its performance on AWS. To compare EBFT with state-of-the-art BFT protocols, we build EBFT-PSyn based on Bamboo, an open-source platform for prototyping partially synchronous BFT protocols. We evaluate EBFT-PSyn and HotStuff on EC2 with up to 16 nodes. The evaluation shows that EBFT-PSyn achieves better throughput and latency than HotStuff. To demonstrate its simplicity and practicality, we build EBFT on the Go version of Bitcoin, btcd. We implemented EBFT-Syn, EBFT-PSyn and EBFT-Turbo in about 920 LoCs in total. This indicates that EBFT can be built on top of existing blockchains with relatively little effort. We evaluate these protocols on EC2 instances with up to 256 nodes. Our evaluation shows that EBFT-Syn (resp. EBFT-PSyn) achieves a latency of 6 (resp. 1) seconds, and an optimized version of EBFT-PSyn processes up to 3.6k transactions per second and has a latency of 8 seconds.

Submitted 12 March, 2023; v1 submitted 2 December, 2020; originally announced December 2020.

Comments: 17 pages, 12 figures

arXiv:2011.08516 [pdf, other]

ACSC: Automatic Calibration for Non-repetitive Scanning Solid-State LiDAR and Camera Systems

Authors: Jiahe Cui, Jianwei Niu, Zhenchao Ouyang, Yunxiang He, Dian Liu

Abstract: Recently, the rapid development of Solid-State LiDAR (SSL) enables low-cost and efficient obtainment of 3D point clouds from the environment, which has inspired a large quantity of studies and applications. However, the non-uniformity of its scanning pattern, and the inconsistency of the ranging error distribution bring challenges to its calibration task. In this paper, we propose a fully automatic calibration method for the non-repetitive scanning SSL and camera systems. First, a temporal-spatial-based geometric feature refinement method is presented, to extract effective features from SSL point clouds; then, the 3D corners of the calibration target (a printed checkerboard) are estimated with the reflectance distribution of points. Based on the above, a target-based extrinsic calibration method is finally proposed. We evaluate the proposed method on different types of LiDAR and camera sensor combinations in real conditions, and achieve accurate and robust calibration results. The code is available at https://github.com/HViktorTsoi/ACSC.git .

Submitted 17 November, 2020; originally announced November 2020.

Comments: conference

arXiv:2011.07815 [pdf, other]

An End-to-end Method for Producing Scanning-robust Stylized QR Codes

Authors: Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Ji Wan, Mingliang Xu, Tao Ren

Abstract: Quick Response (QR) code is one of the most widely used two-dimensional codes worldwide. Traditional QR codes appear as random collections of black-and-white modules that lack visual semantics and aesthetic elements, which inspires the recent works to beautify the appearances of QR codes. However, these works adopt fixed generation algorithms and therefore can only generate QR codes with a pre-defined style. In this paper, combining the Neural Style Transfer technique, we propose a novel end-to-end method, named ArtCoder, to generate the stylized QR codes that are personalized, diverse, attractive, and scanning-robust. To guarantee that the generated stylized QR codes are still scanning-robust, we propose a Sampling-Simulation layer, a module-based code loss, and a competition mechanism. The experimental results show that our stylized QR codes are of high quality in both the visual effect and the scanning-robustness, and they are able to support the real-world application.

Submitted 16 November, 2020; originally announced November 2020.

Comments: 11 pages, 16 figures

arXiv:2010.11454 [pdf, other]

doi 10.1109/TDSC.2023.3308848

Fast-HotStuff: A Fast and Resilient HotStuff Protocol

Authors: Mohammad M. Jalalzai, Jianyu Niu, Chen Feng, Fangyu Gai

Abstract: The HotStuff protocol is a breakthrough in Byzantine Fault Tolerant (BFT) consensus that enjoys both responsiveness and linear view change. It creatively adds an additional round to classic BFT protocols (like PBFT) using two rounds. This brings us to an interesting question: Is this additional round really necessary in practice? In this paper, we answer this question by designing a new two-round BFT protocol called Fast-HotStuff, which enjoys responsiveness and efficient view change that is comparable to linear view change in terms of performance. Compared to (three-round) HotStuff, Fast-HotStuff has lower latency and is more robust against performance attacks that HotStuff is susceptible to.

Submitted 3 November, 2022; v1 submitted 22 October, 2020; originally announced October 2020.

Journal ref: IEEE Transactions on Dependable and Secure Computing ( Early Access ), Page(s): 1 - 17, Date of Publication: 25 August 2023

arXiv:2007.00639 [pdf, other]

End-to-End JPEG Decoding and Artifacts Suppression Using Heterogeneous Residual Convolutional Neural Network

Authors: Jun Niu

Abstract: Existing deep learning models separate JPEG artifacts suppression from the decoding protocol as an independent task. In this work, we take one step forward to design a true end-to-end heterogeneous residual convolutional neural network (HR-CNN) with spectrum decomposition and heterogeneous reconstruction mechanism. Benefitting from the full CNN architecture and GPU acceleration, the proposed model considerably improves the reconstruction efficiency. Numerical experiments show that the overall reconstruction speed reaches the same order of magnitude as the standard CPU JPEG decoding protocol, while both decoding and artifacts suppression are completed together. We formulate the JPEG artifacts suppression task as an interactive process of decoding and image detail reconstructions. A heterogeneous, fully convolutional mechanism is proposed to particularly address the uncorrelated nature of different spectral channels. Directly starting from the JPEG code in k-space, the network first extracts the spectral samples channel by channel, and restores the spectral snapshots with expanded throughput. These intermediate snapshots are then heterogeneously decoded and merged into the pixel space image. A cascaded residual learning segment is designed to further enhance the image details. Experiments verify that the model achieves outstanding performance in JPEG artifacts suppression, while its full convolutional operations and elegant network structure offer higher computational efficiency for practical online usage compared with other deep learning models on this topic.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2007.00186 [pdf, other]

The Hermes BFT for Blockchains

Authors: Mohammad M. Jalalzai, Chen Feng, Costas Busch, Golden G. Richard III, Jianyu Niu

Abstract: The performance of partially synchronous BFT-based consensus protocols is highly dependent on the primary node. All participant nodes in the network are blocked until they receive a proposal from the primary node to begin the consensus process. Therefore, an honest but slack node (with limited bandwidth) can adversely affect the performance when selected as primary. Hermes decreases protocol dependency on the primary node and minimizes transmission delay induced by the slack primary while keeping low message complexity and latency. Hermes achieves these performance improvements by relaxing strong BFT agreement (safety) guarantees only for a specific type of Byzantine faults (also called equivocated faults). Interestingly, we show that in Hermes equivocating by a Byzantine primary is unlikely, expensive and ineffective. Therefore, the safety of Hermes is comparable to the general BFT consensus. We deployed and tested Hermes on 190 Amazon EC2 instances. In these tests, Hermes's performance was comparable to the state-of-the-art BFT protocol for blockchains (when the network size is large) in the absence of slack nodes, whereas in the presence of slack nodes Hermes outperformed the state-of-the-art BFT protocol by more than 4x in terms of throughput as well as 15x in terms of latency.

Submitted 30 June, 2020; originally announced July 2020.

arXiv:2004.12150 [pdf, other]

doi 10.1016/j.media.2021.101985

A Survey on Incorporating Domain Knowledge into Deep Learning for Medical Image Analysis

Authors: Xiaozheng Xie, Jianwei Niu, Xuefeng Liu, Zhengsu Chen, Shaojie Tang, Shui Yu

Abstract: Although deep learning models like CNNs have achieved great success in medical image analysis, the small size of medical datasets remains a major bottleneck in this area. To address this problem, researchers have started looking for external information beyond currently available medical datasets. Traditional approaches generally leverage the information from natural images via transfer learning. More recent works utilize the domain knowledge from medical doctors, to create networks that resemble how medical doctors are trained, mimic their diagnostic patterns, or focus on the features or areas they pay particular attention to. In this survey, we summarize the current progress on integrating medical domain knowledge into deep learning models for various tasks, such as disease diagnosis, lesion, organ and abnormality detection, lesion and organ segmentation. For each task, we systematically categorize different kinds of medical domain knowledge that have been utilized and their corresponding integrating methods. We also provide current challenges and directions for future research.

Submitted 8 February, 2021; v1 submitted 25 April, 2020; originally announced April 2020.

Comments: 27 pages, 18 figures

Journal ref: Medical Image Analysis 2021

arXiv:2004.10634 [pdf, other]

MangaGAN: Unpaired Photo-to-Manga Translation Based on The Methodology of Manga Drawing

Authors: Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Jiahe Cui, Ji Wan

Abstract: Manga is a comic form popular worldwide that originated in Japan, which typically employs black-and-white stroke lines and geometric exaggeration to describe humans' appearances, poses, and actions. In this paper, we propose MangaGAN, the first method based on Generative Adversarial Network (GAN) for unpaired photo-to-manga translation. Inspired by how experienced manga artists draw manga, MangaGAN generates the geometric features of manga face by a designed GAN model and delicately translates each facial region into the manga domain by a tailored multi-GANs architecture. For training MangaGAN, we construct a new dataset collected from a popular manga work, containing manga facial features, landmarks, bodies, and so on. Moreover, to produce high-quality manga faces, we further propose a structural smoothing loss to smooth stroke-lines and avoid noisy pixels, and a similarity preserving module to improve the similarity between domains of photo and manga. Extensive experiments show that MangaGAN can produce high-quality manga faces which preserve both the facial similarity and a popular manga style, and outperforms other related state-of-the-art methods.

Submitted 17 December, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: 17 pages

arXiv:2004.02767 [pdf, other]

Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio

Authors: Zhengsu Chen, Jianwei Niu, Lingxi Xie, Xuefeng Liu, Longhui Wei, Qi Tian

Abstract: Automatic designing computationally efficient neural networks has received much attention in recent years. Existing approaches either utilize network pruning or leverage the network architecture search methods. This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs, so that under each network configuration, one can estimate the FLOPs utilization ratio (FUR) for each layer and use it to determine whether to increase or decrease the number of channels on the layer. Note that FUR, like the gradient of a non-linear function, is accurate only in a small neighborhood of the current network. Hence, we design an iterative mechanism so that the initial network undergoes a number of steps, each of which has a small 'adjusting rate' to control the changes to the network. The computational overhead of the entire search process is reasonable, i.e., comparable to that of re-training the final model from scratch. Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach, which consistently outperforms the pruning counterpart. The code is available at https://github.com/danczs/NetworkAdjustment.

Submitted 6 April, 2020; originally announced April 2020.

Showing 1–50 of 64 results for author: Niu, J