
Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge

Authors: Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

Abstract: We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources. The key novelty of our method is the introduction of the intermediary modules into the current retriever-reader pipeline. Unlike previous methods that solely rely on the retriever for gathering all evidence in isolation, our intermediary performs a chain of reasoning over the retrieved set. Specifically, our method links the retrieved evidence with its related global context into graphs and organizes them into a candidate list of evidence chains. Built upon pretrained language models, our system achieves competitive performance on two ODQA datasets, OTT-QA and NQ, against tables and passages from Wikipedia. In particular, our model substantially outperforms the previous state-of-the-art on OTT-QA with an exact match score of 47.3 (45% relative gain).

Submitted 21 October, 2022; originally announced October 2022.

Comments: Findings of EMNLP 2022

arXiv:2210.03904 [pdf, other]

LW-ISP: A Lightweight Model with ISP and Deep Learning

Authors: Hongyang Chen, Kaisheng Ma

Abstract: The deep learning (DL)-based methods of low-level tasks have many advantages over the traditional camera in terms of hardware prospects, error accumulation and imaging effects. Recently, applications of deep learning to replace the image signal processing (ISP) pipeline have appeared one after another; however, there is still a long way to go towards real landing. In this paper, we show the possibility of a learning-based method to achieve real-time high-performance processing in the ISP pipeline. We propose LW-ISP, a novel architecture designed to implicitly learn the image mapping from RAW data to RGB image. Based on U-Net architecture, we propose the fine-grained attention module and a plug-and-play upsampling block suitable for low-level tasks. In particular, we design a heterogeneous distillation algorithm to distill the implicit features and reconstruction information of the clean image, so as to guide the learning of the student model. Our experiments demonstrate that LW-ISP has achieved a 0.38 dB improvement in PSNR compared to the previous best method, while the model parameters and calculation have been reduced by 23 times and 81 times. The inference efficiency has been accelerated by at least 15 times. Without bells and whistles, LW-ISP has achieved quite competitive results in ISP subtasks including image denoising and enhancement.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 16 pages, accepted as a conference paper at BMVC 2022

arXiv:2210.03659 [pdf, other]

Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos

Authors: Boyang Zhang, Su** Wu, Hu Cao, Kehua Ma, Pan Li, Lei Lin

Abstract: In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and temporal-based learning to promote accuracy and temporal smoothing. Different from them, our STR aims to learn accurate and natural motion sequences in an unconstrained environment through temporal and spatial tendency and to fully excavate the spatio-temporal features of existing video data. To this end, our STR learns the representation of features in the temporal and spatial dimensions respectively, to concentrate on a more robust representation of spatio-temporal features. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR constructs a time-dimensional hierarchical residual connection representation within a video sequence to effectively reason temporal sequences' tendencies and retain effective dissemination of human information. Meanwhile, for enhancing the spatial representation, we design a spatial tendency enhancing (STE) module to further learn to excite spatially time-frequency domain sensitive features in human motion information representations. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experimental findings on large-scale publicly available datasets reveal that our STR remains competitive with the state-of-the-art on three datasets. Our code is available at https://github.com/Changboyang/STR.git.

Submitted 9 October, 2022; v1 submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted by BMVC2022

arXiv:2210.02257 [pdf, other]

Hiding Images in Deep Probabilistic Models

Authors: Haoyu Chen, Linqi Song, Zhenxing Qian, Xinpeng Zhang, Kede Ma

Abstract: Data hiding with deep neural networks (DNNs) has experienced impressive successes in recent years. A prevailing scheme is to train an autoencoder, consisting of an encoding network to embed (or transform) secret messages in (or into) a carrier, and a decoding network to extract the hidden messages. This scheme may suffer from several limitations regarding practicability, security, and embedding capacity. In this work, we describe a different computational framework to hide images in deep probabilistic models. Specifically, we use a DNN to model the probability density of cover images, and hide a secret image in one particular location of the learned distribution. As an instantiation, we adopt a SinGAN, a pyramid of generative adversarial networks (GANs), to learn the patch distribution of one cover image. We hide the secret image by fitting a deterministic mapping from a fixed set of noise maps (generated by an embedding key) to the secret image during patch distribution learning. The stego SinGAN, behaving as the original SinGAN, is publicly communicated; only the receiver with the embedding key is able to extract the secret image. We demonstrate the feasibility of our SinGAN approach in terms of extraction accuracy and model security. Moreover, we show the flexibility of the proposed method in terms of hiding multiple images for different receivers and obfuscating the secret image.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.02245 [pdf, other]

Channel Modeling for UAV-to-Ground Communications with Posture Variation and Fuselage Scattering Effect

Authors: Boyu Hua, Haoran Ni, Qiuming Zhu, Cheng-Xiang Wang, Tongtong Zhou, Kai Mao, Junwei Bao, Xiaofei Zhang

Abstract: Unmanned aerial vehicle (UAV)-to-ground (U2G) channel models play a pivotal role for reliable communications between UAV and ground terminal. This paper proposes a three-dimensional (3D) non-stationary hybrid model including both large-scale and small-scale fading for U2G multiple-input-multiple-output (MIMO) channels. Distinctive channel characteristics under U2G scenarios, i.e., 3D trajectory and posture of UAV, fuselage scattering effect (FSE), and posture variation fading (PVF), are incorporated into the proposed model. The channel parameters, i.e., path loss (PL), shadow fading (SF), path delay, and path angle, are generated incorporating machine learning (ML) and ray tracing (RT) techniques to capture the structure-related characteristics. In order to guarantee the physical continuity of channel parameters such as Doppler phase and path power, the time evolution methods of inter- and intra-stationary intervals are proposed. Key statistical properties, i.e., temporal autocorrelation function (ACF), power delay profile (PDP), level crossing rate (LCR), average fading duration (AFD), and stationary interval (SI), are given, and the impact of the change of fuselage and posture variation is analyzed. It is demonstrated that both posture variation and fuselage scattering have crucial effects on channel characteristics. The validity and practicability of the proposed model are verified by comparing the simulation results with the measured ones.

Submitted 13 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.00933 [pdf, other]

Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop

Authors: Weixia Zhang, Dingquan Li, Xiongkuo Min, Guangtao Zhai, Guodong Guo, Xiaokang Yang, Kede Ma

Abstract: No-reference image quality assessment (NR-IQA) aims to quantify how humans perceive visual distortions of digital images without access to their undistorted references. NR-IQA models are extensively studied in computational vision, and are widely used for performance evaluation and perceptual optimization of man-made vision systems. Here we make one of the first attempts to examine the perceptual robustness of NR-IQA models. Under a Lagrangian formulation, we identify insightful connections of the proposed perceptual attack to previous beautiful ideas in computer vision and machine learning. We test one knowledge-driven and three data-driven NR-IQA methods under four full-reference IQA models (as approximations to human perception of just-noticeable differences). Through carefully designed psychophysical experiments, we find that all four NR-IQA models are vulnerable to the proposed perceptual attack. More interestingly, we observe that the generated counterexamples are not transferable, manifesting themselves as distinct design flaws of respective NR-IQA methods.

Submitted 3 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022

arXiv:2209.11119 [pdf, ps, other]

Anyon condensation, topological quantum information scrambling, and Andreev-like reflection of non-Abelian anyons in quantum Hall interfaces

Authors: Ken K. W. Ma

Abstract: Quantum information scrambling is the spread of local information into correlation throughout the entire quantum many-body system. This concept has become a central topic in different contexts. In this work, we restate the connection between anyon condensation and topological quantum information scrambling in quantum Hall interfaces. We consider the interface between the Abelian Halperin-330 state and the non-Abelian Read-Rezayi state. We verify explicitly that the interface can be fully gapped. This allows the transmutation of local pseudospin information carried by an Abelian anyon into topological information stored entirely by the anyons in the non-Abelian quantum Hall liquid, with no scrambled information stored at the interface. In combination with our previous work [K. K. W. Ma and K. Yang, Phys. Rev. B 105, 045306 (2022)], our results demonstrate the dependence of the scrambling mechanism on the gapfulness of the interface. Possible Andreev-like reflection of non-Abelian anyons in the fully gapped interface is also discussed.

Submitted 10 October, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: References on the possible Andreev-like reflection of electrons in interacting one-dimensional wires are added

arXiv:2209.10819 [pdf, other]

doi 10.3847/1538-4357/ac94ce

Structure in the Magnetic Field of the Milky Way Disk and Halo traced by Faraday Rotation

Authors: John M. Dickey, Jennifer West, Alec J. M. Thomson, T. L. Landecker, A. Bracco, E. Carretti, J. L. Han, A. S. Hill, Y. K. Ma, S. A. Mao, A. Ordog, Jo-Anne C. Brown, K. A. Douglas, A. Erceg, V. Jelic, R. Kothes, M. Wolleben

Abstract: Magnetic fields in the ionized medium of the disk and halo of the Milky Way impose Faraday rotation on linearly polarized radio emission. We compare two surveys mapping the Galactic Faraday rotation, one showing the rotation measures of extragalactic sources seen through the Galaxy (from Hutschenreuter et al 2022), and one showing the Faraday depth of the diffuse Galactic synchrotron emission from the Global Magneto-Ionic Medium Survey. Comparing the two data sets in 5deg x 10deg bins shows good agreement at intermediate latitudes, 10 < |b| < 50 deg, and little correlation between them at lower and higher latitudes. Where they agree, both tracers show clear patterns as a function of Galactic longitude: in the Northern Hemisphere a strong sin(2 x longitude) pattern, and in the Southern hemisphere a sin(longitude + pi) pattern. Pulsars with height above or below the plane |z| > 300 pc show similar longitude dependence in their rotation measures. Nearby non-thermal structures show rotation measure shadows as does the Orion-Eridanus superbubble. We describe families of dynamo models that could explain the observed patterns in the two hemispheres. We suggest that a field reversal, known to cross the plane a few hundred pc inside the solar circle, could shift to positive z with increasing Galactic radius to explain the sin(2 x longitude) pattern in the Northern Hemisphere. Correlation shows that rotation measures from extragalactic sources are one to two times the corresponding rotation measure of the diffuse emission, implying Faraday complexity along some lines of sight, especially in the Southern hemisphere.

Submitted 22 September, 2022; originally announced September 2022.

Comments: 37 pages, 26 figures, Ap. J. accepted

arXiv:2209.09965 [pdf, other]

FoVolNet: Fast Volume Rendering using Foveated Deep Neural Networks

Authors: David Bauer, Qi Wu, Kwan-Liu Ma

Abstract: Volume data is found in many important scientific and engineering applications. Rendering this data for visualization at high quality and interactive rates for demanding applications such as virtual reality is still not easily achievable even using professional-grade hardware. We introduce FoVolNet -- a method to significantly increase the performance of volume data visualization. We develop a cost-effective foveated rendering pipeline that sparsely samples a volume around a focal point and reconstructs the full-frame using a deep neural network. Foveated rendering is a technique that prioritizes rendering computations around the user's focal point. This approach leverages properties of the human visual system, thereby saving computational resources when rendering data in the periphery of the user's field of vision. Our reconstruction network combines direct and kernel prediction methods to produce fast, stable, and perceptually convincing output. With a slim design and the use of quantization, our method outperforms state-of-the-art neural reconstruction techniques in both end-to-end frame times and visual quality. We conduct extensive evaluations of the system's rendering performance, inference speed, and perceptual properties, and we provide comparisons to competing neural image reconstruction techniques. Our test results show that FoVolNet consistently achieves significant time saving over conventional rendering while preserving perceptual quality.

Submitted 20 September, 2022; originally announced September 2022.

Comments: To appear at IEEE VIS 2022 and later TVCG

arXiv:2209.09841 [pdf, other]

doi 10.1145/3581783.3612281

Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation

Authors: Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, **gzhi Li, Xiaochun Cao

Abstract: Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. Since the teacher model perceives data in a way different from humans, existing KD methods only distill knowledge that is consistent with labels annotated by human experts while neglecting knowledge that is not consistent with human perception, which results in insufficient distillation and sub-optimal performance. In this paper, we propose inconsistent knowledge distillation (IKD), which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions. We start by considering the teacher model's counter-intuitive perceptions of frequency and non-robust features. Unlike previous works that exploit fine-grained features or introduce additional regularizations, we extract inconsistent knowledge by providing diverse input using data augmentation. Specifically, we propose a sample-specific data augmentation to transfer the teacher model's ability in capturing distinct frequency components and suggest an adversarial feature augmentation to extract the teacher model's perceptions of non-robust features in the data. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors (at most +1.0 mAP). Our code will be made available at https://github.com/JWLiang007/IKD.git.

Submitted 21 February, 2024; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: ACMMM 2023 Oral

arXiv:2209.08800 [pdf, ps, other]

A Realistic 3D Non-Stationary Channel Model for UAV-to-Vehicle Communications Incorporating Fuselage Posture

Authors: Boyu Hua, Tongtong Zhou, Qiuming Zhu, Kai Mao, Junwei Bao, Weizhi Zhong, Naeem Ahmed

Abstract: Considering the unmanned aerial vehicle (UAV) three-dimensional (3D) posture, a novel 3D non-stationary geometry-based stochastic model (GBSM) is proposed for multiple-input multiple-output (MIMO) UAV-to-vehicle (U2V) channels. It consists of line-of-sight (LoS) and non-line-of-sight (NLoS) components. The factor of fuselage posture is considered by introducing a time-variant 3D posture matrix. Some important statistical properties, i.e., the temporal autocorrelation function (ACF) and spatial cross correlation function (CCF), are derived and investigated. Simulation results show that the fuselage posture has a significant impact on the U2V channel characteristics and aggravates the non-stationarity. The agreement between analytical, simulated, and measured results verifies the correctness of the proposed model and derivations. Moreover, it is demonstrated that the proposed model is also compatible with the existing GBSM that does not consider fuselage posture.

Submitted 19 September, 2022; originally announced September 2022.

Comments: 12 pages, 8 figures, CNCOM

arXiv:2209.05742 [pdf, other]

doi 10.1109/TPAMI.2022.3190939

A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game

Authors: Ke Ma, Qianqian Xu, **shan Zeng, Guorong Li, Xiaochun Cao, Qingming Huang

Abstract: Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to numerous research work on the computational and statistical characteristics. Driven by huge profits, the potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of the rank aggregation methods is not well studied in the literature. To fully understand the possible risks, we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data in this paper. From the perspective of the dynamical system, the attack behavior with a target ranking list is a fixed point belonging to the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators while Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary masters the complete information. It is noteworthy that the proposed methods allow the adversary only to hold incomplete information or imperfect feedback and perform the purposeful attack. The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods could achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the designated one by the adversary.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 33 pages, https://github.com/alphaprime/Target_Attack_Rank_Aggregation

Journal ref: Early Access by TPAMI 2022 (https://ieeexplore.ieee.org/document/9830042)

arXiv:2208.12848 [pdf, other]

Coalescing Global and Local Information for Procedural Text Understanding

Authors: Kaixin Ma, Filip Ilievski, Jonathan Francis, Eric Nyberg, Alessandro Oltramari

Abstract: Procedural text understanding is a challenging language reasoning task that requires models to track entity states across the development of a narrative. A complete procedural understanding solution should combine three core aspects: local and global views of the inputs, and global view of outputs. Prior methods considered a subset of these aspects, resulting in either low precision or low recall. In this paper, we propose Coalescing Global and Local Information (CGLI), a new model that builds entity- and timestep-aware input representations (local input) considering the whole context (global input), and we jointly model the entity states with a structured prediction objective (global output). Thus, CGLI simultaneously optimizes for both precision and recall. We extend CGLI with additional output layers and integrate it into a story reasoning framework. Extensive experiments on a popular procedural text understanding dataset show that our model achieves state-of-the-art results; experiments on a story reasoning benchmark show the positive impact of our model on downstream reasoning.

Submitted 26 August, 2022; originally announced August 2022.

Comments: COLING 2022

arXiv:2208.12462 [pdf, other]

Seg4Reg+: Consistency Learning between Spine Segmentation and Cobb Angle Regression

Authors: Yi Lin, Luyan Liu, Kai Ma, Yefeng Zheng

Abstract: Automated methods for Cobb angle estimation are of high demand for scoliosis assessment. Existing methods typically calculate the Cobb angle from landmark estimation, or simply combine the low-level task (e.g., landmark detection and spine segmentation) with the Cobb angle regression task, without fully exploring the benefits from each other. In this study, we propose a novel multi-task framework, named Seg4Reg+, which jointly optimizes the segmentation and regression networks. We thoroughly investigate both local and global consistency and knowledge transfer between each other. Specifically, we propose an attention regularization module leveraging class activation maps (CAMs) from image-segmentation pairs to discover additional supervision in the regression network, and the CAMs can serve as a region-of-interest enhancement gate to facilitate the segmentation task in turn. Meanwhile, we design a novel triangle consistency learning to train the two networks jointly for global optimization. The evaluations performed on the public AASCE Challenge dataset demonstrate the effectiveness of each module and superior performance of our model to the state-of-the-art methods.

Submitted 26 August, 2022; originally announced August 2022.

Comments: Accepted by MICCAI 2021

arXiv:2208.07908 [pdf, other]

doi 10.1016/B978-0-323-90800-9.00135-9

Fractional quantum Hall effect at the filling factor $ν=5/2$

Authors: Ken K. W. Ma, Michael R. Peterson, V. W. Scarola, Kun Yang

Abstract: The fractional quantum Hall (FQH) effect at the filling factor $ν=5/2$ was discovered in GaAs heterostructures more than 35 years ago. Various topological orders have been proposed as possible candidates to describe this FQH state. Some of them possess non-Abelian anyon excitations, an entirely new type of quasiparticle with fascinating properties. If observed, non-Abelian anyons could offer fundamental building blocks of a topological quantum computer. Nevertheless, the nature of the FQH state at $ν=5/2$ is still under debate. In this chapter, we provide an overview of the theoretical background, numerical results, and experimental measurements pertaining to this special FQH state. Furthermore, we review some recent developments and their possible interpretations. Possible future directions toward resolving the nature of the $5/2$ state are also discussed.

Submitted 29 September, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

Comments: Updated version; A chapter for Encyclopedia of Condensed Matter Physics, 2nd edition (Elsevier)

Journal ref: Encyclopedia of Condensed Matter Physics, 2nd edition (2023)

arXiv:2208.06970 [pdf, other]

Level Set Restricted Voronoi Tessellation for Large scale Spatial Statistical Analysis

Authors: Tyson Neuroth, Martin Rieth, Konduri Aditya, Myoungkyu Lee, Jacqueline H Chen, Kwan-Liu Ma

Abstract: Spatial statistical analysis of multivariate volumetric data can be challenging due to scale, complexity, and occlusion. Advances in topological segmentation, feature extraction, and statistical summarization have helped overcome the challenges. This work introduces a new spatial statistical decomposition method based on level sets, connected components, and a novel variation of the restricted centroidal Voronoi tessellation that is better suited for spatial statistical decomposition and parallel efficiency. The resulting data structures organize features into a coherent nested hierarchy to support flexible and efficient out-of-core region-of-interest extraction. Next, we provide an efficient parallel implementation. Finally, an interactive visualization system based on this approach is designed and then applied to turbulent combustion data. The combined approach enables an interactive spatial statistical analysis workflow for large-scale data with a top-down approach through multiple-levels-of-detail that links phase space statistics with spatial features.

Submitted 14 August, 2022; originally announced August 2022.

arXiv:2207.14769 [pdf, other]

Image Quality Assessment: Integrating Model-Centric and Data-Centric Approaches

Authors: Peibei Cao, Dingquan Li, Kede Ma

Abstract: Learning-based image quality assessment (IQA) has made remarkable progress in the past decade, but nearly all consider the two key components -- model and data -- in isolation. Specifically, model-centric IQA focuses on developing ``better'' objective quality methods on fixed and extensively reused datasets, with a great danger of overfitting. Data-centric IQA involves conducting psychophysical experiments to construct ``better'' human-annotated datasets, which unfortunately ignores current IQA models during dataset creation. In this paper, we first design a series of experiments to probe computationally that such isolation of model and data impedes further progress of IQA. We then describe a computational framework that integrates model-centric and data-centric IQA. As a specific example, we design computational modules to quantify the sampling-worthiness of candidate images. Experimental results show that the proposed sampling-worthiness module successfully spots diverse failures of the examined blind IQA models, which are indeed worthy samples to be included in next-generation datasets.

Submitted 8 December, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

arXiv:2207.13688 [pdf, ps, other]

doi 10.1103/PhysRevB.106.214313

Eigenstate thermalization and disappearance of quantum many-body scar states in interacting fermion systems

Authors: Ken K. W. Ma, A. Volya, Kun Yang

Abstract: The recent discovery of quantum many-body scar states has revealed the possibility of having states with low entanglement that violate the eigenstate thermalization hypothesis in nonintegrable systems. Such states with low entanglement entropy are rare but naturally exist in the integrable system of free fermions. Here, we demonstrate analytically that these atypical states are always eliminated when an arbitrarily weak interaction is introduced between the fermions. In particular, we show that the probability of having a many-body scar state with entanglement entropy satisfying a sub-volume scaling law decreases double exponentially with the system size. Thus, our results provide a quantitative argument for the disappearance of scar states in interacting fermion systems.

Submitted 2 January, 2023; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: Accepted version by PRB

Journal ref: Phys. Rev. B 106, 214313 (2022)

arXiv:2207.11620 [pdf, other]

Interactive Volume Visualization via Multi-Resolution Hash Encoding based Neural Representation

Authors: Qi Wu, David Bauer, Michael J. Doyle, Kwan-Liu Ma

Abstract: Neural networks have shown great potential in compressing volume data for visualization. However, due to the high cost of training and inference, such volumetric neural representations have thus far only been applied to offline data processing and non-interactive rendering. In this paper, we demonstrate that by simultaneously leveraging modern GPU tensor cores, a native CUDA neural network framework, and a well-designed rendering algorithm with macro-cell acceleration, we can interactively ray trace volumetric neural representations (10-60fps). Our neural representations are also high-fidelity (PSNR > 30dB) and compact (10-1000x smaller). Additionally, we show that it is possible to fit the entire training step inside a rendering loop and skip the pre-training process completely. To support extreme-scale volume data, we also develop an efficient out-of-core training strategy, which allows our volumetric neural representation training to potentially scale up to terascale using only an NVIDIA RTX 3090 workstation.

Submitted 29 June, 2023; v1 submitted 23 July, 2022; originally announced July 2022.

Comments: There is a supplementary video for this manuscript, which can be accessed via this link: https://drive.google.com/file/d/17wSgIm_VsoeGhfyZwMpOnCYy2Mj3ydGv/view?usp=sharing
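
Note on the fidelity figure quoted in the abstract above: PSNR is conventionally defined as 10·log10(MAX² / MSE), where MAX is the largest possible sample value and MSE is the mean squared error between the reference and the reconstruction. The sketch below only illustrates that standard definition on a tiny made-up list of samples; the function name, the sample values, and the assumption that values are normalized to [0, 1] are illustrative and are not taken from the paper.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical samples normalized to [0, 1]; every sample is off by 0.01.
ref = [0.0, 0.5, 1.0, 0.25]
rec = [0.01, 0.49, 0.99, 0.26]
value = psnr(ref, rec)
```

With a uniform error of 0.01 the MSE is 1e-4, so the result sits at about 40 dB, comfortably above the 30 dB threshold the abstract mentions.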

arXiv:2207.10232 [pdf, other]

Optimal, centralized dynamic curbside parking space zoning

Authors: Nawaf Nazir, Shushman Choudhury, Stephen Zoepf, Ke Ma, Chase Dowling

Abstract: In this paper we formulate a dynamic mixed integer program for optimally zoning curbside parking spaces subject to transportation policy-inspired constraints and regularization terms. First, we illustrate how given some objective of curb zoning valuation as a function of zone type (e.g., paid parking or bus stop), dynamically rezoning involves unrolling this optimization program over a fixed time horizon. Second, we implement two different solution methods that optimize for a given curb zoning value function. In the first method, we solve long horizon dynamic zoning problems via approximate dynamic programming. In the second method, we employ Dantzig-Wolfe decomposition to break up the mixed-integer program into a master problem and several sub-problems that we solve in parallel; this decomposition accelerates the MIP solver considerably. We present simulation results and comparisons of the different employed techniques on vehicle arrival-rate data obtained for a neighborhood in downtown Seattle, Washington, USA.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.09689 [pdf, other]

Uncertainty Inspired Underwater Image Enhancement

Authors: Zhenqi Fu, Wu Wang, Yue Huang, Xinghao Ding, Kai-Kuang Ma

Abstract: A main challenge faced in the deep learning-based Underwater Image Enhancement (UIE) is that the ground truth high-quality image is unavailable. Most of the existing methods first generate approximate reference maps and then train an enhancement network with certainty. This kind of method fails to handle the ambiguity of the reference map. In this paper, we resolve UIE into distribution estimation and a consensus process. We present a novel probabilistic network to learn the enhancement distribution of degraded underwater images. Specifically, we combine a conditional variational autoencoder with adaptive instance normalization to construct the enhancement distribution. After that, we adopt a consensus process to predict a deterministic result based on a set of samples from the distribution. By learning the enhancement distribution, our method can cope with the bias introduced in the reference map labeling to some extent. Additionally, the consensus process is useful to capture a robust and stable result. We examined the proposed method on two widely used real-world underwater image enhancement datasets. Experimental results demonstrate that our approach enables sampling possible enhancement predictions. Meanwhile, the consensus estimate yields competitive performance compared with state-of-the-art UIE methods. Code available at https://github.com/zhenqifu/PUIE-Net.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.09312 [pdf, other]

Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography

Authors: Kai Ma, Pengcheng Xi, Karim Habashy, Ashkan Ebadi, Stéphane Tremblay, Alexander Wong

Abstract: Building AI models with trustworthiness is important especially in regulated areas such as healthcare. In tackling COVID-19, previous work uses convolutional neural networks as the backbone architecture, which have been shown to be prone to over-caution and overconfidence in making decisions, rendering them less trustworthy -- a crucial flaw in the context of medical imaging. In this study, we propose a feature learning approach using Vision Transformers, which use an attention-based mechanism, and examine the representation learning capability of Transformers as a new backbone architecture for medical imaging. Through the task of classifying COVID-19 chest radiographs, we investigate whether generalization capabilities benefit solely from Vision Transformers' architectural advances. Quantitative and qualitative evaluations are conducted on the trustworthiness of the models, through the use of "trust score" computation and a visual explainability technique. We conclude that the attention-based feature learning approach is promising in building trustworthy deep learning models for healthcare.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted to 39th International Conference on Machine Learning, Workshop on Healthcare AI and COVID-19

arXiv:2207.08859 [pdf, other]

Prior-Guided Adversarial Initialization for Fast Adversarial Training

Authors: Xiaojun Jia, Yong Zhang, Xingxing Wei, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao

Abstract: Fast adversarial training (FAT) effectively improves the efficiency of standard adversarial training (SAT). However, initial FAT encounters catastrophic overfitting, i.e., the robust accuracy against adversarial attacks suddenly and dramatically decreases. Though several FAT variants spare no effort to prevent overfitting, they sacrifice much calculation cost. In this paper, we explore the difference between the training processes of SAT and FAT and observe that the attack success rate of adversarial examples (AEs) of FAT gets worse gradually in the late training stage, resulting in overfitting. The AEs are generated by the fast gradient sign method (FGSM) with a zero or random initialization. Based on the observation, we propose a prior-guided FGSM initialization method to avoid overfitting after investigating several initialization strategies, improving the quality of the AEs during the whole training process. The initialization is formed by leveraging historically generated AEs without additional calculation cost. We further provide a theoretical analysis for the proposed initialization method. We also propose a simple yet effective regularizer based on the prior-guided initialization, i.e., the currently generated perturbation should not deviate too much from the prior-guided initialization. The regularizer adopts both historical and current adversarial perturbations to guide the model learning. Evaluations on four datasets demonstrate that the proposed method can prevent catastrophic overfitting and outperform state-of-the-art FAT methods. The code is released at https://github.com/jiaxiaojunQAQ/FGSM-PGI.

Submitted 18 July, 2022; originally announced July 2022.

Comments: ECCV 2022

Journal ref: ECCV 2022
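
Note on the mechanism described in the abstract above: a single FGSM step moves the perturbation along the sign of the loss gradient and clips it to an ε-ball, and the abstract contrasts zero or random initialization with an initialization built from previously generated adversarial examples. The sketch below is a generic illustration of that idea, not the authors' implementation. The logistic-regression model, the choice of reusing the previous epoch's perturbation as the prior, and all names (`fgsm_step`, `prior`, `alpha`) are assumptions made for illustration; the abstract does not specify how the prior is formed, and the regularizer it mentions is omitted.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_grad_wrt_input(w, x, y):
    """Gradient of the logistic loss w.r.t. the input x for a linear model w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm_step(w, x, y, delta_init, eps, alpha):
    """One FGSM step that starts from delta_init instead of zero."""
    x_start = [xi + di for xi, di in zip(x, delta_init)]
    grad = loss_grad_wrt_input(w, x_start, y)
    delta = [di + alpha * sign(gi) for di, gi in zip(delta_init, grad)]
    # Project back onto the L-infinity ball of radius eps.
    return [max(-eps, min(eps, d)) for d in delta]


# Hypothetical toy setup: one sample, one linear model.
w = [1.0, -2.0]
x = [0.5, 0.5]
y = 1
eps, alpha = 0.1, 0.1

prior = [0.0, 0.0]  # the first epoch has no history, so start from zero
for epoch in range(3):
    # Assumption: the stored perturbation from the previous epoch is the prior.
    prior = fgsm_step(w, x, y, prior, eps, alpha)
```

In this toy case the perturbation reaches the boundary of the ε-ball after the first epoch and stays there; reusing it costs no extra gradient computation, which matches the abstract's point about avoiding additional calculation cost.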

arXiv:2207.08549 [pdf, other]

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Authors: Xinyu Shi, Dong Wei, Yu Zhang, Donghuan Lu, Munan Ning, Jiashun Chen, Kai Ma, Yefeng Zheng

Abstract: Research into Few-shot Semantic Segmentation (FSS) has attracted great attention, with the goal to segment target objects in a query image given only a few annotated support images of the target class. A key to this challenging task is to fully utilize the information in the support images by exploiting fine-grained correlations between the query and support images. However, most existing approaches either compressed the support information into a few class-wise prototypes, or used partial support information (e.g., only foreground) at the pixel level, causing non-negligible information loss. In this paper, we propose Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), where both foreground and background support information are fully exploited via multi-level pixel-wise correlations between paired query and support features. Implemented with the scaled dot-product attention in the Transformer architecture, DCAMA treats every query pixel as a token, computes its similarities with all support pixels, and predicts its segmentation label as an additive aggregation of all the support pixels' labels -- weighted by the similarities. Based on the unique formulation of DCAMA, we further propose efficient and effective one-pass inference for n-shot segmentation, where pixels of all support images are collected for the mask aggregation at once. Experiments show that our DCAMA significantly advances the state of the art on standard FSS benchmarks of PASCAL-5i, COCO-20i, and FSS-1000, e.g., with 3.1%, 9.7%, and 3.6% absolute improvements in 1-shot mIoU over previous best records. Ablative studies also verify the design of DCAMA.

Submitted 18 July, 2022; originally announced July 2022.

Comments: ECCV 2022
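
Note on the aggregation described in the abstract above: each query pixel is treated as a token, its scaled dot-product similarity to every support pixel is computed, and its label is predicted as the similarity-weighted sum of the support pixels' labels. The sketch below illustrates only that single-level aggregation in plain Python. The feature vectors, the use of softmax-normalized weights, the 0.5 threshold, and the function names are assumptions for illustration; the multi-level features and any learned layers of the actual model are not represented.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_mask(query_feats, support_feats, support_labels):
    """For each query pixel, return the attention-weighted sum of support labels."""
    scale = math.sqrt(len(support_feats[0]))  # scaled dot-product attention
    soft = []
    for q in query_feats:
        scores = [sum(qi * si for qi, si in zip(q, s)) / scale for s in support_feats]
        weights = softmax(scores)
        soft.append(sum(wt * lab for wt, lab in zip(weights, support_labels)))
    return soft


# Hypothetical 2-D features: two query pixels, three support pixels.
query = [[4.0, 0.0], [0.0, 4.0]]
support = [[4.0, 0.0], [0.0, 4.0], [0.0, 3.0]]
labels = [1.0, 0.0, 0.0]  # 1 = foreground, 0 = background

soft_mask = aggregate_mask(query, support, labels)
hard_mask = [1 if p > 0.5 else 0 for p in soft_mask]
```

Because the support pixels are passed as one flat list, pixels from several support images can simply be concatenated before the call, which is the idea behind the one-pass n-shot inference the abstract describes.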

arXiv:2207.08267 [pdf, other]

doi 10.1007/978-3-031-16434-7_19

Gigapixel Whole-Slide Images Classification using Locally Supervised Learning

Authors: Jingwei Zhang, Xin Zhang, Ke Ma, Rajarsi Gupta, Joel Saltz, Maria Vakalopoulou, Dimitris Samaras

Abstract: Histopathology whole slide images (WSIs) play a very important role in clinical studies and serve as the gold standard for many cancer diagnoses. However, generating automatic tools for processing WSIs is challenging due to their enormous sizes. Currently, to deal with this issue, conventional methods rely on a multiple instance learning (MIL) strategy to process a WSI at patch level. Although effective, such methods are computationally expensive, because tiling a WSI into patches takes time and does not explore the spatial relations between these tiles. To tackle these limitations, we propose a locally supervised learning framework which processes the entire slide by exploring the entire local and global information that it contains. This framework divides a pre-trained network into several modules and optimizes each module locally using an auxiliary model. We also introduce a random feature reconstruction unit (RFR) to preserve distinguishing features during training and improve the performance of our method by 1% to 3%. Extensive experiments on three publicly available WSI datasets: TCGA-NSCLC, TCGA-RCC and LKS, highlight the superiority of our method on different classification tasks. Our method outperforms the state-of-the-art MIL methods by 2% to 5% in accuracy, while being 7 to 10 times faster. Additionally, when dividing it into eight modules, our method requires as little as 20% of the total GPU memory required by end-to-end training. Our code is available at https://github.com/cvlab-stonybrook/local_learning_wsi.

Submitted 26 September, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: Accepted to MICCAI 2022 Oral

Journal ref: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2022

arXiv:2207.06618 [pdf, other]

doi 10.1103/PhysRevLett.129.036601

Anisotropic, two-dimensional, disordered Wigner solid

Authors: Md. S. Hossain, M. K. Ma, K. A. Villegas-Rosales, Y. J. Chung, L. N. Pfeiffer, K. W. West, K. W. Baldwin, M. Shayegan

Abstract: The interplay between the Fermi sea anisotropy, electron-electron interaction, and localization phenomena can give rise to exotic many-body phases. An exciting example is an anisotropic two-dimensional (2D) Wigner solid (WS), where electrons form an ordered array with an anisotropic lattice structure. Such a state has eluded experiments up to now as its realization is extremely demanding: First, a WS entails very low densities where the Coulomb interaction dominates over the kinetic (Fermi) energy. Attaining such low densities while keeping the disorder low is very challenging. Second, the low-density requirement has to be fulfilled in a material that hosts an anisotropic Fermi sea. Here, we report transport measurements in a clean (low-disorder) 2D electron system with anisotropic effective mass and Fermi sea. The data reveal that at extremely low electron densities, when the r_s parameter, the ratio of the Coulomb to the Fermi energy, exceeds 38, the current-voltage characteristics become strongly nonlinear at small dc biases. Several key features of the nonlinear characteristics, including their anisotropic voltage thresholds, are consistent with the formation of a disordered, anisotropic WS pinned by the ubiquitous disorder potential.

Submitted 13 July, 2022; originally announced July 2022.

Journal ref: Phys. Rev. Lett. 129, 036601 (2022)

arXiv:2207.05306 [pdf, other]

Contrastive Deep Supervision

Authors: Linfeng Zhang, Xin Chen, Junbo Zhang, Runpei Dong, Kaisheng Ma

Abstract: The success of deep learning is usually accompanied by the growth in neural network depth. However, the traditional training method only supervises the neural network at its last layer and propagates the supervision layer-by-layer, which leads to hardship in optimizing the intermediate layers. Recently, deep supervision has been proposed to add auxiliary classifiers to the intermediate layers of deep neural networks. By optimizing these auxiliary classifiers with the supervised task loss, the supervision can be applied to the shallow layers directly. However, deep supervision conflicts with the well-known observation that the shallow layers learn low-level features instead of task-biased high-level semantic features. To address this issue, this paper proposes a novel training framework named Contrastive Deep Supervision, which supervises the intermediate layers with augmentation-based contrastive learning. Experimental results on nine popular datasets with eleven models demonstrate its effects on general image classification, fine-grained image classification and object detection in supervised learning, semi-supervised learning and knowledge distillation. Code has been released on GitHub.

Submitted 12 July, 2022; originally announced July 2022.

Comments: Accepted in ECCV2022

arXiv:2206.13891 [pdf, other]

Feature Learning for Nonlinear Dimensionality Reduction toward Maximal Extraction of Hidden Patterns

Authors: Takanori Fujiwara, Yun-Hsin Kuo, Anders Ynnerman, Kwan-Liu Ma

Abstract: Dimensionality reduction (DR) plays a vital role in the visual analysis of high-dimensional data. One main aim of DR is to reveal hidden patterns that lie on intrinsic low-dimensional manifolds. However, DR often overlooks important patterns when the manifolds are distorted or masked by certain influential data attributes. This paper presents a feature learning framework, FEALM, designed to generate a set of optimized data projections for nonlinear DR in order to capture important patterns in the hidden manifolds. These projections produce maximally different nearest-neighbor graphs so that resultant DR outcomes are significantly different. To achieve such a capability, we design an optimization algorithm as well as introduce a new graph dissimilarity measure, named neighbor-shape dissimilarity. Additionally, we develop interactive visualizations to assist comparison of obtained DR results and interpretation of each DR result. We demonstrate FEALM's effectiveness through experiments and case studies using synthetic and real-world datasets.

Submitted 24 February, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted by PacificVis 2023. The previous preprint version was titled "Feature Learning for Dimensionality Reduction toward Maximal Extraction of Hidden Patterns" (arxiv:2206.13891v2)

arXiv:2206.13170 [pdf, other]

Measuring and Improving the Use of Graph Information in Graph Neural Networks

Authors: Yifan Hou, Jian Zhang, James Cheng, Kaili Ma, Richard T. B. Ma, Hongzhi Chen, Ming-Chang Yang

Abstract: Graph neural networks (GNNs) have been widely used for representation learning on graph data. However, there is limited understanding of how much performance GNNs actually gain from graph data. This paper introduces a context-surrounding GNN framework and proposes two smoothness metrics to measure the quantity and quality of information obtained from graph data. A new GNN model, called CS-GNN, is then designed to improve the use of graph information based on the smoothness values of a graph. CS-GNN is shown to achieve better performance than existing methods in different types of real graphs.

Submitted 27 June, 2022; originally announced June 2022.

Comments: This paper has been published in ICLR 2020. Code and Dataset can be found here: https://github.com/yifan-h/CS-GNN

arXiv:2206.09146 [pdf, other]

A Perceptually Optimized and Self-Calibrated Tone Mapping Operator

Authors: Peibei Cao, Chenyang Le, Yuming Fang, Kede Ma

Abstract: With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are practically demanding. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid. We then use two lightweight deep neural networks (DNNs), taking the normalized representation as input and estimating the Laplacian pyramid of the corresponding LDR image. We optimize the tone mapping network by minimizing the normalized Laplacian pyramid distance (NLPD), a perceptual metric aligning with human judgments of tone-mapped image quality. In Stage two, the input HDR image is self-calibrated to compute the final LDR image. We feed the same HDR image but rescaled with different maximum luminances to the learned tone mapping network, and generate a pseudo-multi-exposure image stack with different detail visibility and color saturation. We then train another lightweight DNN to fuse the LDR image stack into a desired LDR image by maximizing a variant of the structural similarity index for multi-exposure image fusion (MEF-SSIM), which has been proven perceptually relevant to fused image quality. The proposed self-calibration mechanism through MEF enables our TMO to accept uncalibrated HDR images, while being physiology-driven. Extensive experiments show that our method produces images with consistently better visual quality. Additionally, since our method builds upon three lightweight DNNs, it is among the fastest local TMOs.

Submitted 25 August, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

Comments: 15 pages,17 figures

arXiv:2206.08751 [pdf, other]

Perceptual Quality Assessment of Virtual Reality Videos in the Wild

Authors: Wen Wen, Mu Li, Yiru Yao, Xiangjie Sui, Yabin Zhang, Long Lan, Yuming Fang, Kede Ma

Abstract: Investigating how people perceive virtual reality (VR) videos in the wild (i.e., those captured by everyday users) is a crucial and challenging task in VR-related applications due to complex authentic distortions localized in space and time. Existing panoramic video databases only consider synthetic distortions, assume fixed viewing conditions, and are limited in size. To overcome these shortcomings, we construct the VR Video Quality in the Wild (VRVQW) database, containing $502$ user-generated videos with diverse content and distortion characteristics. Based on VRVQW, we conduct a formal psychophysical experiment to record the scanpaths and perceived quality scores from $139$ participants under two different viewing conditions. We provide a thorough statistical analysis of the recorded data, observing significant impact of viewing conditions on both human scanpaths and perceived quality. Moreover, we develop an objective quality assessment model for VR videos based on pseudocylindrical representation and convolution. Results on the proposed VRVQW show that our method is superior to existing video quality assessment models. We have made the database and code available at https://github.com/limuhit/VR-Video-Quality-in-the-Wild.

Submitted 15 March, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology

arXiv:2206.07766 [pdf, other]

Pareto Invariant Risk Minimization: Towards Mitigating the Optimization Dilemma in Out-of-Distribution Generalization

Authors: Yongqiang Chen, Kaiwen Zhou, Yatao Bian, Binghui Xie, Bingzhe Wu, Yonggang Zhang, Kaili Ma, Han Yang, Peilin Zhao, Bo Han, James Cheng

Abstract: Recently, there has been a growing surge of interest in enabling machine learning systems to generalize well to Out-of-Distribution (OOD) data. Most efforts are devoted to advancing optimization objectives that regularize models to capture the underlying invariance; however, there often are compromises in the optimization process of these OOD objectives: i) Many OOD objectives have to be relaxed as penalty terms of Empirical Risk Minimization (ERM) for the ease of optimization, while the relaxed forms can weaken the robustness of the original objective; ii) The penalty terms also require careful tuning of the penalty weights due to the intrinsic conflicts between ERM and OOD objectives. Consequently, these compromises could easily lead to suboptimal performance of either the ERM or OOD objective. To address these issues, we introduce a multi-objective optimization (MOO) perspective to understand the OOD optimization process, and propose a new optimization scheme called PAreto Invariant Risk Minimization (PAIR). PAIR improves the robustness of OOD objectives by cooperatively optimizing with other OOD objectives, thereby bridging the gaps caused by the relaxations. Then PAIR approaches a Pareto optimal solution that trades off the ERM and OOD objectives properly. Extensive experiments on challenging benchmarks, WILDS, show that PAIR alleviates the compromises and yields top OOD performances.

Submitted 2 March, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: ICLR 2023, 50 pages, 58 figures

arXiv:2206.00379 [pdf]

doi 10.1145/3489517.3530576

YOLoC: DeploY Large-Scale Neural Network by ROM-based Computing-in-Memory using ResiduaL Branch on a Chip

Authors: Yiming Chen, Guodong Yin, Zhanhong Tan, Mingyen Lee, Zekun Yang, Yongpan Liu, Huazhong Yang, Kaisheng Ma, Xueqing Li

Abstract: Computing-in-memory (CiM) is a promising technique to achieve high energy efficiency in data-intensive matrix-vector multiplication (MVM) by relieving the memory bottleneck. Unfortunately, due to the limited SRAM capacity, existing SRAM-based CiM needs to reload the weights from DRAM in large-scale networks. This undesired fact weakens the energy efficiency significantly. This work, for the first time, proposes the concept, design, and optimization of computing-in-ROM to achieve much higher on-chip memory capacity, and thus less DRAM access and lower energy consumption. Furthermore, to support different computing scenarios with varying weights, a weight fine-tune technique, namely Residual Branch (ReBranch), is also proposed. ReBranch combines ROM-CiM and assisting SRAM-CiM to achieve high versatility. YOLoC, a ReBranch-assisted ROM-CiM framework for object detection, is presented and evaluated. With the same area in 28nm CMOS, YOLoC for several datasets has shown significant energy efficiency improvement by 14.8x for YOLO (Darknet-19) and 4.8x for ResNet-18, with <8% latency overhead and almost no mean average precision (mAP) loss (-0.5% ~ +0.2%), compared with the fully SRAM-based CiM.

Submitted 1 June, 2022; originally announced June 2022.

Comments: 6 pages, 14 figures. to be published in DAC 2022

Journal ref: Design Automation Conference 2022

arXiv:2206.00227 [pdf, other]

Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views

Authors: Junbo Zhang, Kaisheng Ma

Abstract: A data augmentation module is utilized in contrastive learning to transform the given data example into two views, which is considered essential and irreplaceable. However, the predetermined composition of multiple data augmentations brings two drawbacks. First, the artificial choice of augmentation types brings specific representational invariances to the model, which have different degrees of positive and negative effects on different downstream tasks. Treating each type of augmentation equally during training makes the model learn non-optimal representations for various downstream tasks and limits the flexibility to choose augmentation types beforehand. Second, the strong data augmentations used in classic contrastive learning methods may bring too much invariance in some cases, and fine-grained information that is essential to some downstream tasks may be lost. This paper proposes a general method to alleviate these two problems by considering where and what to contrast in a general contrastive learning framework. We first propose to learn different augmentation invariances at different depths of the model according to the importance of each data augmentation instead of learning representational invariances evenly in the backbone. We then propose to expand the contrast content with augmentation embeddings to reduce the misleading effects of strong data augmentations. Experiments based on several baseline methods demonstrate that we learn better representations for various benchmarks on classification, detection, and segmentation downstream tasks.

Submitted 21 August, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted to CVPR 2022

Journal ref: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2205.13489 [pdf, other]

Measuring Perceptual Color Differences of Smartphone Photographs

Authors: Zhihua Wang, Keshuo Xu, Yang Yang, Jianlei Dong, Shuhang Gu, Lihao Xu, Yuming Fang, Kede Ma

Abstract: Measuring perceptual color differences (CDs) is of great importance in modern smartphone photography. Despite the long history, most CD measures have been constrained by psychophysical data of homogeneous color patches or a limited number of simplistic natural photographic images. It is thus questionable whether existing CD measures generalize in the age of smartphone photography characterized by greater content complexities and learning-based image signal processors. In this paper, we put together the largest image dataset so far for perceptual CD assessment, in which the photographic images are 1) captured by six flagship smartphones, 2) altered by Photoshop, 3) post-processed by built-in filters of the smartphones, and 4) reproduced with incorrect color profiles. We then conduct a large-scale psychophysical experiment to gather perceptual CDs of 30,000 image pairs in a carefully controlled laboratory environment. Based on the newly established dataset, we make one of the first attempts to construct an end-to-end learnable CD formula based on a lightweight neural network, as a generalization of several previous metrics. Extensive experiments demonstrate that the optimized formula outperforms 33 existing CD measures by a large margin, offers reasonable local CD maps without the use of dense supervision, generalizes well to homogeneous color patch data, and empirically behaves as a proper metric in the mathematical sense. Our dataset and code are publicly available at https://github.com/hellooks/CDNet.

Submitted 31 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: 10 figures, 8 tables, 14 pages

arXiv:2205.12451 [pdf, other]

Region-aware Knowledge Distillation for Efficient Image-to-Image Translation

Authors: Linfeng Zhang, Xin Chen, Runpei Dong, Kaisheng Ma

Abstract: Recent progress in image-to-image translation has witnessed the success of generative adversarial networks (GANs). However, GANs usually contain a huge number of parameters, which lead to intolerable memory and computation consumption and limit their deployment on edge devices. To address this issue, knowledge distillation is proposed to transfer the knowledge from a cumbersome teacher model to an efficient student model. However, most previous knowledge distillation methods are designed for image classification and lead to limited performance in image-to-image translation. In this paper, we propose Region-aware Knowledge Distillation (ReKo) to compress image-to-image translation models. Firstly, ReKo adaptively finds the crucial regions in the images with an attention module. Then, patch-wise contrastive learning is adopted to maximize the mutual information between students and teachers in these crucial regions. Experiments with eight comparison methods on nine datasets demonstrate the substantial effectiveness of ReKo on both paired and unpaired image-to-image translation. For instance, our 7.08X compressed and 6.80X accelerated CycleGAN student outperforms its teacher by 1.33 and 1.04 FID scores on Horse to Zebra and Zebra to Horse, respectively. Codes will be released on GitHub.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2205.11098 [pdf, other]

PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection

Authors: Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma

Abstract: The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality. However, these applications usually have an urgent requirement for not only accurate but also efficient 3D object detection. Recently, knowledge distillation has been proposed as an effective model compression technique, which transfers the knowledge from an over-parameterized teacher to a lightweight student and achieves consistent effectiveness in 2D vision. However, due to point clouds' sparsity and irregularity, directly applying previous image-based knowledge distillation methods to point cloud detectors usually leads to unsatisfactory performance. To fill the gap, this paper proposes PointDistiller, a structured knowledge distillation framework for point clouds-based 3D detection. Concretely, PointDistiller includes local distillation which extracts and distills the local geometric structure of point clouds with dynamic graph convolution and reweighted learning strategy, which highlights student learning on the crucial points or voxels to improve knowledge distillation efficiency. Extensive experiments on both voxels-based and raw points-based detectors have demonstrated the effectiveness of our method over seven previous knowledge distillation methods. For instance, our 4X compressed PointPillars student achieves 2.8 and 3.4 mAP improvements on BEV and 3D object detection, outperforming its teacher by 0.9 and 1.8 mAP, respectively. Codes have been released at https://github.com/RunpeiDong/PointDistiller.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.10661 [pdf, other]

An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs

Authors: Jiarui Zhang, Filip Ilievski, Kaixin Ma, Jonathan Francis, Alessandro Oltramari

Abstract: Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models, in zero-shot evaluation on various downstream language reasoning tasks. Since these improvements are reported in aggregate, however, little is known about (i) how to select the appropriate knowledge for solid performance across tasks, (ii) how to combine this knowledge with neural language models, and (iii) how these pairings affect granular task performance. In this paper, we study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models. We study the effect of different synthetic datasets on language models with various architectures and sizes. The resulting models are evaluated against four task properties: domain overlap, answer similarity, vocabulary overlap, and answer length. Our experiments show that encoder-decoder models benefit from more data to learn from, whereas sampling strategies that balance across different aspects yield best performance. Most of the improvement occurs on questions with short answers and dissimilar answer candidates, which corresponds to the characteristics of the data used for pre-training.

Submitted 21 May, 2022; originally announced May 2022.

arXiv:2205.05560 [pdf, other]

doi 10.1088/1674-1137/ac827b

Mono-$γ$ Production of a Vector Dark Matter at Future $e^+e^-$ Collider

Authors: Kai Ma

Abstract: Associated production of a dark particle and a photon, represented as a mono-$γ$ event, is a promising channel to probe particle contents and dynamics in the dark sector. In this paper we study properties of the mono-$γ$ production of a vector dark matter at future $e^+e^-$ colliders. The photon-like and Pauli operators, as well as triple gauge boson interactions involving the dark matter, are considered in the framework of Effective Field Theory. We show that, compared to the Pauli operator, the triple gauge boson couplings are much more interesting at high-energy colliders. Beam polarization effects are also analyzed, and we show that the experimental sensitivities cannot be enhanced significantly because of the smaller luminosity.

Submitted 11 May, 2022; originally announced May 2022.

Comments: 6 captioned figures and 1 table, 25 pages

arXiv:2204.13892 [pdf, other]

SideRT: A Real-time Pure Transformer Architecture for Single Image Depth Estimation

Authors: Chang Shu, Ziming Chen, Lei Chen, Kuan Ma, Minghui Wang, Haibing Ren

Abstract: Since context modeling is critical for estimating depth from a single image, researchers put tremendous effort into obtaining global context. Many global manipulations are designed for traditional CNN-based architectures to overcome the locality of convolutions. Attention mechanisms or transformers originally designed for capturing long-range dependencies might be a better choice, but usually complicate architectures and could lead to a decrease in inference speed. In this work, we propose a pure transformer architecture called SideRT that can attain excellent predictions in real-time. In order to capture better global context, Cross-Scale Attention (CSA) and Multi-Scale Refinement (MSR) modules are designed to work collaboratively to fuse features of different scales efficiently. CSA modules focus on fusing features of high semantic similarities, while MSR modules aim to fuse features at corresponding positions. These two modules contain a few learnable parameters without convolutions, based on which a lightweight yet effective model is built. This architecture achieves state-of-the-art performances in real-time (51.3 FPS) and becomes much faster with a reasonable performance drop on a smaller backbone Swin-T (83.1 FPS). Furthermore, its performance surpasses the previous state-of-the-art by a large margin, improving AbsRel metric 6.9% on KITTI and 9.7% on NYU. To the best of our knowledge, this is the first work to show that transformer-based networks can attain state-of-the-art performance in real-time in the single image depth estimation field. Code will be made available soon.

Submitted 29 April, 2022; originally announced April 2022.

Comments: 7 pages, 5 figures

arXiv:2204.11090 [pdf, other]

doi 10.1109/ISBI48211.2021.9433936

Learning Shape Priors by Pairwise Comparison for Robust Semantic Segmentation

Authors: Cong Xie, Hualuo Liu, Shilei Cao, Dong Wei, Kai Ma, Liansheng Wang, Yefeng Zheng

Abstract: Semantic segmentation is important in medical image analysis. Inspired by the strong ability of traditional image analysis techniques in capturing shape priors and inter-subject similarity, many deep learning (DL) models have been recently proposed to exploit such prior information and achieved robust performance. However, these two types of important prior information are usually studied separately in existing models. In this paper, we propose a novel DL model to model both types of priors within a single framework. Specifically, we introduce an extra encoder into the classic encoder-decoder structure to form a Siamese structure for the encoders, where one of them takes a target image as input (the image-encoder), and the other concatenates a template image and its foreground regions as input (the template-encoder). The template-encoder encodes the shape priors and appearance characteristics of each foreground class in the template image. A cosine similarity based attention module is proposed to fuse the information from both encoders, to utilize both types of prior information encoded by the template-encoder and model the inter-subject similarity for each foreground class. Extensive experiments on two public datasets demonstrate that our proposed method can produce superior performance to competing methods.

Submitted 23 April, 2022; originally announced April 2022.

Comments: IEEE ISBI 2021

arXiv:2204.10090 [pdf, other]

Learn from Unpaired Data for Image Restoration: A Variational Bayes Approach

Authors: Dihan Zheng, Xiaowen Zhang, Kaisheng Ma, Chenglong Bao

Abstract: Collecting paired training data is difficult in practice, but the unpaired samples broadly exist. Current approaches aim at generating synthesized training data from unpaired samples by exploring the relationship between the corrupted and clean data. This work proposes LUD-VAE, a deep generative method to learn the joint probability density function from data sampled from marginal distributions. Our approach is based on a carefully designed probabilistic graphical model in which the clean and corrupted data domains are conditionally independent. Using variational inference, we maximize the evidence lower bound (ELBO) to estimate the joint probability density function. Furthermore, we show that the ELBO is computable without paired samples under the inference invariant assumption. This property provides the mathematical rationale of our approach in the unpaired setting. Finally, we apply our method to real-world image denoising, super-resolution, and low-light image enhancement tasks and train the models using the synthetic data generated by the LUD-VAE. Experimental results validate the advantages of our method over other approaches.

Submitted 11 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

arXiv:2204.06951 [pdf, other]

Unsupervised Deep Learning Meets Chan-Vese Model

Authors: Dihan Zheng, Chenglong Bao, Zuoqiang Shi, Haibin Ling, Kaisheng Ma

Abstract: The Chan-Vese (CV) model is a classic region-based method in image segmentation. However, its piecewise constant assumption does not always hold for practical applications. Many improvements have been proposed but the issue is still far from well solved. In this work, we propose an unsupervised image segmentation approach that integrates the CV model with deep neural networks, which significantly improves the original CV model's segmentation accuracy. Our basic idea is to apply a deep neural network that maps the image into a latent space to alleviate the violation of the piecewise constant assumption in image space. We formulate this idea under the classic Bayesian framework by approximating the likelihood with an evidence lower bound (ELBO) term while keeping the prior term in the CV model. Thus, our model only needs the input image itself and does not require pre-training from external datasets. Moreover, we extend the idea to multi-phase case and dataset based unsupervised image segmentation. Extensive experiments validate the effectiveness of our model and show that the proposed method is noticeably better than other unsupervised segmentation approaches.

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2204.06187 [pdf, other]

Calibrating Class Weights with Multi-Modal Information for Partial Video Domain Adaptation

Authors: Xiyu Wang, Yuecong Xu, Kezhi Mao, Jianfei Yang

Abstract: Assuming the source label space subsumes the target one, Partial Video Domain Adaptation (PVDA) is a more general and practical scenario for cross-domain video classification problems. The key challenge of PVDA is to mitigate the negative transfer caused by the source-only outlier classes. To tackle this challenge, a crucial step is to aggregate target predictions to assign class weights by up-weighing target classes and down-weighing outlier classes. However, the incorrect predictions of class weights can mislead the network and lead to negative transfer. Previous works improve the class weight accuracy by utilizing temporal features and attention mechanisms, but these methods may fall short when trying to generate accurate class weight when domain shifts are significant, as in most real-world scenarios. To deal with these challenges, we propose the Multi-modality Cluster-calibrated partial Adversarial Network (MCAN). MCAN enhances video feature extraction with multi-modal features from multiple temporal scales to form more robust overall features. It utilizes a novel class weight calibration method to alleviate the negative transfer caused by incorrect class weights. The calibration method tries to identify and weigh correct and incorrect predictions using distributional information implied by unsupervised clustering. Extensive experiments are conducted on prevailing PVDA benchmarks, and the proposed MCAN achieves significant improvements when compared to state-of-the-art PVDA methods.

Submitted 11 July, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: Accepted by ACM Multimedia (ACMMM) 2022, update to camera-ready version. 8 pages of text, 5 figures, 2 tables

arXiv:2204.04088 [pdf, other]

Stochastic Gradient-based Fast Distributed Multi-Energy Management for an Industrial Park with Temporally-Coupled Constraints

Authors: Dafeng Zhu, Bo Yang, Chengbin Ma, Zhaojian Wang, Shanying Zhu, Kai Ma, ** Guan

Abstract: Contemporary industrial parks are challenged by the growing concerns about high cost and low efficiency of energy supply. Moreover, in the case of uncertain supply/demand, how to mobilize delay-tolerant elastic loads and compensate real-time inelastic loads to match multi-energy generation/storage and minimize energy cost is a key issue. Since energy management can hardly be implemented offline without knowing statistical information of random variables, this paper presents a systematic online energy cost minimization framework to fulfill the complementary utilization of multi-energy with time-varying generation, demand and price. Specifically, to achieve charging/discharging constraints due to storage and short-term energy balancing, a fast distributed algorithm based on stochastic gradient with two-timescale implementation is proposed to ensure online implementation. To reduce the peak loads, an incentive mechanism is implemented by estimating users' willingness to shift. Analytical results on parameter setting are also given to guarantee feasibility and optimality of the proposed design. Numerical results show that when the bid-ask spread of electricity is small enough, the proposed algorithm can achieve the close-to-optimal cost asymptotically.

Submitted 8 April, 2022; originally announced April 2022.

Comments: Accepted by Applied Energy

arXiv:2203.16092 [pdf, other]

Global Tracking via Ensemble of Local Trackers

Authors: Zikun Zhou, Jianqiu Chen, Wenjie Pei, Kaige Mao, Hongpeng Wang, Zhenyu He

Abstract: The crux of long-term tracking lies in the difficulty of tracking the target with discontinuous moving caused by out-of-view or occlusion. Existing long-term tracking methods follow two typical strategies. The first strategy employs a local tracker to perform smooth tracking and uses another re-detector to detect the target when the target is lost. While it can exploit the temporal context like historical appearances and locations of the target, a potential limitation of such strategy is that the local tracker tends to misidentify a nearby distractor as the target instead of activating the re-detector when the real target is out of view. The other long-term tracking strategy tracks the target in the entire image globally instead of local tracking based on the previous tracking results. Unfortunately, such global tracking strategy cannot leverage the temporal context effectively. In this work, we combine the advantages of both strategies: tracking the target in a global view while exploiting the temporal context. Specifically, we perform global tracking via ensemble of local trackers spreading the full image. The smooth moving of the target can be handled steadily by one local tracker. When the local tracker accidentally loses the target due to suddenly discontinuous moving, another local tracker close to the target is then activated and can readily take over the tracking to locate the target. While the activated local tracker performs tracking locally by leveraging the temporal context, the ensemble of local trackers renders our model the global view for tracking. Extensive experiments on six datasets demonstrate that our method performs favorably against state-of-the-art algorithms.

Submitted 30 March, 2022; originally announced March 2022.

Comments: 10 pages; 6 figures; accepted to CVPR2022

arXiv:2203.12268 [pdf, other]

Chiplet Actuary: A Quantitative Cost Model and Multi-Chiplet Architecture Exploration

Authors: Yinxiao Feng, Kaisheng Ma

Abstract: Multi-chip integration is widely recognized as the extension of Moore's Law. Cost-saving is a frequently mentioned advantage, but previous works rarely present quantitative demonstrations on the cost superiority of multi-chip integration over monolithic SoC. In this paper, we build a quantitative cost model and put forward an analytical method for multi-chip systems based on three typical multi-chip integration technologies to analyze the cost benefits from yield improvement, chiplet and package reuse, and heterogeneity. We re-examine the actual cost of multi-chip systems from various perspectives and show how to reduce the total cost of the VLSI system through appropriate multi-chiplet architecture.

Submitted 9 April, 2024; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: Accepted by and presented at DAC 2022

arXiv:2203.10332 [pdf, other]

doi 10.1109/TMI.2021.3131245

Domain Adaptation Meets Zero-Shot Learning: An Annotation-Efficient Approach to Multi-Modality Medical Image Segmentation

Authors: Cheng Bian, Chenglang Yuan, Kai Ma, Shuang Yu, Dong Wei, Yefeng Zheng

Abstract: Due to the lack of properly annotated medical data, exploring the generalization capability of the deep model is becoming a public concern. Zero-shot learning (ZSL) has emerged in recent years to equip the deep model with the ability to recognize unseen classes. However, existing studies mainly focus on natural images, which utilize linguistic models to extract auxiliary information for ZSL. It is impractical to apply the natural image ZSL solutions directly to medical images, since the medical terminology is very domain-specific, and it is not easy to acquire linguistic models for the medical terminology. In this work, we propose a new paradigm of ZSL specifically for medical images utilizing cross-modality information. We make three main contributions with the proposed paradigm. First, we extract the prior knowledge about the segmentation targets, called relation prototypes, from the prior model and then propose a cross-modality adaptation module to inherit the prototypes to the zero-shot model. Second, we propose a relation prototype awareness module to make the zero-shot model aware of information contained in the prototypes. Last but not least, we develop an inheritance attention module to recalibrate the relation prototypes to enhance the inheritance process. The proposed framework is evaluated on two public cross-modality datasets including a cardiac dataset and an abdominal dataset. Extensive experiments show that the proposed framework significantly outperforms the state of the art.

Submitted 19 March, 2022; originally announced March 2022.

Comments: IEEE TMI

arXiv:2203.07859 [pdf, other]

doi 10.1016/j.nima.2022.166622

Construction and commissioning of the collinear laser spectroscopy system at BRIF

Authors: S. J. Wang, X. F. Yang, S. W. Bai, Y. C. Liu, P. Zhang, Y. S. Liu, H. R. Hu, H. W. Li, B. Tang, B. Q. Cui, C. Y. He, X. Ma, Q. T. Li, J. H. Chen, K. Ma, L. S. Yang, Z. Y. Hu, W. L. Pu, Y. Chen, Y. F. Guo, Z. Y. Du, Z. Yan, F. L. Liu, H. R. Wang, G. Q. Yang, et al. (2 additional authors not shown)

Abstract: We have constructed a collinear laser spectroscopy (CLS) system installed at the Beijing Radioactive Ion-beam Facility (BRIF), aiming to investigate the nuclear properties of unstable nuclei. The first on-line commissioning experiment of this system was performed using the continuous stable ($^{39}$K) and unstable ($^{38}$K) ion beams produced by impinging a 100-MeV proton beam on a CaO target. Hyperfine structure spectra of these two isotopes are reasonably reproduced, and the extracted magnetic dipole hyperfine parameters and isotope shift agree with the literature values. The on-line experiment demonstrates the overall functioning of this CLS system, opening new opportunities for laser spectroscopy measurement of unstable isotopes at BRIF and other radioactive ion beam facilities in China.

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2203.07659 [pdf]

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Authors: Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian

Abstract: Molecular subtypes of breast cancer are important references for personalized clinical treatment. For cost and labor savings, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Inevitable sampling error is risky due to tumor heterogeneity and could result in a delay in treatment. Molecular subtype prediction from conventional H&E pathological whole slide images (WSIs) using AI methods is useful and critical for helping pathologists pre-screen the proper paraffin block for IHC. It is a challenging task since only WSI-level labels of molecular subtypes can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to be computationally feasible for deep learning. With only coarse slide-level labels, however, patch-based methods may suffer from abundant noise patches, such as folds, overstained regions, or non-tumor tissues. A weakly supervised learning framework based on discriminative patch selecting and multi-instance learning was proposed for breast cancer molecular subtype prediction from H&E WSIs. First, a co-teaching strategy was adopted to learn molecular subtype representations and filter out noise patches. Then, a balanced sampling strategy was used to handle the imbalance in subtypes in the dataset. In addition, a noise patch filtering algorithm that used the local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch and slide constraint information was used to fine-tune the MIL framework on the obtained discriminative patches and further improve the performance of molecular subtyping. The experimental results confirmed the effectiveness of the proposed method, and our models outperformed even senior pathologists, with the potential to assist pathologists in pre-screening paraffin blocks for IHC in the clinic.

Submitted 15 March, 2022; originally announced March 2022.

Showing 201–250 of 610 results for author: Ma, K