Search | arXiv e-print repository

HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation

Authors: Zijian Zhou, Miaojing Shi, Holger Caesar

Abstract: Panoptic Scene Graph generation (PSG) is a recently proposed task in image scene understanding that aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph. This task is particularly challenging for two reasons. First, it suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations. Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations. Second, a subject-object pair can have two or more semantically overlapping relations. While existing methods favor one over the other, our proposed HiLo framework lets different network branches specialize on low and high frequency relations, enforce their consistency and fuse the results. To the best of our knowledge we are the first to propose an explicitly unbiased PSG method. In extensive experiments we show that our HiLo framework achieves state-of-the-art results on the PSG task. We also apply our method to the Scene Graph Generation task that predicts boxes instead of masks and see improvements over all baseline methods. Code is available at https://github.com/franciszzj/HiLo.

Submitted 17 August, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: ICCV 2023

arXiv:2303.13991 [pdf]

doi 10.1103/PhysRevLett.132.033201

Wave-Packet Surface Propagation for Light-Induced Molecular Dissociation

Authors: Shengzhe Pan, Zhaohan Zhang, Chenxi Hu, Peifen Lu, Xiaochun Gong, Ruolin Gong, Wenbin Zhang, Lianrong Zhou, Chenxu Lu, Menghang Shi, Zhejun Jiang, Hongcheng Ni, Feng He, Jian Wu

Abstract: Recent advances in laser technology have enabled tremendous progress in photochemistry, at the heart of which is the breaking and formation of chemical bonds. Such progress has been greatly facilitated by the development of accurate quantum-mechanical simulation method, which, however, does not necessarily accompany clear dynamical scenarios and is rather often a black box, other than being computationally heavy. Here, we develop a wave-packet surface propagation (WASP) approach to describe the molecular bond-breaking dynamics from a hybrid quantum-classical perspective. Via the introduction of quantum elements including state transitions and phase accumulations to the Newtonian propagation of the nuclear wave-packet, the WASP approach naturally comes with intuitive physical scenarios and accuracies. It is carefully benchmarked with the H2+ molecule and is shown to be capable of precisely reproducing experimental observations. The WASP method is promising for the intuitive visualization of strong-field molecular dynamics and is straightforwardly extensible toward complex molecules.

Submitted 24 March, 2023; originally announced March 2023.

Comments: 24 pages, 4 figures

Journal ref: Phys. Rev. Lett. 132, 033201 (2024)

arXiv:2302.13562 [pdf, other]

Communication-efficient Federated Learning with Single-Step Synthetic Features Compressor for Faster Convergence

Authors: Yuhao Zhou, Mingjia Shi, Yuanxi Li, Qing Ye, Yanan Sun, Jiancheng Lv

Abstract: Reducing communication overhead in federated learning (FL) is challenging but crucial for large-scale distributed privacy-preserving machine learning. While methods utilizing sparsification or others can largely lower the communication overhead, the convergence rate is also greatly compromised. In this paper, we propose a novel method, named single-step synthetic features compressor (3SFC), to achieve communication-efficient FL by directly constructing a tiny synthetic dataset based on raw gradients. Thus, 3SFC can achieve an extremely low compression rate when the constructed dataset contains only one data sample. Moreover, 3SFC's compressing phase utilizes a similarity-based objective function so that it can be optimized with just one step, thereby considerably improving its performance and robustness. In addition, to minimize the compressing error, error feedback (EF) is also incorporated into 3SFC. Experiments on multiple datasets and models suggest that 3SFC owns significantly better convergence rates compared to competing methods with lower compression rates (up to 0.02%). Furthermore, ablation studies and visualizations show that 3SFC can carry more information than competing methods for every communication round, further validating its effectiveness.

Submitted 18 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.13313 [pdf, other]

doi 10.1103/PhysRevB.107.L201104

ARPES signature of the competition between magnetic order and Kondo effect in CeCoGe3

Authors: Peng Li, Huiqing Ye, Yong Hu, Yuan Fang, Zhiguang Xiao, Zhongzheng Wu, Zhaoyang Shan, Ravi P. Singh, Geetha Balakrishnan, Dawei Shen, Yi-feng Yang, Chao Cao, Nicholas C. Plumb, Michael Smidman, Ming Shi, Johann Kroha, Huiqiu Yuan, Frank Steglich, Yang Liu

Abstract: The competition between magnetic order and Kondo effect is essential for the rich physics of heavy fermion systems. Nevertheless, how such competition is manifested in the quasiparticle bands in a real periodic lattice remains elusive in spectroscopic experiments. Here we report a high-resolution photoemission study of the antiferromagnetic Kondo lattice system CeCoGe3 with a high TN1 of 21 K. Our measurements reveal a weakly dispersive 4f band at the Fermi level near the Z point, arising from moderate Kondo effect. The intensity of this heavy 4f band exhibits a logarithmic increase with lowering temperature and begins to deviate from this Kondo-like behavior below 25 K, just above TN1, and eventually ceases to grow below 12 K. Our work provides direct spectroscopic evidence for the competition between magnetic order and the Kondo effect in a Kondo lattice system with local-moment antiferromagnetism, indicating a distinct scenario for the microscopic coexistence and competition of these phenomena, which might be related to the real-space modulation.

Submitted 26 February, 2023; originally announced February 2023.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. B 107, L201104 (2023)

arXiv:2302.09201 [pdf, other]

An Efficient Method for Joint Delay-Doppler Estimation of Moving Targets in Passive Radar

Authors: Mengjiao Shi, Yunhai Xiao, Peili Li

Abstract: Passive radar systems can detect and track the moving targets of interest by exploiting non-cooperative illuminators-of-opportunity to transmit orthogonal frequency division multiplexing (OFDM) signals. These targets are searched using a bank of correlators tuned to the waveform corresponding to the given Doppler frequency shift and delay. In this paper, we study the problem of joint delay-Doppler estimation of moving targets in OFDM passive radar. This task of estimation is described as an atomic-norm regularized convex optimization problem, or equivalently, a semi-definite programming problem. The alternating direction method of multipliers (ADMM) can be employed which computes each variable in a Gauss-Seidel manner, but its convergence is lack of certificate. In this paper, we use a symmetric Gauss-Seidel (sGS) to the framework of ADMM, which only needs to compute some of the subproblems twice but has the ability to ensure convergence. We do some simulated experiments which illustrate that the sGS-ADMM is superior to ADMM in terms of accuracy and computing time.

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2302.04375 [pdf, other]

A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

Authors: Ming Shi, Yingbin Liang, Ness Shroff

Abstract: In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step, and unsafe states and actions are avoided. However, existing algorithms for ``safe'' RL are often designed under constraints that either require expected cumulative costs to be bounded or assume all states are safe. Thus, such algorithms could violate instantaneous hard constraints and traverse unsafe states (and actions) in practice. Therefore, in this paper, we develop the first near-optimal safe RL algorithm for episodic Markov Decision Processes with unsafe states and actions under instantaneous hard constraints and the linear mixture model. It not only achieves a regret $\tilde{O}(\frac{d H^3 \sqrt{dK}}{\Delta_c})$ that tightly matches the state-of-the-art regret in the setting with only unsafe actions and nearly matches that in the unconstrained setting, but is also safe at each step, where $d$ is the feature-mapping dimension, $K$ is the number of episodes, $H$ is the number of steps in each episode, and $\Delta_c$ is a safety-related parameter. We also provide a lower bound $\tilde{\Omega}(\max\{dH \sqrt{K}, \frac{H}{\Delta_c^2}\})$, which indicates that the dependency on $\Delta_c$ is necessary. Further, both our algorithm design and regret analysis involve several novel ideas, which may be of independent interest.

Submitted 8 February, 2023; originally announced February 2023.

Comments: Submitted for publication

arXiv:2302.04374 [pdf, ps, other]

Near-Optimal Adversarial Reinforcement Learning with Switching Costs

Authors: Ming Shi, Yingbin Liang, Ness Shroff

Abstract: Switching costs, which capture the costs for changing policies, are regarded as a critical metric in reinforcement learning (RL), in addition to the standard metric of losses (or rewards). However, existing studies on switching costs (with a coefficient $\beta$ that is strictly positive and is independent of $T$) have mainly focused on static RL, where the loss distribution is assumed to be fixed during the learning process, and thus practical scenarios where the loss distribution could be non-stationary or even adversarial are not considered. While adversarial RL better models this type of practical scenarios, an open problem remains: how to develop a provably efficient algorithm for adversarial RL with switching costs? This paper makes the first effort towards solving this problem. First, we provide a regret lower-bound that shows that the regret of any algorithm must be larger than $\tilde{\Omega}( ( H S A )^{1/3} T^{2/3} )$, where $T$, $S$, $A$ and $H$ are the number of episodes, states, actions and layers in each episode, respectively. Our lower bound indicates that, due to the fundamental challenge of switching costs in adversarial RL, the best achieved regret (whose dependency on $T$ is $\tilde{O}(\sqrt{T})$) in static RL with switching costs (as well as adversarial RL without switching costs) is no longer achievable. Moreover, we propose two novel switching-reduced algorithms with regrets that match our lower bound when the transition function is known, and match our lower bound within a small factor of $\tilde{O}( H^{1/3} )$ when the transition function is unknown. Our regret analysis demonstrates the near-optimal performance of them.

Submitted 8 February, 2023; originally announced February 2023.

Comments: Accepted by ICLR 2023 as Top 25%

arXiv:2301.13335 [pdf, other]

Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning

Authors: Jian Zhu, Hanli Wang, Miaojing Shi

Abstract: The visual commonsense reasoning (VCR) task is to choose an answer and provide a justifying rationale based on the given image and textural question. Representative works first recognize objects in images and then associate them with key words in texts. However, existing approaches do not consider exact positions of objects in a human-like three-dimensional (3D) manner, making them incompetent to accurately distinguish objects and understand visual relation. Recently, multi-modal large language models (MLLMs) have been used as powerful tools for several multi-modal tasks but not for VCR yet, which requires elaborate reasoning on specific visual objects referred by texts. In light of the above, an MLLM enhanced pseudo 3D perception framework is designed for VCR. Specifically, we first demonstrate that the relation between objects is relevant to object depths in images, and hence introduce object depth into VCR frameworks to infer 3D positions of objects in images. Then, a depth-aware Transformer is proposed to encode depth differences between objects into the attention mechanism of Transformer to discriminatively associate objects with visual scenes guided by depth. To further associate the answer with the depth of visual scene, each word in the answer is tagged with a pseudo depth to realize depth-aware association between answer words and objects. On the other hand, BLIP-2 as an MLLM is employed to process images and texts, and the referring expressions in texts involving specific visual objects are modified with linguistic object labels to serve as comprehensible MLLM inputs. Finally, a parameter optimization technique is devised to fully consider the quality of data batches based on multi-level reasoning confidence. Experiments on the VCR dataset demonstrate the superiority of the proposed framework over state-of-the-art approaches.

Submitted 25 December, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

arXiv:2301.12678 [pdf, other]

The Meta Distribution of SINR in UAV-Assisted Cellular Networks

Authors: Minwei Shi, Kai Yang, Dusit Niyato, Hang Yuan, He Zhou, Zhan Xu

Abstract: Mounting compact and lightweight base stations on unmanned aerial vehicles (UAVs) is a cost-effective and flexible solution to provide seamless coverage on the existing terrestrial networks. While the coverage probability in UAV-assisted cellular networks has been widely investigated, it provides only the first-order statistic of signal-to-interference-plus-noise ratio (SINR). In this paper, to analyze high-order statistics of SINR and characterize the disparity among individual links, we provide a meta distribution (MD)-based analytical framework for UAV-assisted cellular networks, in which the probabilistic line-of-sight channel and realistic antenna pattern are taken into account for air-to-ground transmissions. To accurately characterize the interference from UAVs, we relax the widely applied uniform off-boresight angle (OBA) assumption and derive the exact distribution of OBA. Using stochastic geometry, for both steerable and vertical antenna scenarios, we obtain mathematical expressions for the moments of condition success probability, the SINR MD, and the mean local delay. Moreover, we study the asymptotic behavior of the moments as network density approaches infinity. Numerical results validate the tightness of the theoretical results and show that the uniform OBA assumption underestimates the network performance, especially in the regime of moderate altitude of UAV. We also show that when UAVs are equipped with steerable antennas, the network coverage and user fairness can be optimized simultaneously by carefully adjusting the UAV parameters.

Submitted 30 January, 2023; originally announced January 2023.

Comments: 30 pages, 9 figures

arXiv:2301.05382 [pdf, other]

doi 10.3847/2041-8213/acb3c6

Excitation of Multi-periodic Kink Motions in Solar Flare Loops: Possible Application to Quasi-periodic Pulsations

Authors: Mijie Shi, Bo Li, Shao-Xia Chen, Mingzhe Guo, Shengju Yuan

Abstract: Magnetohydrodynamic (MHD) waves are often invoked to interpret quasi-periodic pulsations (QPPs) in solar flares. We study the response of a straight flare loop to a kink-like velocity perturbation using three-dimensional MHD simulations and forward model the microwave emissions using the fast gyrosynchrotron code. Kink motions with two periodicities are simultaneously generated, with the long-period component P_L = 57 s being attributed to the radial fundamental kink mode and the short-period component P_S = 5.8 s to the first leaky kink mode. Forward modeling results show that the two-periodic oscillations are detectable in the microwave intensities for some lines of sight. Increasing the beam size to (1")^2 does not wipe out the microwave oscillations. We propose that the first leaky kink mode is a promising candidate mechanism to account for short-period QPPs. Radio telescopes with high spatial resolutions can help distinguish between this new mechanism with such customary interpretations as sausage modes.

Submitted 12 January, 2023; originally announced January 2023.

Comments: Accepted for publication in ApJL

arXiv:2212.07958 [pdf]

doi 10.1038/s41567-023-02215-z

Non-trivial band topology and orbital-selective electronic nematicity in a new titanium-based kagome superconductor

Authors: Yong Hu, Congcong Le, Zhen Zhao, Junzhang Ma, Nicholas C. Plumb, Milan Radovic, Andreas P. Schnyder, Xianxin Wu, Hui Chen, Xiaoli Dong, Jiangping Hu, Haitao Yang, Hong-Jun Gao, Ming Shi

Abstract: Electronic nematicity that spontaneously breaks rotational symmetry has been shown as a generic phenomenon in correlated quantum systems including high-temperature superconductors and the AV3Sb5 (A = K, Rb, Cs) family with a kagome network. Identifying the driving force has been a central challenge for understanding nematicity. In iron-based superconductors, the problem is complicated because the spin, orbital and lattice degrees of freedom are intimately coupled. In vanadium-based kagome superconductors AV3Sb5, the electronic nematicity exhibits an intriguing entanglement with the charge density wave order (CDW), making understanding its origin difficult. Recently, a new family of titanium-based kagome superconductors ATi3Bi5 has been synthesized. In sharp contrast to its vanadium-based counterpart, the electronic nematicity occurs in the absence of CDW. ATi3Bi5 provides a new window to explore the mechanism of electronic nematicity and its interplay with the orbital degree of freedom. Here, we combine polarization-dependent angle-resolved photoemission spectroscopy with density functional theory to directly reveal the band topology and orbital characters of the multi-orbital RbTi3Bi5. The promising coexistence of flat bands, type-II Dirac nodal line and nontrivial Z2 topological states is identified in RbTi3Bi5. Remarkably, our study clearly unveils the orbital character change along the G-M and G-K directions, implying a strong intrinsic inter-orbital coupling in the Ti-based kagome metals, reminiscent of iron-based superconductors. Furthermore, doping-dependent measurements directly uncover the orbital-selective features in the kagome bands, which can be well explained by the d-p hybridization. The suggested d-p hybridization, in collaboration with the inter-orbital coupling, could account for the electronic nematicity in ATi3Bi5.

Submitted 15 December, 2022; originally announced December 2022.

Report number: RIKEN-iTHEMS-Report-22

Journal ref: Nat. Phys. (2023)

arXiv:2212.07618 [pdf, other]

Proposal Distribution Calibration for Few-Shot Object Detection

Authors: Bohao Li, Chang Liu, Mengnan Shi, Xiaozhong Chen, Xiangyang Ji, Qixiang Ye

Abstract: Adapting object detectors learned with sufficient supervision to novel classes under low data regimes is charming yet challenging. In few-shot object detection (FSOD), the two-step training paradigm is widely adopted to mitigate the severe sample imbalance, i.e., holistic pre-training on base classes, then partial fine-tuning in a balanced setting with all classes. Since unlabeled instances are suppressed as backgrounds in the base training phase, the learned RPN is prone to produce biased proposals for novel instances, resulting in dramatic performance degradation. Unfortunately, the extreme data scarcity aggravates the proposal distribution bias, hindering the RoI head from evolving toward novel classes. In this paper, we introduce a simple yet effective proposal distribution calibration (PDC) approach to neatly enhance the localization and classification abilities of the RoI head by recycling its localization ability endowed in base training and enriching high-quality positive samples for semantic fine-tuning. Specifically, we sample proposals based on the base proposal statistics to calibrate the distribution bias and impose additional localization and classification losses upon the sampled proposals for fast expanding the base detector to novel classes. Experiments on the commonly used Pascal VOC and MS COCO datasets with explicit state-of-the-art performances justify the efficacy of our PDC for FSOD. Code is available at github.com/Bohao-Lee/PDC.

Submitted 15 December, 2022; originally announced December 2022.

Comments: This paper is under review in IEEE TNNLS

arXiv:2212.02690 [pdf, ps, other]

doi 10.3847/1538-4357/aca976

Standing Sausage Perturbations in Solar Coronal Slabs with Continuous Transverse Density Profiles: cutoff wavenumbers, evanescent eigenmodes, and oscillatory continuum

Authors: Zexing Wang, Bo Li, Shao-Xia Chen, Mijie Shi

Abstract: The lack of observed sausage perturbations in solar active region loops is customarily attributed to the relevance of cutoff axial wavenumbers and the consequent absence of trapped modes (called ``evanescent eigenmodes'' here). However, some recent eigenvalue problem studies yield that cutoff wavenumbers may disappear for those equilibria where the external density varies sufficiently slowly, thereby casting doubt on the rarity of candidate sausage perturbations. We examine the responses of straight, transversely structured, coronal slabs to small-amplitude sausage-type perturbations that excite axial fundamentals by solving the pertinent initial value problem with eigensolutions for a closed domain. The density variation in the slab exterior is dictated by some steepness parameter $\mu$, and cutoff wavenumbers are theoretically expected to be present (absent) when $\mu \ge 2$ ($\mu < 2$). However, our numerical results show no qualitative difference in the system evolution when $\mu$ varies, despite the differences in the modal behavior. Only oscillatory eigenmodes are permitted when $\mu \ge 2$. Our discrete eigenspectrum becomes increasingly closely spaced when the domain broadens, and an oscillatory continuum results for a truly open system. Oscillatory eigenmodes remain allowed and dominate the system evolution when $\mu < 2$. We show that the irrelevance of cutoff wavenumbers does not mean that all fast waves are evanescent. Rather, it means that an increasing number of evanescent eigenmodes emerge when the domain size increases. We conclude that sausage perturbations remain difficult to detect even for the waveguide formulated here.

Submitted 5 December, 2022; originally announced December 2022.

Comments: 32 pages, 8 figures, accepted for publication in ApJ

arXiv:2212.02573 [pdf, other]

Domain-General Crowd Counting in Unseen Scenarios

Authors: Zhipeng Du, Jiankang Deng, Miaojing Shi

Abstract: Domain shift across crowd data severely hinders crowd counting models to generalize to unseen scenarios. Although domain adaptive crowd counting approaches close this gap to a certain extent, they are still dependent on the target domain data to adapt (e.g. finetune) their models to the specific domain. In this paper, we aim to train a model based on a single source domain which can generalize well on any unseen domain. This falls into the realm of domain generalization that remains unexplored in crowd counting. We first introduce a dynamic sub-domain division scheme which divides the source domain into multiple sub-domains such that we can initiate a meta-learning framework for domain generalization. The sub-domain division is dynamically refined during the meta-learning. Next, in order to disentangle domain-invariant information from domain-specific information in image features, we design the domain-invariant and -specific crowd memory modules to re-encode image features. Two types of losses, i.e. feature reconstruction and orthogonal losses, are devised to enable this disentanglement. Extensive experiments on several standard crowd counting benchmarks i.e. SHA, SHB, QNRF, and NWPU, show the strong generalizability of our method.

Submitted 28 March, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI 2023 as Oral Presentation

arXiv:2212.01818 [pdf]

Exploring and Eliciting Needs and Preferences from Editors for Wikidata Recommendations

Authors: Kholoud Alghamdi, Miaojing Shi, Elena Simperl

Abstract: Wikidata is an open knowledge graph created, managed, and maintained collaboratively by a global community of volunteers. As it continues to grow, it faces substantial editor engagement challenges, including acquiring new editors to tackle an increasing workload and retaining existing editors. Experiences from other online communities and peer-production systems, including Wikipedia, suggest that recommending tasks to editors could help with both. Our aim with this paper is to elicit the user requirements for a Wikidata recommendations system. We conduct a mixed-methods study with a thematic analysis of in-depth interviews with 31 Wikidata editors and three Wikimedia managers, complemented by a quantitative analysis of edit records of 3,740 Wikidata editors. The insights gained from the study help us outline design requirements for the Wikidata recommender system. We conclude with a discussion of the implications of this work and directions for future work.

Submitted 4 December, 2022; originally announced December 2022.

arXiv:2212.00048 [pdf, ps, other]

doi 10.1016/j.jcta.2023.105790

A family of diameter perfect constant-weight codes from Steiner systems

Authors: Minjia Shi, Yuhong Xia, Denis S. Krotov

Abstract: If $S$ is a transitive metric space, then $|C|\cdot|A| \le |S|$ for any distance-$d$ code $C$ and a set $A$, ``anticode'', of diameter less than $d$. For every Steiner S$(t,k,n)$ system $S$, we show the existence of a $q$-ary constant-weight code $C$ of length~$n$, weight~$k$ (or $n-k$), and distance $d=2k-t+1$ (respectively, $d=n-t+1$) and an anticode $A$ of diameter $d-1$ such that the pair $(C,A)$ attains the code--anticode bound and the supports of the codewords of $C$ are the blocks of $S$ (respectively, the complements of the blocks of $S$). We study the problem of estimating the minimum value of $q$ for which such a code exists, and find that minimum for small values of $t$. Keywords: diameter perfect codes, anticodes, constant-weight codes, code--anticode bound, Steiner systems.

Submitted 31 July, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

Comments: v2: revised, accepted version

MSC Class: 94B25; 05B05

Journal ref: J. Comb. Theory, Ser. A 200 2023, 105790

arXiv:2211.11147 [pdf, ps, other]

Optimal quaternary linear codes with one-dimensional Hermitian hull and the related EAQECCs

Authors: Shitao Li, Minjia Shi, Huizhou Liu

Abstract: Linear codes with small hulls over finite fields have been extensively studied due to their practical applications in computational complexity and information protection. In this paper, we develop a general method to determine the exact value of $D_4^H(n,k,1)$ for $n\leq 12$ or $k\in \{1,2,3,n-1,n-2,n-3\}$, where $D_4^H(n,k,1)$ denotes the largest minimum distance among all quaternary linear $[n,k]$ codes with one-dimensional Hermitian hull. As a consequence, we solve a conjecture proposed by Mankean and Jitman on the largest minimum distance of a quaternary linear code with one-dimensional Hermitian hull. As an application, we construct some binary entanglement-assisted quantum error-correcting codes (EAQECCs) from quaternary linear codes with one-dimensional Hermitian hull. Some of these EAQECCs are optimal codes, and some of them are better than previously known ones.

Submitted 26 November, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2211.02480

MSC Class: 94B05; 15B05; 12E10

arXiv:2211.10684 [pdf, other]

Personalized Federated Learning with Hidden Information on Personalized Prior

Authors: Mingjia Shi, Yuhao Zhou, Qing Ye, Jiancheng Lv

Abstract: Federated learning (FL for simplification) is a distributed machine learning technique that utilizes global servers and collaborative clients to achieve privacy-preserving global model training without direct data sharing. However, the heterogeneous data problem, one of FL's main problems, makes it difficult for the global model to perform effectively on each client's local data. Thus, personalized federated learning (PFL for simplification) aims to improve the performance of the model on local data as much as possible. Bayesian learning, where the parameters of the model are seen as random variables with a prior assumption, is a feasible solution to the heterogeneous data problem because the more local data the model uses, the more it focuses on the local data; otherwise it focuses on the prior. When Bayesian learning is applied to PFL, the global model provides global knowledge as a prior to the local training process. In this paper, we employ Bayesian learning to model PFL by assuming a prior in the scaled exponential family, and therefore propose pFedBreD, a framework to solve the problem we model using Bregman divergence regularization. Empirically, our experiments show that, under the prior assumption of the spherical Gaussian and the first order strategy of mean selection, our proposal significantly outcompetes other PFL algorithms on multiple public benchmarks.

Submitted 24 November, 2022; v1 submitted 19 November, 2022; originally announced November 2022.

Comments: 19 pages, 6 figures, 3 tables

ACM Class: G.3; I.2.11

arXiv:2211.10105 [pdf, other]

$α$ DARTS Once More: Enhancing Differentiable Architecture Search by Masked Image Modeling

Authors: Bicheng Guo, Shuxuan Guo, Miaojing Shi, Peng Chen, Shibo He, Jiming Chen, Kaicheng Yu

Abstract: Differentiable architecture search (DARTS) has been a mainstream direction in automatic machine learning. Since the discovery that original DARTS will inevitably converge to poor architectures, recent works alleviate this by either designing rule-based architecture selection techniques or incorporating complex regularization techniques, abandoning the simplicity of the original DARTS that selects architectures based on the largest parametric value, namely $α$. Moreover, we find that all the previous attempts only rely on classification labels, hence learning only single modal information and limiting the representation power of the shared network. To this end, we propose to additionally inject semantic information by formulating a patch recovery approach. Specifically, we exploit the recent trending masked image modeling and do not abandon the guidance from the downstream tasks during the search phase. Our method surpasses all previous DARTS variants and achieves state-of-the-art results on CIFAR-10, CIFAR-100, and ImageNet without complex manual-designed strategies.

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2211.05781 [pdf, other]

Demystify Transformers & Convolutions in Modern Image Deep Networks

Authors: Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Abstract: Vision transformers have gained popularity recently, leading to the development of new vision backbones with improved features and consistent performance gains. However, these advancements are not solely attributable to novel feature transformation designs; certain benefits also arise from advanced network-level and block-level architectures. This paper aims to identify the real gains of popular convolution and attention operators through a detailed study. We find that the key difference among these feature transformation modules, such as attention or convolution, lies in their spatial feature aggregation approach, known as the "spatial token mixer" (STM). To facilitate an impartial comparison, we introduce a unified architecture to neutralize the impact of divergent network-level and block-level designs. Subsequently, various STMs are integrated into this unified framework for comprehensive comparative analysis. Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs. Our detailed analysis also reveals various findings about different STMs, such as effective receptive fields and invariance tests. All models and codes used in this study are publicly available at https://github.com/OpenGVLab/STM-Evaluation.

Submitted 1 December, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

arXiv:2211.03392 [pdf, ps, other]

A tight upper bound on the number of non-zero weights of a quasi-cyclic code

Authors: Xiaoxiao Li, Minjia Shi, San Ling

Abstract: Let $\mathcal{C}$ be a quasi-cyclic code of index $l(l\geq2)$. Let $G$ be the subgroup of the automorphism group of $\mathcal{C}$ generated by $ρ^l$ and the scalar multiplications of $\mathcal{C}$, where $ρ$ denotes the standard cyclic shift. In this paper, we find an explicit formula of orbits of $G$ on $\mathcal{C}\setminus \{\mathbf{0}\}$. Consequently, an explicit upper bound on the number of nonzero weights of $\mathcal{C}$ is immediately derived and a necessary and sufficient condition for codes meeting the bound is exhibited. If $\mathcal{C}$ is a one-generator quasi-cyclic code, a tighter upper bound on the number of nonzero weights of $\mathcal{C}$ is obtained by considering a larger automorphism subgroup which is generated by the multiplier, $ρ^l$ and the scalar multiplications of $\mathcal{C}$. In particular, we list some examples to show the bounds are tight. Our main result improves and generalizes some of the results in \cite{M2}.

Submitted 6 November, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.02480 [pdf, ps, other]

Characterization and construction of optimal binary linear codes with one-dimensional hull

Authors: Shitao Li, Minjia Shi, Jon-Lark Kim

Abstract: The hull of a linear code over finite fields is the intersection of the code and its dual, and linear codes with small hulls have applications in computational complexity and information protection. Linear codes with the smallest hull are LCD codes, which have been widely studied. Recently, several papers were devoted to related LCD codes over finite fields with size greater than 3 to linear codes with one-dimensional or higher dimensional hull. Therefore, an interesting and non-trivial problem is to study binary linear codes with one-dimensional hull with connection to binary LCD codes. The objective of this paper is to study some properties of binary linear codes with one-dimensional hull, and establish their relation with binary LCD codes. Some interesting inequalities are thus obtained. Using such a characterization, we study the largest minimum distance $d_{one}(n,k)$ among all binary linear $[n,k]$ codes with one-dimensional hull. We determine the largest minimum distances $d_{one}(n,n-k)$ for $ k\leq 5$ and $d_{one}(n,k)$ for $k\leq 4$ or $14\leq n\leq 24$. We partially determine the exact value of $d_{one}(n,k)$ for $k=5$ or $25\leq n\leq 30$.

Submitted 7 June, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

MSC Class: 94B05; 15B05; 12E10

arXiv:2211.01824 [pdf, other]

Human in the loop approaches in multi-modal conversational task guidance system development

Authors: Ramesh Manuvinakurike, Sovan Biswas, Giuseppe Raffa, Richard Beckwith, Anthony Rhodes, Meng Shi, Gesem Gudino Mejia, Saurav Sahay, Lama Nachman

Abstract: Development of task guidance systems for aiding humans in a situated task remains a challenging problem. The role of search (information retrieval) and conversational systems for task guidance has immense potential to help the task performers achieve various goals. However, there are several technical challenges that need to be addressed to deliver such conversational systems, where common supervised approaches fail to deliver the expected results in terms of overall performance, user experience and adaptation to realistic conditions. In this preliminary work we first highlight some of the challenges involved during the development of such systems. We then provide an overview of existing datasets available and highlight their limitations. We finally develop a model-in-the-loop wizard-of-oz based data collection tool and perform a pilot experiment.

Submitted 3 November, 2022; originally announced November 2022.

Comments: SCAI @ SIGIR

arXiv:2211.00511 [pdf, other]

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings

Authors: Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai

Abstract: Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks. It was shown that single-channel frame-level diarization with serialized output training (SC-FD-SOT), single-channel word-level diarization with SOT (SC-WD-SOT) and joint training of single-channel target-speaker separation and ASR (SC-TS-ASR) can be exploited to partially solve this problem. In this paper, we propose three corresponding multichannel (MC) SA-ASR approaches, namely MC-FD-SOT, MC-WD-SOT and MC-TS-ASR. For different tasks/models, different multichannel data fusion strategies are considered, including channel-level cross-channel attention for MC-FD-SOT, frame-level cross-channel attention for MC-WD-SOT and neural beamforming for MC-TS-ASR. Results on the AliMeeting corpus reveal that our proposed models can consistently outperform the corresponding single-channel counterparts in terms of the speaker-dependent character error rate.

Submitted 1 March, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2211.00298 [pdf, ps, other]

doi 10.1002/jcd.21931

Constructing MRD codes by switching

Authors: Minjia Shi, Denis S. Krotov, Ferruh Özbudak

Abstract: MRD codes are maximum codes in the rank-distance metric space on $m$-by-$n$ matrices over the finite field of order $q$. They are diameter perfect and have the cardinality $q^{m(n-d+1)}$ if $m\ge n$. We define switching in MRD codes as replacing special MRD subcodes by other subcodes with the same parameters. We consider constructions of MRD codes admitting such switching, including punctured twisted Gabidulin codes and direct-product codes. Using switching, we construct a huge class of MRD codes whose cardinality grows doubly exponentially in $m$ if the other parameters ($n$, $q$, the code distance) are fixed. Moreover, we construct MRD codes with different affine ranks and aperiodic MRD codes. Keywords: MRD codes, rank distance, bilinear forms graph, switching, diameter perfect codes

Submitted 1 November, 2022; originally announced November 2022.

MSC Class: 94B25

Journal ref: J. Comb. Des. 32(5) 2024, 219-237

arXiv:2210.16104 [pdf, ps, other]

doi 10.1093/mnrasl/slac139

Three-Dimensional Propagation of Kink Wave Trains in Solar Coronal Slabs

Authors: Bo Li, Mingzhe Guo, Hui Yu, Shao-Xia Chen, Mijie Shi

Abstract: Impulsively excited wave trains are of considerable interest in solar coronal seismology. To our knowledge, however, it remains to examine the three-dimensional (3D) dispersive propagation of impulsive kink waves in straight, field-aligned, symmetric, low-beta, slab equilibria that are structured only in one transverse direction. We offer a study here, starting with an analysis of linear oblique kink modes from an eigenvalue problem perspective. Two features are numerically found for continuous and step structuring alike, one being that the group and phase velocities may lie on opposite sides of the equilibrium magnetic field ($\vec{B}_0$), and the other being that the group trajectories extend only to a limited angle from $\vec{B}_0$. We justify these features by making analytical progress for the step structuring. More importantly, we demonstrate by a 3D time-dependent simulation that these features show up in the intricate interference patterns of kink wave trains that arise from a localized initial perturbation. In a plane perpendicular to the direction of inhomogeneity, the large-time slab-guided patterns are confined to a narrow sector about $\vec{B}_0$, with some wavefronts propagating toward $\vec{B}_0$. We conclude that the phase and group diagrams lay the necessary framework for understanding the complicated time-dependent behavior of impulsive waves.

Submitted 28 October, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures, accepted for publication in MNRAS Letters

arXiv:2210.16091 [pdf, ps, other]

doi 10.3847/1538-4357/ac9d35

Oblique Quasi-Kink Modes in Solar Coronal Slabs Embedded in an Asymmetric Magnetic Environment: Resonant Damping, Phase and Group Diagrams

Authors: Shao-Xia Chen, Bo Li, Mingzhe Guo, Mijie Shi, Hui Yu

Abstract: There has been considerable interest in magnetoacoustic waves in static, straight, field-aligned, one-dimensional equilibria where the exteriors of a magnetic slab are different between the two sides. We focus on trapped, transverse fundamental, oblique quasi-kink modes in pressureless setups where the density varies continuously from a uniform interior (with density $ρ_{\rm i}$) to a uniform exterior on either side (with density $ρ_{\rm L}$ or $ρ_{\rm R}$), assuming $ρ_{\rm L}\leρ_{\rm R}\leρ_{\rm i}$. The continuous structuring and oblique propagation make our study new relative to pertinent studies, and lead to wave damping via the Alfv$\acute{\rm e}$n resonance. We compute resonantly damped quasi-kink modes as resistive eigenmodes, and isolate the effects of system asymmetry by varying $ρ_{\rm i}/ρ_{\rm R}$ from the ``Fully Symmetric'' ($ρ_{\rm i}/ρ_{\rm R}=ρ_{\rm i}/ρ_{\rm L}$) to the ``Fully Asymmetric'' limit ($ρ_{\rm i}/ρ_{\rm R}=1$). We find that the damping rates possess a nonmonotonic $ρ_{\rm i}/ρ_{\rm R}$-dependence as a result of the difference between the two Alfv$\acute{\rm e}$n continua, and resonant absorption occurs only in one continuum when $ρ_{\rm i}/ρ_{\rm R}$ is below some threshold. We also find that the system asymmetry results in two qualitatively different regimes for the phase and group diagrams. The phase and group trajectories lie essentially on the same side (different sides) relative to the equilibrium magnetic field when the configuration is not far from a ``Fully Asymmetric'' (``Fully Symmetric'') one. Our numerical results are understood by making analytical progress in the thin-boundary limit, and discussed for imaging observations of axial standing modes and impulsively excited wavetrains.

Submitted 28 October, 2022; originally announced October 2022.

Comments: 31 pages (single column), 8 figures, accepted for publication in ApJ

arXiv:2210.15401 [pdf, other]

Facial Video-based Remote Physiological Measurement via Self-supervised Learning

Authors: Zijie Yue, Miaojing Shi, Shuai Ding

Abstract: Facial video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human face videos and then measure multiple vital signs (e.g. heart rate, respiration frequency) from rPPG signals. Recent approaches achieve it by training deep neural networks, which normally require abundant facial videos and synchronously recorded photoplethysmography (PPG) signals for supervision. However, the collection of these annotated corpora is not easy in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from facial videos without the need for ground truth PPG signals. Given a video sample, we first augment it into multiple positive/negative samples which contain similar/dissimilar signal frequencies to the original one. Specifically, positive samples are generated using spatial augmentation. Negative samples are generated via a learnable frequency augmentation module, which performs non-linear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from augmented samples. It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e. frequency contrastive loss, frequency ratio consistency loss, and cross-video frequency agreement loss, for the optimization of estimated rPPG signals from multiple augmented video samples and across temporally neighboring video samples. We conduct rPPG-based heart rate, heart rate variability and respiration frequency estimation on four standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin.

Submitted 22 July, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2210.12092 [pdf, ps, other]

Cyclic codes from low differentially uniform functions

Authors: Sihem Mesnager, Minjia Shi, Hongwei Zhu

Abstract: Cyclic codes have many applications in consumer electronics, communication and data storage systems due to their efficient encoding and decoding algorithms. An efficient approach to constructing cyclic codes is the sequence approach. In their articles [Discrete Math. 321, 2014] and [SIAM J. Discrete Math. 27(4), 2013], Ding and Zhou constructed several classes of cyclic codes from almost perfect nonlinear (APN) functions and planar functions over finite fields and presented some open problems on cyclic codes from highly nonlinear functions. This article focuses on these exciting works by investigating new insights in this research direction. Specifically, its objective is twofold. The first is to provide a complement with some former results and present correct proofs and statements on some known ones on the cyclic codes from the APN functions. The second is studying the cyclic codes from some known functions possessing low differential uniformity. Along with this article, we shall provide answers to some open problems presented in the literature. The first one concerns Open Problem 1, proposed by Ding and Zhou in Discrete Math. 321, 2014. The two others are Open Problems 5.16 and 5.25, raised by Ding in [SIAM J. Discrete Math. 27(4), 2013].

Submitted 21 October, 2022; originally announced October 2022.

MSC Class: 94 B15; 94 B05; 94 A55; 11B83

arXiv:2210.07153 [pdf, other]

Directed Acoustic Assembly in 3D

Authors: Kai Melde, Minghui Shi, Heiner Kremer, Senne Seneca, Christoph Frey, Ilia Platzman, Christian Degel, Daniel Schmitt, Bernhard Schölkopf, Peer Fischer

Abstract: The creation of whole 3D objects in one shot is an ultimate goal for rapid prototyping, most notably biofabrication, where conventional methods are typically slow and apply mechanical or chemical stress on biological cells. Here, we demonstrate one-step assembly of matter to form compact 3D shapes using acoustic forces, which is enabled by the superposition of multiple holographic fields. The technique is contactless and shown to work with solid microparticles, hydrogel beads and biological cells inside standard labware. The structures can be fixed via gelation of the surrounding medium. In contrast to previous work, this approach handles matter with positive acoustic contrast and does not require opposing waves, supporting surfaces or scaffolds. We envision promising applications in tissue engineering and additive manufacturing.

Submitted 13 October, 2022; originally announced October 2022.

arXiv:2210.06241 [pdf, ps, other]

Two conjectures on the largest minimum distances of binary self-orthogonal codes with dimension 5

Authors: Minjia Shi, Shitao Li, Jon-Lark Kim

Abstract: The purpose of this paper is to solve the two conjectures on the largest minimum distance $d_{so}(n,5)$ of a binary self-orthogonal $[n,5]$ code proposed by Kim and Choi (IEEE Trans. Inf. Theory, 2022). The determination of $d_{so}(n,k)$ has been a fundamental and difficult problem in coding theory because there are too many binary self-orthogonal codes as the dimension $k$ increases. Recently, Kim et al. (2021) considered the shortest self-orthogonal embedding of a binary linear code, and many binary optimal self-orthogonal $[n,k]$ codes were constructed for $k=4,5$. Kim and Choi (2022) improved some results of Kim et al. (2021) and made two conjectures on $d_{so}(n,5)$. In this paper, we develop a general method to determine the exact value of $d_{so}(n,k)$ for $k=5,6$ and show that the two conjectures made by Kim and Choi (2022) are true.

Submitted 12 October, 2022; originally announced October 2022.

MSC Class: 94B05

arXiv:2209.07176 [pdf, other]

doi 10.3847/2041-8213/ac91d4

Influence of fine structures on gyrosynchrotron emission of flare loops modulated by sausage modes

Authors: Mijie Shi, Bo Li, Mingzhe Guo

Abstract: Sausage modes are one leading mechanism for interpreting short period quasi-periodic pulsations (QPPs) of solar flares. Forward modeling their radio emission is crucial for identifying sausage modes observationally and for understanding their connections with QPPs. Using the numerical output from three-dimensional magnetohydrodynamic (MHD) simulations, we forward model the gyrosynchrotron (GS) emission of flare loops modulated by sausage modes and examine the influence of loop fine structures. The temporal evolution of the emission intensity is analyzed for an oblique line of sight crossing the loop center. We find that the low- and high-frequency intensities oscillate in-phase at the period of sausage modes for models with or without fine structures. For low-frequency emissions where the optically thick regime arises, the modulation magnitude of the intensity is dramatically reduced by the fine structures at some viewing angles. On the contrary, for high-frequency emissions where the optically thin regime holds, the effect of fine structures or viewing angle is marginal. Our results show that the periodic intensity variations of sausage modes are not wiped out by the fine structures, and sausage modes remain a promising candidate mechanism for QPPs even when flare loops are fine-structured.

Submitted 15 September, 2022; originally announced September 2022.

Comments: Accepted for publication in ApJ Letters

arXiv:2209.06003 [pdf, ps, other]

Decompositions of Local mixed Morrey-type spaces and Application

Authors: Mingwei Shi, Jiang Zhou

Abstract: In this paper, we obtain predual spaces of local mixed Morrey-type spaces and characterize mixed Hardy local Morrey-type spaces. We also investigate the nonsmooth decomposition of local mixed Morrey-type spaces. As an application, we consider the Hardy operators on local mixed Morrey-type spaces.

Submitted 13 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2209.03861

arXiv:2209.03861 [pdf, ps, other]

The local Morrey-type space Associated with Ball Quasi-Banach Function Spaces and Application

Authors: Mingwei Shi, Jiang Zhou

Abstract: In this paper, we define for the first time the local Morrey-type space associated with ball quasi-Banach function spaces and establish a series of related properties. In addition, the boundedness of the Hardy-Littlewood maximal operator is proved. We investigate the nonsmooth decomposition of the local Morrey-type space associated with ball quasi-Banach function spaces via the Hardy local Morrey-type spaces associated with ball quasi-Banach function spaces. We also consider the boundedness of the Hardy operator.

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2209.01627 [pdf, other]

A systematic study of race and sex bias in CNN-based cardiac MR segmentation

Authors: Tiarna Lee, Esther Puyol-Anton, Bram Ruijsink, Miaojing Shi, Andrew P. King

Abstract: In computer vision there has been significant research interest in assessing potential demographic bias in deep learning models. One of the main causes of such bias is imbalance in the training data. In medical imaging, where the potential impact of bias is arguably much greater, there has been less interest. In medical imaging pipelines, segmentation of structures of interest plays an important role in estimating clinical biomarkers that are subsequently used to inform patient management. Convolutional neural networks (CNNs) are starting to be used to automate this process. We present the first systematic study of the impact of training set imbalance on race and sex bias in CNN-based segmentation. We focus on segmentation of the structures of the heart from short axis cine cardiac magnetic resonance images, and train multiple CNN segmentation models with different levels of race/sex imbalance. We find no significant bias in the sex experiment but significant bias in two separate race experiments, highlighting the need to consider adequate representation of different demographic groups in health datasets.

Submitted 4 September, 2022; originally announced September 2022.

arXiv:2209.00773 [pdf, other]

Artifact-Tolerant Clustering-Guided Contrastive Embedding Learning for Ophthalmic Images

Authors: Min Shi, Anagha Lokhande, Mojtaba S. Fazli, Vishal Sharma, Yu Tian, Yan Luo, Louis R. Pasquale, Tobias Elze, Michael V. Boland, Nazlee Zebardast, David S. Friedman, Lucy Q. Shen, Mengyu Wang

Abstract: Ophthalmic images and derivatives such as the retinal nerve fiber layer (RNFL) thickness map are crucial for detecting and monitoring ophthalmic diseases (e.g., glaucoma). For computer-aided diagnosis of eye diseases, the key technique is to automatically extract meaningful features from ophthalmic images that can reveal the biomarkers (e.g., RNFL thinning patterns) linked to functional vision loss. However, representation learning from ophthalmic images that links structural retinal damage with human vision loss is non-trivial mostly due to large anatomical variations between patients. The task becomes even more challenging in the presence of image artifacts, which are common due to issues with image acquisition and automated segmentation. In this paper, we propose an artifact-tolerant unsupervised learning framework termed EyeLearn for learning representations of ophthalmic images. EyeLearn has an artifact correction module to learn representations that can best predict artifact-free ophthalmic images. In addition, EyeLearn adopts a clustering-guided contrastive learning strategy to explicitly capture the intra- and inter-image affinities. During training, images are dynamically organized in clusters to form contrastive samples in which images in the same or different clusters are encouraged to learn similar or dissimilar representations, respectively. To evaluate EyeLearn, we use the learned representations for visual field prediction and glaucoma detection using a real-world ophthalmic image dataset of glaucoma patients. Extensive experiments and comparisons with state-of-the-art methods verified the effectiveness of EyeLearn for learning optimal feature representations from ophthalmic images.

Submitted 1 September, 2022; originally announced September 2022.

Comments: 10 pages

arXiv:2208.13389 [pdf, ps, other]

Several classes of Galois self-orthogonal MDS codes and related applications

Authors: Yang Li, Yunfei Su, Shixin Zhu, Shitao Li, Minjia Shi

Abstract: Let $q=p^h$ be a prime power and $e$ be an integer with $0\leq e\leq h-1$. $e$-Galois self-orthogonal codes are generalizations of Euclidean self-orthogonal codes ($e=0$) and Hermitian self-orthogonal codes ($e=\frac{h}{2}$ and $h$ is even). In this paper, we propose two general methods to construct $e$-Galois self-orthogonal (extended) generalized Reed-Solomon (GRS) codes. As a consequence, eight new classes of $e$-Galois self-orthogonal (extended) GRS codes with odd $q$ and $2e\mid h$ are obtained. Based on the Galois dual of a code, we also study its punctured and shortened codes. As applications, new $e'$-Galois self-orthogonal maximum distance separable (MDS) codes for all possible $e'$ satisfying $0\leq e'\leq h-1$, new $e$-Galois self-orthogonal MDS codes via the shortened codes, and new MDS codes with prescribed dimensional $e$-Galois hull via the punctured codes are derived. Moreover, some new $\sqrt{q}$-ary quantum MDS codes with lengths greater than $\sqrt{q}+1$ and minimum distances greater than $\frac{\sqrt{q}}{2}+1$ are obtained.

Submitted 8 February, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Comments: 26 pages, 11 tables

MSC Class: 94B05; 15B05; 12E10

arXiv:2208.10893 [pdf]

doi 10.1088/2632-2153/aced7d

Transfer Learning Application of Self-supervised Learning in ARPES

Authors: Sandy Adhitia Ekahana, Genta Indra Winata, Y. Soh, Gabriel Aeppli, Radovic Milan, Ming Shi

Abstract: Recent developments in the angle-resolved photoemission spectroscopy (ARPES) technique involve spatially resolving samples while maintaining the high-resolution feature of momentum space. This development readily expands the data size and the complexity of data analysis, one task of which is to label similar dispersion cuts and map them spatially. In this work, we demonstrate that a recently developed representational learning (self-supervised learning) model combined with k-means clustering can help automate that part of data analysis and save precious time, albeit with low performance. Finally, we introduce a few-shot learning (k-nearest neighbour or kNN) in representational space where we selectively choose one (k=1) image reference for each known label and subsequently label the rest of the data with respect to the nearest reference image. This last approach demonstrates the strength of self-supervised learning to automate the image analysis in ARPES in particular and can be generalized to any scientific data analysis that heavily involves image data.

Submitted 23 August, 2022; originally announced August 2022.
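
The k=1 nearest-reference labelling described in the abstract above can be illustrated with a minimal sketch. It assumes Euclidean distance in the embedding space and uses made-up 2-D vectors and label names (`flat_band`, `dirac_cone`); the paper's encoder, distance metric, and labels are not given in this listing, so everything concrete below is hypothetical.

```python
import math


def nearest_reference_labels(embeddings, references):
    """Assign each embedding the label of its closest reference vector (k = 1)."""
    labels = []
    for vec in embeddings:
        # Pick the label whose single reference embedding is nearest to vec.
        best = min(references, key=lambda name: math.dist(vec, references[name]))
        labels.append(best)
    return labels


# Hypothetical embeddings; real ones would come from a self-supervised encoder.
references = {"flat_band": (0.0, 0.0), "dirac_cone": (1.0, 1.0)}
embeddings = [(0.1, -0.2), (0.9, 1.2), (0.4, 0.3)]
print(nearest_reference_labels(embeddings, references))
```

With one reference per label, the only manual effort is choosing those reference images; every other image is labelled automatically.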

arXiv:2208.05246 [pdf]

Reconstruction of low dimensional electronic states by altering the chemical arrangement at the SrTiO3 surface

Authors: Hang Li, Walber H. Brito, Eduardo B. Guedes, Alla Chikina, Rasmus T. Dahm, Dennis V. Christensen, Shinhee Yun, Francesco M. Chiabrera, Nicholas C. Plumb, Ming Shi, Nini Pryds, Milan Radovic

Abstract: Developing reliable methods for modulating the electronic structure of the two-dimensional electron gas (2DEG) in SrTiO3 is crucial for utilizing its full potential and inducing novel properties. Here, we show that relatively simple surface preparation reconstructs the 2DEG of SrTiO3 (STO) surface, leading to a Lifshitz-like transition. Combining experimental methods, such as angle-resolved photoemission spectroscopy (ARPES) and X-ray photoemission spectroscopy (XPS) with ab initio calculations, we find that the modulation of the surface band structures is primarily attributed to the reorganization of the chemical composition. In addition, ARPES experiments demonstrate that vacuum ultraviolet (VUV) light can be efficiently employed to alter the band renormalization of the 2DEG system and control the electron-phonon interaction (EPI). Our study provides a robust and straightforward route to stabilize and tune the low-dimensional electronic structure via the chemical degeneracy of the STO surface.

Submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.02894 [pdf, other]

doi 10.1109/TIP.2023.3289290

Redesigning Multi-Scale Neural Network for Crowd Counting

Authors: Zhipeng Du, Miaojing Shi, Jiankang Deng, Stefanos Zafeiriou

Abstract: Perspective distortions and crowd variations make crowd counting a challenging task in computer vision. To tackle it, many previous works have used multi-scale architecture in deep neural networks (DNNs). Multi-scale branches can be either directly merged (e.g. by concatenation) or merged through the guidance of proxies (e.g. attentions) in the DNNs. Despite their prevalence, these combination methods are not sophisticated enough to deal with the per-pixel performance discrepancy over multi-scale density maps. In this work, we redesign the multi-scale neural network by introducing a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales; pixel-wise soft gating nets are introduced to provide pixel-wise soft weights for scale combinations in different hierarchies. The network is optimized using both the crowd density map and the local counting map, where the latter is obtained by local integration on the former. Optimizing both can be problematic because of their potential conflicts. We introduce a new relative local counting loss based on relative count differences among hard-predicted local regions in an image, which proves to be complementary to the conventional absolute error loss on the density map. Experiments show that our method achieves the state-of-the-art performance on five public datasets, i.e. ShanghaiTech, UCF_CC_50, JHU-CROWD++, NWPU-Crowd and Trancos.

Submitted 3 July, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

Comments: IEEE Transactions on Image Processing
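
The abstract above states that the local counting map is obtained by local integration on the density map. A minimal sketch of that step follows, assuming non-overlapping square blocks (the paper's actual window size and stride are not stated here); the function name `local_count_map` and the toy density values are illustrative only.

```python
def local_count_map(density, block):
    """Sum a 2-D density map over non-overlapping block x block regions."""
    rows, cols = len(density), len(density[0])
    counts = []
    for r in range(0, rows, block):
        row_counts = []
        for c in range(0, cols, block):
            # Integrate (sum) the density inside the current block,
            # clipping the block at the map border.
            total = sum(
                density[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            )
            row_counts.append(total)
        counts.append(row_counts)
    return counts


# Toy 4x4 density map whose entries sum to 3 "people".
density = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.25],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25, 0.0],
]
print(local_count_map(density, 2))
```

Because each block is a plain sum, the local counts add up to the same total as the full density map.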

arXiv:2208.00439 [pdf, other]

Design What You Desire: Icon Generation from Orthogonal Application and Theme Labels

Authors: Yinpeng Chen, Zhiyu Pan, Min Shi, Hao Lu, Zhiguo Cao, Weicai Zhong

Abstract: Generative adversarial networks (GANs) have been trained to be professional artists able to create stunning artworks such as face generation and image style transfer. In this paper, we focus on a realistic business scenario: automated generation of customizable icons given desired mobile applications and theme styles. We first introduce a theme-application icon dataset, termed AppIcon, where each icon has two orthogonal theme and app labels. By investigating a strong baseline StyleGAN2, we observe mode collapse caused by the entanglement of the orthogonal labels. To solve this challenge, we propose IconGAN composed of a conditional generator and dual discriminators with orthogonal augmentations, and a contrastive feature disentanglement strategy is further designed to regularize the feature space of the two discriminators. Compared with other approaches, IconGAN shows a clear advantage on the AppIcon benchmark. Further analysis also justifies the effectiveness of disentangling app and theme representations. Our project will be released at: https://github.com/architect-road/IconGAN.

Submitted 31 July, 2022; originally announced August 2022.

Comments: 10 pages, 12 figures

arXiv:2207.12613 [pdf, ps, other]

Rank and pairs of Rank and Dimension of Kernel of $\mathbb{Z}_p\mathbb{Z}_{p^2}$-linear codes

Authors: Xiaoxiao Li, Minjia Shi, Shukai Wang

Abstract: A code $C$ is called $\mathbb{Z}_p\mathbb{Z}_{p^2}$-linear if it is the Gray image of a $\mathbb{Z}_p\mathbb{Z}_{p^2}$-additive code. For any prime number $p$ larger than $3$, the bounds of the rank of $\mathbb{Z}_p\mathbb{Z}_{p^2}$-linear codes are given. For each value of the rank and the pairs of rank and the dimension of the kernel of $\mathbb{Z}_p\mathbb{Z}_{p^2}$-linear codes, we give a detailed construction of the corresponding codes. Finally, as an example, the rank and the dimension of the kernel of $\mathbb{Z}_5\mathbb{Z}_{25}$-linear codes are studied.

Submitted 25 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2205.13981

arXiv:2207.08960 [pdf, other]

Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction

Authors: Zijie Yue, Miaojing Shi

Abstract: The target of space-time video super-resolution (STVSR) is to increase both the frame rate (also referred to as the temporal resolution) and the spatial resolution of a given video. Recent approaches solve STVSR using end-to-end deep neural networks. A popular solution is to first increase the frame rate of the video, then perform feature refinement among different frame features, and finally increase the spatial resolutions of these features. The temporal correlation among features of different frames is carefully exploited in this process. The spatial correlation among features of different (spatial) resolutions, despite also being very important, is however not emphasized. In this paper, we propose a spatial-temporal feature interaction network to enhance STVSR by exploiting both spatial and temporal correlations among features of different frames and spatial resolutions. Specifically, the spatial-temporal frame interpolation module is introduced to interpolate low- and high-resolution intermediate frame features simultaneously and interactively. The spatial-temporal local and global refinement modules are respectively deployed afterwards to exploit the spatial-temporal correlation among different features for their refinement. Finally, a novel motion consistency loss is employed to enhance the motion continuity among reconstructed frames. We conduct experiments on three standard benchmarks, Vid4, Vimeo-90K and Adobe240, and the results demonstrate that our method improves on state-of-the-art methods by a considerable margin. Our codes will be available at https://github.com/yuezijie/STINet-Space-time-Video-Super-resolution.

Submitted 20 April, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

arXiv:2207.07372 [pdf, other]

3D Instances as 1D Kernels

Authors: Yizheng Wu, Min Shi, Shuaiyuan Du, Hao Lu, Zhiguo Cao, Weicai Zhong

Abstract: We introduce a 3D instance representation, termed instance kernels, where instances are represented by one-dimensional vectors that encode the semantic, positional, and shape information of 3D instances. We show that instance kernels enable easy mask inference by simply scanning kernels over the entire scenes, avoiding the heavy reliance on proposals or heuristic clustering algorithms in standard 3D instance segmentation pipelines. The idea of instance kernel is inspired by recent success of dynamic convolutions in 2D/3D instance segmentation. However, we find it non-trivial to represent 3D instances due to the disordered and unstructured nature of point cloud data, e.g., poor instance localization can significantly degrade instance representation. To remedy this, we construct a novel 3D instance encoding paradigm. First, potential instance centroids are localized as candidates. Then, a candidate merging scheme is devised to simultaneously aggregate duplicated candidates and collect context around the merged centroids to form the instance kernels. Once instance kernels are available, instance masks can be reconstructed via dynamic convolutions whose weights are conditioned on instance kernels. The whole pipeline is instantiated with a dynamic kernel network (DKNet). Results show that DKNet outperforms the state of the art on both ScanNetV2 and S3DIS datasets with better instance localization. Code is available at https://github.com/W1zheng/DKNet.

Submitted 18 July, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

Comments: Appearing in ECCV, 2022

arXiv:2207.07249 [pdf, other]

doi 10.1093/mnras/stac2006

Impulsively Generated Kink Wave Trains in Solar Coronal Slabs

Authors: Mingzhe Guo, Bo Li, Tom Van Doorsselaere, Mijie Shi

Abstract: We numerically follow the response of density-enhanced slabs to impulsive, localized, transverse velocity perturbations by working in the framework of ideal magnetohydrodynamics (MHD). Both linear and nonlinear regimes are addressed. Kink wave trains are seen to develop along the examined slabs, sharing the characteristics that more oscillatory patterns emerge with time and that the apparent wavelength increases with distance at a given instant. Two features nonetheless arise due to nonlinearity, one being a density cavity close to the exciter and the other being the appearance of shocks both outside and inside the nominal slab. These features may be relevant for understanding the interaction between magnetic structures and such explosive events as coronal mass ejections. Our numerical findings on kink wave trains in solar coronal slabs are discussed in connection with typical measurements of streamer waves.

Submitted 14 July, 2022; originally announced July 2022.

Comments: 10 pages, 8 figures, accepted for publication in MNRAS

arXiv:2207.05807 [pdf, other]

Dam reservoir extraction from remote sensing imagery using tailored metric learning strategies

Authors: Arnout van Soesbergen, Zedong Chu, Miaojing Shi, Mark Mulligan

Abstract: Dam reservoirs play an important role in meeting sustainable development goals and global climate targets. However, particularly for small dam reservoirs, there is a lack of consistent data on their geographical location. To address this data gap, a promising approach is to perform automated dam reservoir extraction based on globally available remote sensing imagery. It can be considered as a fine-grained task of water body extraction, which involves extracting water areas in images and then separating dam reservoirs from natural water bodies. We propose a novel deep neural network (DNN) based pipeline that decomposes dam reservoir extraction into water body segmentation and dam reservoir recognition. Water bodies are first separated from background lands in a segmentation model and each individual water body is then predicted as either dam reservoir or natural water body in a classification model. For the former step, point-level metric learning with triplets across images is injected into the segmentation model to address contour ambiguities between water areas and land regions. For the latter step, prior-guided metric learning with triplets from clusters is injected into the classification model to optimize the image embedding space at a fine-grained level based on reservoir clusters. To facilitate future research, we establish a benchmark dataset with earth imagery data and human labelled reservoirs from river basins in West Africa and India. Extensive experiments were conducted on this benchmark in the water body segmentation task, dam reservoir recognition task, and the joint dam reservoir extraction task. Superior performance has been observed in the respective tasks when comparing our method with state of the art approaches.

Submitted 12 July, 2022; originally announced July 2022.

Comments: Accepted on IEEE Transactions on Geoscience and Remote Sensing

arXiv:2207.04229 [pdf]

doi 10.1364/BOE.471198

Spatiotemporal singular value decomposition for denoising in photoacoustic imaging with low-energy excitation light source

Authors: Mengjie Shi, Tom Vercauteren, Wenfeng Xia

Abstract: Photoacoustic (PA) imaging is an emerging hybrid imaging modality that combines rich optical spectroscopic contrast and high ultrasonic resolution and thus holds tremendous promise for a wide range of pre-clinical and clinical applications. Compact and affordable light sources such as light-emitting diodes (LEDs) and laser diodes (LDs) are promising alternatives to bulky and expensive solid-state laser systems that are commonly used as PA light sources. These could accelerate the clinical translation of PA technology. However, PA signals generated with these light sources are readily degraded by noise due to the low optical fluence, leading to decreased signal-to-noise ratio (SNR) in PA images. In this work, a spatiotemporal singular value decomposition (SVD) based PA denoising method was investigated for these light sources that usually have low fluence and high repetition rates. The proposed method leverages both spatial and temporal correlations between radiofrequency (RF) data frames. Validation was performed on simulations and in vivo PA data acquired from human fingers (2D) and forearm (3D) using an LED-based system. Spatiotemporal SVD greatly enhanced the PA signals of blood vessels corrupted by noise while preserving a high temporal resolution to slow motions, improving the SNR of in vivo PA images by 1.1, 0.7, and 1.9 times compared to single frame-based wavelet denoising, averaging across 200 frames, and single frame without denoising, respectively. The proposed method demonstrated a processing time of around 50 $\mu$s per frame with SVD acceleration and GPU. Thus, spatiotemporal SVD is well suited to PA imaging systems with low-energy excitation light sources for real-time in vivo applications.

Submitted 9 July, 2022; originally announced July 2022.

arXiv:2207.03824 [pdf, other]

Boosting Zero-shot Learning via Contrastive Optimization of Attribute Representations

Authors: Yu Du, Miaojing Shi, Fangyun Wei, Guoqi Li

Abstract: Zero-shot learning (ZSL) aims to recognize classes that do not have samples in the training set. One representative solution is to directly learn an embedding function associating visual features with corresponding class semantics for recognizing new classes. Many methods extend upon this solution, and recent ones are especially keen on extracting rich features from images, e.g. attribute features. These attribute features are normally extracted within each individual image; however, the common traits of features that come from different images yet belong to the same attribute are not emphasized. In this paper, we propose a new framework to boost ZSL by explicitly learning attribute prototypes beyond images and contrastively optimizing them with attribute-level features within images. Besides the novel architecture, two elements are highlighted for attribute representations: a new prototype generation module is designed to generate attribute prototypes from attribute semantics; a hard example-based contrastive optimization scheme is introduced to reinforce attribute-level features in the embedding space. We explore two alternative backbones, CNN-based and transformer-based, to build our framework and conduct experiments on three standard benchmarks, CUB, SUN, AwA2. Results on these benchmarks demonstrate that our method improves the state of the art by a considerable margin. Our codes will be available at https://github.com/dyabel/CoAR-ZSL.git

Submitted 18 July, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: Accepted to TNNLS

arXiv:2207.01938 [pdf, ps, other]

Additive complementary dual codes over $\mathbb{F}_4$

Authors: Minjia Shi, Na Liu, Jon-Lark Kim, Patrick Solé

Abstract: A linear code is linear complementary dual (LCD) if it meets its dual trivially. LCD codes have been a hot topic recently due to Boolean masking application in the security of embarked electronics (Carlet and Guilley, 2014). Additive codes over $\mathbb{F}_4$ are $\mathbb{F}_4$-codes that are stable by codeword addition but not necessarily by scalar multiplication. An additive code over $\mathbb{F}_4$ is additive complementary dual (ACD) if it meets its dual trivially. The aim of this research is to study such codes which meet their dual trivially. All the techniques and problems used to study LCD codes are potentially relevant to ACD codes. Interesting constructions of ACD codes from binary codes are given with respect to the trace Hermitian and trace Euclidean inner product. The former product is relevant to quantum codes.

Submitted 5 July, 2022; originally announced July 2022.

MSC Class: 94B05
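
The defining condition in the abstract above, that a code "meets its dual trivially", can be checked by brute force for very small codes. The sketch below is an illustration only: it works with binary linear codes under the Euclidean inner product, not the additive $\mathbb{F}_4$ codes with trace inner products studied in the paper, and the helper names `span`, `dual`, and `is_lcd` are hypothetical.

```python
from itertools import product


def span(generators, n):
    """All GF(2) linear combinations of the generator rows (length-n tuples)."""
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = tuple(
            sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
            for i in range(n)
        )
        words.add(word)
    return words


def dual(code, n):
    """Vectors orthogonal (mod 2) to every codeword of `code`."""
    return {
        x
        for x in product((0, 1), repeat=n)
        if all(sum(a * b for a, b in zip(x, w)) % 2 == 0 for w in code)
    }


def is_lcd(generators, n):
    """True if the code and its dual share only the zero vector."""
    code = span(generators, n)
    return (code & dual(code, n)) == {(0,) * n}


print(is_lcd([(1, 0, 0)], 3))  # code {000, 100}
print(is_lcd([(1, 1, 0)], 3))  # code {000, 110}; 110 is orthogonal to itself
```

The search is exponential in the code length, so it is only useful for building intuition on toy examples.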

arXiv:2207.01087 [pdf, ps, other]

Homogeneous mixed Herz-Morrey spaces and its Applications

Authors: Mingwei Shi, Jiang Zhou

Abstract: In this paper, we introduce homogeneous mixed Herz-Morrey spaces $M\dot{K}_{p,\vec{q}}^{\alpha,\lambda}(\mathbb{R}^n)$ and show some of their properties. First, the boundedness of sublinear operators and fractional type operators in homogeneous mixed Herz-Morrey spaces is investigated. In particular, the above results remain valid for Calderón-Zygmund operators and fractional maximal operators. Lastly, the boundedness of their commutators in homogeneous mixed Herz-Morrey spaces is obtained.

Submitted 3 July, 2022; originally announced July 2022.

Showing 101–150 of 568 results for author: Shi, M