-
Butson Hadamard matrices, bent sequences, and spherical codes
Authors:
Minjia Shi,
Danni Lu,
Andrés Armario,
Ronan Egan,
Ferruh Ozbudak,
Patrick Solé
Abstract:
We explore a notion of bent sequence attached to the data consisting of an Hadamard matrix of order $n$ defined over the complex $q^{th}$ roots of unity, an eigenvalue of that matrix, and a Galois automorphism from the cyclotomic field of order $q.$ In particular we construct self-dual bent sequences for various $q\le 60$ and lengths $n\le 21.$ Computational construction methods comprise the resolution of polynomial systems by Groebner bases and eigenspace computations. Infinite families can be constructed from regular Hadamard matrices, Bush-type Hadamard matrices, and generalized Boolean bent functions. As an application, we estimate the covering radius of the code attached to that matrix over $\mathbb{Z}_q.$ We derive a lower bound on that quantity for the Chinese Euclidean metric when bent sequences exist. We give the Euclidean distance spectrum, and bound above the covering radius of an attached spherical code, depending on its strength as a spherical design.
Submitted 1 November, 2023;
originally announced November 2023.
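As a hedged illustration of the objects named in the abstract above (not the authors' construction): a Butson Hadamard matrix of order $n$ over the $q^{th}$ roots of unity is an $n \times n$ matrix $H$ with entries in those roots satisfying $H H^* = n I$. The Fourier matrix with entries $\omega^{jk}$, $\omega = e^{2\pi i/n}$, is the standard example with $q = n$. The helper names below are hypothetical.

```python
import cmath


def fourier_matrix(n):
    """Return the n x n Fourier matrix; every entry is an n-th root of unity."""
    return [
        [cmath.exp(2j * cmath.pi * ((j * k) % n) / n) for k in range(n)]
        for j in range(n)
    ]


def is_butson_hadamard(h, q, tol=1e-9):
    """Check that h has q-th roots of unity as entries and satisfies H H^* = n I."""
    n = len(h)
    # Every entry z must satisfy z**q == 1 (up to floating-point tolerance).
    for row in h:
        for z in row:
            if abs(z ** q - 1) > tol:
                return False
    # Rows must be pairwise orthogonal under the Hermitian inner product,
    # and each row must have squared norm n.
    for a in range(n):
        for b in range(n):
            inner = sum(h[a][k] * h[b][k].conjugate() for k in range(n))
            expected = n if a == b else 0
            if abs(inner - expected) > tol:
                return False
    return True


print(is_butson_hadamard(fourier_matrix(6), 6))
```

The check is quadratic in the number of rows, which is adequate for the small orders ($n \le 21$) mentioned in the abstract.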
-
The weight enumerator polynomials of the lifted codes of the projective Solomon-Stiffler codes
Authors:
Minjia Shi,
Shitao Li,
Tor Helleseth
Abstract:
Determining the weight distribution of a code is an old and fundamental topic in coding theory that has been thoroughly studied. In 1977, Helleseth, Kløve, and Mykkeltveit presented a weight enumerator polynomial of the lifted code over $\mathbb{F}_{q^\ell}$ of a $q$-ary linear code with significant combinatorial properties, which can determine the support weight distribution of this linear code. The Solomon-Stiffler codes are a family of famous Griesmer codes, which were proposed by Solomon and Stiffler in 1965. In this paper, we determine the weight enumerator polynomials of the lifted codes of the projective Solomon-Stiffler codes using some combinatorial properties of subspaces. As a result, we determine the support weight distributions of the projective Solomon-Stiffler codes. In particular, we determine the weight hierarchies of the projective Solomon-Stiffler codes.
Submitted 19 October, 2023;
originally announced October 2023.
-
PRIOR: Personalized Prior for Reactivating the Information Overlooked in Federated Learning
Authors:
Mingjia Shi,
Yuhao Zhou,
Kai Wang,
Huaizheng Zhang,
Shudong Huang,
Qing Ye,
Jiangcheng Lv
Abstract:
Classical federated learning (FL) enables training machine learning models without sharing data for privacy preservation, but heterogeneous data characteristics degrade the performance of the localized model. Personalized FL (PFL) addresses this by synthesizing personalized models from a global model via training on local data. Such a global model may overlook the specific information that the clients have been sampled. In this paper, we propose a novel scheme to inject personalized prior knowledge into the global model in each client, which attempts to mitigate the introduced incomplete information problem in PFL. At the heart of our proposed approach is a framework, the PFL with Bregman Divergence (pFedBreD), decoupling the personalized prior from the local objective function regularized by Bregman divergence for greater adaptability in personalized scenarios. We also relax the mirror descent (RMD) to extract the prior explicitly to provide optional strategies. Additionally, our pFedBreD is backed up by a convergence analysis. Sufficient experiments demonstrate that our method reaches state-of-the-art performance on 5 datasets and outperforms other methods by up to 3.5% across 8 benchmarks. Extensive analyses verify the robustness and necessity of the proposed designs.
Submitted 10 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Authors:
Che Liu,
Sibo Cheng,
Miaojing Shi,
Anand Shah,
Wenjia Bai,
Rossella Arcucci
Abstract:
In the field of medical Vision-Language Pre-training (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity in leveraging the inherent hierarchical structure of clinical reports, which are generally split into `findings' for descriptive content and `impressions' for conclusive observation. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment.
Submitted 1 May, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Authors:
Yangze Li,
Fan Yu,
Yuhao Liang,
Pengcheng Guo,
Mohan Shi,
Zhihao Du,
Shiliang Zhang,
Lei Xie
Abstract:
Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to obtain state-of-the-art (SOTA) performance, most of the studies are based on an autoregressive (AR) decoder which generates tokens one-by-one and results in a large real-time factor (RTF). To speed up inference, we introduce a recently proposed non-autoregressive model Paraformer as an acoustic model in the SA-ASR model. Paraformer uses a single-step decoder to enable parallel generation, obtaining comparable performance to the SOTA AR transformer models. Besides, we propose a speaker-filling strategy to reduce speaker identification errors and adopt an inter-CTC strategy to enhance the encoder's ability in acoustic modeling. Experiments on the AliMeeting corpus show that our model outperforms the cascaded SA-ASR model by a 6.1% relative speaker-dependent character error rate (SD-CER) reduction on the test set. Moreover, our model achieves a comparable SD-CER of 34.8% with only 1/10 RTF compared with the SOTA joint AR SA-ASR model.
Submitted 7 October, 2023;
originally announced October 2023.
-
FairVision: Equitable Deep Learning for Eye Disease Screening via Fair Identity Scaling
Authors:
Yan Luo,
Muhammad Osama Khan,
Yu Tian,
Min Shi,
Zehao Dou,
Tobias Elze,
Yi Fang,
Mengyu Wang
Abstract:
Equity in AI for healthcare is crucial due to its direct impact on human well-being. Despite advancements in 2D medical imaging fairness, the fairness of 3D models remains underexplored, hindered by the small sizes of 3D fairness datasets. Since 3D imaging surpasses 2D imaging in SOTA clinical care, it is critical to understand the fairness of these 3D models. To address this research gap, we conduct the first comprehensive study on the fairness of 3D medical imaging models across multiple protected attributes. Our investigation spans both 2D and 3D models and evaluates fairness across five architectures on three common eye diseases, revealing significant biases across race, gender, and ethnicity. To alleviate these biases, we propose a novel fair identity scaling (FIS) method that improves both overall performance and fairness, outperforming various SOTA fairness methods. Moreover, we release Harvard-FairVision, the first large-scale medical fairness dataset with 30,000 subjects featuring both 2D and 3D imaging data and six demographic identity attributes. Harvard-FairVision provides labels for three major eye disorders affecting about 380 million people worldwide, serving as a valuable resource for both 2D and 3D fairness learning. Our code and dataset are publicly accessible at \url{https://ophai.hms.harvard.edu/datasets/harvard-fairvision30k}.
Submitted 12 April, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo
Authors:
Tianqi Liu,
Xinyi Ye,
Weiyue Zhao,
Zhiyu Pan,
Min Shi,
Zhiguo Cao
Abstract:
Learning-based multi-view stereo (MVS) methods rely heavily on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS, because each pixel densely attends to the whole image. In contrast, we propose to constrain non-local feature augmentation within a pair of lines: each point attends only to the corresponding pair of epipolar lines. Our idea takes inspiration from the classic epipolar geometry, which shows that one point with different depth hypotheses will be projected to the epipolar line on the other view. This constraint reduces the 2D search space into the epipolar line in stereo matching. Similarly, this suggests that the matching of MVS is to distinguish a series of points lying on the same line. Inspired by this point-to-line search, we devise a line-to-point non-local augmentation strategy. We first devise an optimized searching algorithm to split the 2D feature maps into epipolar line pairs. Then, an Epipolar Transformer (ET) performs non-local feature augmentation among epipolar line pairs. We incorporate the ET into a learning-based MVS baseline, named ET-MVSNet. ET-MVSNet achieves state-of-the-art reconstruction performance on both the DTU and Tanks-and-Temples benchmarks with high efficiency. Code is available at https://github.com/TQTQliu/ET-MVSNet.
Submitted 29 September, 2023;
originally announced September 2023.
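The epipolar fact that the abstract above relies on can be checked numerically. The sketch below is not ET-MVSNet; it assumes an idealized two-view setup (identity intrinsics and a second camera related to the first by a pure translation `t`, both hypothetical choices) and verifies that one reference pixel, back-projected at several depth hypotheses, lands on a single line in the other view.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def project_depth_hypotheses(pixel, t, depths):
    """Back-project `pixel` from view 1 at each depth, then project into view 2.

    Returns homogeneous image points (x, y, 1) in view 2.
    """
    ray = (pixel[0], pixel[1], 1.0)
    points = []
    for d in depths:
        # 3D point in the second camera frame: depth * ray + translation.
        x, y, z = (d * ray[i] + t[i] for i in range(3))
        points.append((x / z, y / z, 1.0))
    return points


def epipolar_line(pixel, t):
    """Line through the image of the ray direction and the epipole: l = ray x t."""
    return cross((pixel[0], pixel[1], 1.0), t)


pixel = (0.2, -0.1)            # reference pixel in normalized coordinates
t = (0.5, 0.0, 0.1)            # hypothetical translation between the views
depths = [1.0, 2.0, 4.0, 8.0]  # depth hypotheses along the reference ray

line = epipolar_line(pixel, t)
projections = project_depth_hypotheses(pixel, t, depths)
# Each residual is the signed incidence l . p, which vanishes for points on the line.
residuals = [dot(line, p) for p in projections]
print(residuals)
```

Because every projected point is a combination of the ray direction and the translation, its incidence with `line` vanishes up to rounding, which is why matching can be restricted to that line.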
-
The second multi-channel multi-party meeting transcription challenge (M2MeT 2.0): A benchmark for speaker-attributed ASR
Authors:
Yuhao Liang,
Mohan Shi,
Fan Yu,
Yangze Li,
Shiliang Zhang,
Zhihao Du,
Qian Chen,
Lei Xie,
Yanmin Qian,
Jian Wu,
Zhuo Chen,
Kong Aik Lee,
Zhijie Yan,
Hui Bu
Abstract:
With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held at ASRU 2023 particularly aims to tackle the complex task of \emph{speaker-attributed ASR (SA-ASR)}, which directly addresses the practical and challenging problem of ``who spoke what at when'' in typical meeting scenarios. We established two sub-tracks: the fixed training condition sub-track, where the training data is constrained to predetermined datasets but participants can use any open-source pre-trained model, and the open training condition sub-track, which allows the use of all available data and models without limitation. In addition, we release a new 10-hour test set for challenge ranking. This paper provides an overview of the dataset, track settings, results, and analysis of submitted systems, as a benchmark to show the current state of speaker-attributed ASR.
Submitted 5 October, 2023; v1 submitted 24 September, 2023;
originally announced September 2023.
-
A quaternary analogue of Tang-Ding codes
Authors:
Minjia Shi,
Sihui Tao,
Jon-Lark Kim,
Patrick Solé
Abstract:
In a recent paper, Tang and Ding introduced a class of binary cyclic codes of rate close to one half with a designed lower bound on their minimum distance. The definition involves the base $2$ expansion of the integers in their defining set. In this paper we propose an analogue for quaternary codes. In addition, the performances of the subfield subcode and of the trace code (two binary cyclic codes) are investigated.
Submitted 21 September, 2023;
originally announced September 2023.
-
Fermi Surface Nesting with Heavy Quasiparticles in the Locally Noncentrosymmetric Superconductor CeRh$_2$As$_2$
Authors:
Yi Wu,
Yongjun Zhang,
Sailong Ju,
Yong Hu,
Yanen Huang,
Yanan Zhang,
Huali Zhang,
Hao Zheng,
Guowei Yang,
Evrard-Ouicem Eljaouhari,
Baopeng Song,
Nicholas C. Plumb,
Frank Steglich,
Ming Shi,
Gertrud Zwicknagl,
Chao Cao,
Huiqiu Yuan,
Yang Liu
Abstract:
The locally noncentrosymmetric heavy fermion superconductor CeRh$_2$As$_2$ has attracted considerable interest due to its rich superconducting phases, accompanied by a quadrupole density wave and pronounced antiferromagnetic excitations. To understand the underlying physics, we here report measurements from high-resolution angle-resolved photoemission. Our results reveal fine splittings of the conduction bands related to the locally noncentrosymmetric structure, as well as a quasi-two-dimensional Fermi surface (FS) with strong $4f$ contributions. The FS exhibits nesting with an in-plane vector $(\pi/a, \pi/a)$, which is facilitated by the van Hove singularity near $\bar X$ that arises from the characteristic conduction-$f$ hybridization. The FS nesting provides a natural explanation for the observed antiferromagnetic excitations at $(\pi/a, \pi/a)$, which could be intimately connected to its unconventional superconductivity. Our experimental results are well supported by density functional theory plus dynamical mean field theory calculations, which can capture the strong correlation effects. Our study not only provides spectroscopic proof of the key factors underlying the field-induced superconducting transition, but also uncovers the critical role of FS nesting and lattice Kondo effect in the intertwined spin and charge fluctuations.
Submitted 1 June, 2024; v1 submitted 13 September, 2023;
originally announced September 2023.
-
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Authors:
Hao-Jun Michael Shi,
Tsung-Hsien Lee,
Shintaro Iwasaki,
Jose Gallego-Posada,
Zhijing Li,
Kaushik Rangadurai,
Dheevatsa Mudigere,
Michael Rabbat
Abstract:
Shampoo is an online and stochastic optimization algorithm belonging to the AdaGrad family of methods for training neural networks. It constructs a block-diagonal preconditioner where each block consists of a coarse Kronecker product approximation to full-matrix AdaGrad for each parameter of the neural network. In this work, we provide a complete description of the algorithm as well as the performance optimizations that our implementation leverages to train deep networks at-scale in PyTorch. Our implementation enables fast multi-GPU distributed data-parallel training by distributing the memory and computation associated with blocks of each parameter via PyTorch's DTensor data structure and performing an AllGather primitive on the computed search directions at each iteration. This major performance enhancement enables us to achieve at most a 10% performance reduction in per-step wall-clock time compared against standard diagonal-scaling-based adaptive gradient methods. We validate our implementation by performing an ablation study on training ImageNet ResNet50, demonstrating Shampoo's superiority over standard training recipes with minimal hyperparameter tuning.
Submitted 12 September, 2023;
originally announced September 2023.
-
EANet: Expert Attention Network for Online Trajectory Prediction
Authors:
Pengfei Yao,
Tianlu Mao,
Min Shi,
Jingkai Sun,
Zhaoqi Wang
Abstract:
Trajectory prediction plays a crucial role in autonomous driving. Existing mainstream research and continual learning-based methods all require training on complete datasets, leading to poor prediction accuracy when sudden changes in scenarios occur and failing to promptly respond and update the model. Whether these methods can make a prediction in real time and use data instances to update the model immediately (i.e., online learning settings) remains a question. The problem of gradient explosion or vanishing caused by data instance streams also needs to be addressed. Inspired by the Hedge Propagation algorithm, we propose Expert Attention Network, a complete online learning framework for trajectory prediction. We introduce expert attention, which adjusts the weights of different depths of network layers, preventing slow model updates caused by gradient problems and enabling fast learning of knowledge about new scenarios to restore prediction accuracy. Furthermore, we propose a short-term motion trend kernel function which is sensitive to scenario change, allowing the model to respond quickly. To the best of our knowledge, this work is the first attempt to address the online learning problem in trajectory prediction. The experimental results indicate that traditional methods suffer from gradient problems and that our method can quickly reduce prediction errors and reach state-of-the-art prediction accuracy.
Submitted 11 September, 2023;
originally announced September 2023.
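For context on the Hedge-style weighting that the abstract above mentions, here is a minimal sketch of the classical Hedge (multiplicative weights) update, in which each expert's weight is multiplied by `beta ** loss` and the weights are then renormalized. This is the textbook rule, not the paper's expert attention mechanism; the function name, the value of `beta`, and the toy losses are illustrative assumptions.

```python
def hedge_update(weights, losses, beta=0.9):
    """One Hedge step: w_i <- w_i * beta**loss_i, then renormalize to sum to 1.

    Assumes 0 < beta < 1 and losses in [0, 1]; larger loss shrinks a weight more.
    """
    if len(weights) != len(losses):
        raise ValueError("weights and losses must have the same length")
    scaled = [w * beta ** loss for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Three hypothetical experts (for example, predictors attached to different
# network depths) that keep incurring the same toy losses on each instance.
weights = [1 / 3, 1 / 3, 1 / 3]
for _ in range(20):
    weights = hedge_update(weights, losses=[0.9, 0.5, 0.1])
print(weights)
```

After repeated rounds the weight mass shifts toward the expert with the smallest loss, which is the behavior an online learner wants when the data stream changes.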
-
Federated cINN Clustering for Accurate Clustered Federated Learning
Authors:
Yuhao Zhou,
Minjia Shi,
Yuxin Tian,
Yuanxi Li,
Qing Ye,
Jiancheng Lv
Abstract:
Federated Learning (FL) presents an innovative approach to privacy-preserving distributed machine learning and enables efficient crowd intelligence on a large scale. However, a significant challenge arises when coordinating FL with crowd intelligence in which diverse client groups possess disparate objectives due to data heterogeneity or distinct tasks. To address this challenge, we propose the Federated cINN Clustering Algorithm (FCCA) to robustly cluster clients into different groups, avoiding mutual interference between clients with data heterogeneity, and thereby enhancing the performance of the global model. Specifically, FCCA utilizes a global encoder to transform each client's private data into multivariate Gaussian distributions. It then employs a generative model to learn encoded latent features through maximum likelihood estimation, which eases optimization and avoids mode collapse. Finally, the central server collects converged local models to approximate similarities between clients and thus partition them into distinct clusters. Extensive experimental results demonstrate FCCA's superiority over other state-of-the-art clustered federated learning algorithms, evaluated on various models and datasets. These results suggest that our approach has substantial potential to enhance the efficiency and accuracy of real-world federated learning tasks.
Submitted 4 September, 2023;
originally announced September 2023.
-
Electronic band reconstruction across the insulator-metal transition in colossal magnetoresistive EuCd2P2
Authors:
Huali Zhang,
Feng Du,
Xiaoying Zheng,
Shuaishuai Luo,
Yi Wu,
Hao Zheng,
Shengtao Cui,
Zhe Sun,
Zhengtai Liu,
Dawei Shen,
Michael Smidman,
Yu Song,
Ming Shi,
Zhicheng Zhong,
Chao Cao,
Huiqiu Yuan,
Yang Liu
Abstract:
While colossal magnetoresistance (CMR) in Eu-based compounds is often associated with strong spin-carrier interactions, the underlying reconstruction of the electronic bands is much less understood from spectroscopic experiments. Here using angle-resolved photoemission, we directly observe an electronic band reconstruction across the insulator-metal (and magnetic) transition in the recently discovered CMR compound EuCd2P2. This transition is manifested by a large magnetic band splitting associated with the magnetic order, as well as unusual energy shifts of the valence bands: both the large ordered moment of Eu and carrier localization in the paramagnetic phase are crucial. Our results provide spectroscopic evidence for an electronic structure reconstruction underlying the enormous CMR observed in EuCd2P2, which could be important for understanding Eu-based CMR materials, as well as designing CMR materials based on large-moment rare-earth magnets.
Submitted 31 August, 2023;
originally announced August 2023.
-
Hybrid Quantum Neural Network Structures for Image Multi-classification
Authors:
Mingrui Shi,
Haozhen Situ,
Cai Zhang
Abstract:
Image classification is a fundamental computer vision problem, and neural networks offer efficient solutions. With advancing quantum technology, quantum neural networks have gained attention. However, they work only for low-dimensional data and demand dimensionality reduction and quantum encoding. Two recent image classification methods have emerged: one employs PCA dimensionality reduction and angle encoding, the other integrates QNNs into CNNs to boost performance. Despite numerous algorithms, how PCA reduction with angle encoding compares against the latter remains unclear. This study explores these algorithms' performance in multi-class image classification and proposes an optimized hybrid quantum neural network suitable for the current environment. Investigating PCA-based quantum algorithms reveals a barren plateau issue for QNNs as the number of categories increases, making them unsuitable for multi-class classification in the hybrid setup. Simultaneously, the combined CNN-QNN model partly overcomes QNN's multi-class training challenges but lags behind superior traditional CNN models in accuracy. Additionally, this work explores transfer learning in the hybrid quantum neural network model. In conclusion, quantum neural networks show promise but require further research and optimization, facing challenges ahead.
Submitted 30 August, 2023;
originally announced August 2023.
-
An investigation into the impact of deep learning model choice on sex and race bias in cardiac MR segmentation
Authors:
Tiarna Lee,
Esther Puyol-Antón,
Bram Ruijsink,
Keana Aitcheson,
Miaojing Shi,
Andrew P. King
Abstract:
In medical imaging, artificial intelligence (AI) is increasingly being used to automate routine tasks. However, these algorithms can exhibit and exacerbate biases which lead to disparate performances between protected groups. We investigate the impact of model choice on how imbalances in subject sex and race in training datasets affect AI-based cine cardiac magnetic resonance image segmentation. We evaluate three convolutional neural network-based models and one vision transformer model. We find significant sex bias in three of the four models and racial bias in all of the models. However, the severity and nature of the bias vary between the models, highlighting the importance of model choice when attempting to train fair AI-based segmentation models for medical imaging tasks.
Submitted 25 August, 2023;
originally announced August 2023.
-
Harvard Glaucoma Detection and Progression: A Multimodal Multitask Dataset and Generalization-Reinforced Semi-Supervised Learning
Authors:
Yan Luo,
Min Shi,
Yu Tian,
Tobias Elze,
Mengyu Wang
Abstract:
Glaucoma is the number one cause of irreversible blindness globally. A major challenge for accurate glaucoma detection and progression forecasting is the bottleneck of limited labeled patients with the state-of-the-art (SOTA) 3D retinal imaging data of optical coherence tomography (OCT). To address the data scarcity issue, this paper proposes two solutions. First, we develop a novel generalization-reinforced semi-supervised learning (SSL) model called pseudo supervisor to optimally utilize unlabeled data. Compared with SOTA models, the proposed pseudo supervisor optimizes the policy of predicting pseudo labels with unlabeled samples to improve empirical generalization. Our pseudo supervisor model is evaluated with two clinical tasks consisting of glaucoma detection and progression forecasting. The progression forecasting task is evaluated both unimodally and multimodally. Our pseudo supervisor model demonstrates superior performance to SOTA SSL comparison models. Moreover, our model also achieves the best results on the publicly available LAG fundus dataset. Second, we introduce the Harvard Glaucoma Detection and Progression (Harvard-GDP) Dataset, a multimodal multitask dataset that includes data from 1,000 patients with OCT imaging data, as well as labels for glaucoma detection and progression. This is the largest glaucoma detection dataset with 3D OCT imaging data and the first glaucoma progression forecasting dataset that is publicly available. Detailed sex and racial analyses are provided, which can be used by interested researchers for fairness learning studies. Our released dataset is benchmarked with several SOTA supervised CNN and transformer deep learning models. The dataset and code are made publicly available via \url{https://ophai.hms.harvard.edu/datasets/harvard-gdp1000}.
Submitted 25 August, 2023;
originally announced August 2023.
-
Green-light p-n Junction Particle Inhomogeneous Phase Enhancement of MgB2 Smart Meta-Superconductor
Authors:
Yao Qi,
Duo Chen,
Yongbo Li,
Chao Sun,
Qingyu Hai,
Miao Shi,
Honggang Chen,
Xiaopeng Zhao
Abstract:
Improving the critical temperature (TC), critical magnetic field (HC), and critical current (JC) of superconducting materials has always been one of the most significant challenges in the field of superconductivity, but progress has been slow over the years. Based on the concept of injecting energy to enhance electron pairing states, in this study, we have employed a solid-state sintering method to fabricate a series of smart meta-superconductors (SMSCs) consisting of p-n junction nanostructures with a wavelength of 550 nm, doped within an MgB2 matrix. Experimental results demonstrate that compared to pure MgB2 samples, the critical transition temperature (TC) has increased by 1.2 K, the critical current (JC) has increased by 52.8%, and the Meissner effect (HC) shows significant improvement in its diamagnetic properties. This phenomenon of enhanced superconducting performance can be explained by the coupling between superconducting electrons and evanescent waves.
Submitted 22 August, 2023;
originally announced August 2023.
-
SegMatch: A semi-supervised learning method for surgical instrument segmentation
Authors:
Meng Wei,
Charlie Budd,
Luis C. Garcia-Peraza-Herrera,
Reuben Dorent,
Miaojing Shi,
Tom Vercauteren
Abstract:
Surgical instrument segmentation is recognised as a key enabler to provide advanced surgical assistance and improve computer-assisted interventions. In this work, we propose SegMatch, a semi-supervised learning method to reduce the need for expensive annotation for laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi-supervised classification pipeline combining consistency regularization and pseudo labelling, and adapts it for the purpose of segmentation. In our proposed SegMatch, the unlabelled images are weakly augmented and fed into the segmentation model to generate a pseudo-label; the unsupervised loss is then enforced between this pseudo-label and the output of the model for the adversarially augmented image, on the pixels with a high confidence score. Our adaptation for segmentation tasks includes carefully considering the equivariance and invariance properties of the augmentation functions we rely on. To increase the relevance of our augmentations, we depart from using only handcrafted augmentations and introduce a trainable adversarial augmentation strategy. Our algorithm was evaluated on the MICCAI Instrument Segmentation Challenge datasets Robust-MIS 2019 and EndoVis 2017. Our results demonstrate that adding unlabelled data for training purposes allows us to surpass the performance of fully supervised approaches which are limited by the availability of training data in these challenges. SegMatch also outperforms a range of state-of-the-art semi-supervised learning semantic segmentation models in different labelled to unlabelled data ratios.
Submitted 9 August, 2023;
originally announced August 2023.
-
Adversarial Erasing with Pruned Elements: Towards Better Graph Lottery Ticket
Authors:
Yuwen Wang,
Shunyu Liu,
Kaixuan Chen,
Tongtian Zhu,
Ji Qiao,
Mengjie Shi,
Yuanyu Wan,
Mingli Song
Abstract:
Graph Lottery Ticket (GLT), a combination of core subgraph and sparse subnetwork, has been proposed to mitigate the computational cost of deep Graph Neural Networks (GNNs) on large input graphs while preserving original performance. However, the winning GLTs in existing studies are obtained by applying iterative magnitude-based pruning (IMP) without re-evaluating and re-considering the pruned information, which disregards the dynamic changes in the significance of edges/weights during graph/model structure pruning, and thus limits the appeal of the winning tickets. In this paper, we formulate a conjecture: the pruned graph connections and model parameters contain overlooked valuable information that can be re-grouped into the GLT to enhance the final performance. Specifically, we propose an adversarial complementary erasing (ACE) framework to explore the valuable information from the pruned components, thereby developing a more powerful GLT, referred to as the ACE-GLT. The main idea is to mine valuable information from pruned edges/weights after each round of IMP, and employ the ACE technique to refine the GLT processing. Finally, experimental results demonstrate that our ACE-GLT outperforms existing methods for searching GLT in diverse tasks. Our code will be made publicly available.
Submitted 10 August, 2023; v1 submitted 5 August, 2023;
originally announced August 2023.
-
The fate of quasiparticles at high-temperature
Authors:
A. Hunter,
S. Beck,
E. Cappelli,
F. Margot,
M. Straub,
Y. Alexanian,
G. Gatti,
M. D. Watson,
T. K. Kim,
C. Cacho,
N. C. Plumb,
M. Shi,
M. Radović,
D. A. Sokolov,
A. P. Mackenzie,
M. Zingl,
J. Mravlje,
A. Georges,
F. Baumberger,
A. Tamai
Abstract:
We study the temperature evolution of quasiparticles in the correlated metal Sr$_2$RuO$_4$. Our angle resolved photoemission data show that quasiparticles persist up to temperatures above 200~K, far beyond the Fermi liquid regime. Extracting the quasiparticle self-energy we demonstrate that the quasiparticle residue $Z$ increases with increasing temperature. Quasiparticles eventually disappear on approaching the bad metal state of Sr$_2$RuO$_4$ not by losing weight but via excessive broadening from super-Planckian scattering. We further show that the Fermi surface of Sr$_2$RuO$_4$ - defined as the loci where the spectral function peaks - deflates with increasing temperature. These findings are in semi-quantitative agreement with dynamical mean field theory calculations.
Submitted 4 August, 2023;
originally announced August 2023.
-
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Authors:
Weiyun Wang,
Min Shi,
Qingyun Li,
Wenhai Wang,
Zhenhang Huang,
Linjie Xing,
Zhe Chen,
Hao Li,
Xizhou Zhu,
Zhiguo Cao,
Yushi Chen,
Tong Lu,
Jifeng Dai,
Yu Qiao
Abstract:
We present the All-Seeing (AS) project: a large-scale data and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset (AS-1B) with over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions. It covers a wide range of 3.5 million common and rare concepts in the real world, and has 132.2 billion tokens that describe the concepts and their attributes. Leveraging this new dataset, we develop the All-Seeing model (ASM), a unified framework for panoptic visual recognition and understanding. The model is trained with open-ended language prompts and locations, which allows it to generalize to various vision and language tasks with remarkable zero-shot performance, including region-text retrieval, region recognition, captioning, and question-answering. We hope that this project can serve as a foundation for vision-language artificial general intelligence research. Models and the dataset shall be released at https://github.com/OpenGVLab/All-Seeing, and demo can be seen at https://huggingface.co/spaces/OpenGVLab/all-seeing.
Submitted 3 August, 2023;
originally announced August 2023.
-
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review
Authors:
Maxime Fontana,
Michael Spratling,
Miaojing Shi
Abstract:
Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but they can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.
Submitted 25 July, 2023;
originally announced July 2023.
-
Neural Video Depth Stabilizer
Authors:
Yiran Wang,
Min Shi,
Jiaqi Li,
Zihao Huang,
Zhiguo Cao,
Jianming Zhang,
Ke Xian,
Guosheng Lin
Abstract:
Video depth estimation aims to infer temporally consistent depth. Some methods achieve temporal consistency by finetuning a single-image depth model during test time using geometry and re-projection constraints, which is inefficient and not robust. An alternative approach is to learn how to enforce temporal consistency from data, but this requires well-designed models and sufficient video depth data. To address these challenges, we propose a plug-and-play framework called Neural Video Depth Stabilizer (NVDS) that stabilizes inconsistent depth estimations and can be applied to different single-image depth models without extra effort. We also introduce a large-scale dataset, Video Depth in the Wild (VDW), which consists of 14,203 videos with over two million frames, making it the largest natural-scene video depth dataset to our knowledge. We evaluate our method on the VDW dataset as well as two public benchmarks and demonstrate significant improvements in consistency, accuracy, and efficiency compared to previous approaches. Our work serves as a solid baseline and provides a data foundation for learning-based video depth models. We will release our dataset and code for future research.
Submitted 10 August, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
TreeFormer: a Semi-Supervised Transformer-based Framework for Tree Counting from a Single High Resolution Image
Authors:
Hamed Amini Amirkolaee,
Miaojing Shi,
Mark Mulligan
Abstract:
Automatic tree density estimation and counting using single aerial and satellite images is a challenging task in photogrammetry and remote sensing, yet has an important role in forest management. In this paper, we propose the first semi-supervised transformer-based framework for tree counting which reduces the expensive tree annotations for remote sensing images. Our method, termed TreeFormer, first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during the encoding stage. Contextual attention-based feature fusion and tree density regressor modules are further designed to utilize the robust features from the encoder to estimate tree density maps in the decoder. Moreover, we propose a pyramid learning strategy that includes local tree density consistency and local tree count ranking losses to incorporate unlabeled images into the training process. Finally, the tree counter token is introduced to regulate the network by computing the global tree counts for both labeled and unlabeled images. Our model was evaluated on two benchmark tree counting datasets, Jiangsu and Yosemite, as well as a new dataset, KCL-London, created by ourselves. Our TreeFormer outperforms the state-of-the-art semi-supervised methods under the same setting and exceeds the fully-supervised methods using the same number of labeled images. The codes and datasets are available at https://github.com/HAAClassic/TreeFormer.
Submitted 12 July, 2023;
originally announced July 2023.
-
Quantum-enhanced Electrometer based on Microwave-dressed Rydberg Atoms
Authors:
Shuhe Wu,
Dong Zhang,
Zhengchun Li,
Minwei Shi,
Peiyu Yang,
Jinxian Guo,
Wei Du,
Guzhi Bao,
Weiping Zhang
Abstract:
Rydberg atoms have shown remarkable performance in sensing microwave fields. The sensitivity of such an electrometer based on optical readout of an atomic ensemble has been demonstrated to approach the photon-shot-noise limit. However, the sensitivity cannot be improved indefinitely by increasing the power of the probe light, due to the increased collision rates and power broadening. Compared with classical light, the use of quantum light may lead to a better sensitivity with a lower number of photons. In this paper, we exploit entanglement in a microwave-dressed Rydberg electrometer to suppress the fluctuation of noise. The results show a sensitivity enhancement beating the shot noise limit in both cold and hot atom schemes. Through optimizing the transmission of optical readout, our quantum advantage can be maintained across different absorptive indices of the atomic vapor, which makes it possible to apply quantum light sources in the absorptive electrometer.
Submitted 11 July, 2023;
originally announced July 2023.
-
Electronic Landscape of Kagome Superconductors $\textit{A}$V$_{3}$Sb$_{5}$ ($\textit{A}$ = K, Rb, Cs) from Angle-Resolved Photoemission Spectroscopy
Authors:
Yong Hu,
Xianxin Wu,
Andreas P. Schnyder,
Ming Shi
Abstract:
The recently discovered layered kagome superconductors $\textit{A}$V$_{3}$Sb$_{5}$ ($\textit{A}$ = K, Rb, Cs) have garnered significant attention, as they exhibit an intriguing combination of superconductivity, charge density wave (CDW) order, and nontrivial band topology. As such, these kagome systems serve as an exceptional quantum platform for investigating the intricate interplay between electron correlation effects, geometric frustration, and topological electronic structure. A comprehensive understanding of the underlying electronic structure is crucial for unveiling the nature and origin of the CDW order, as well as determining the electron pairing symmetry in the kagome superconductors. In this review, we present a concise survey of the electronic properties of $\textit{A}$V$_{3}$Sb$_{5}$, with a particular focus on the insights derived from angle-resolved photoemission spectroscopy (ARPES). Through the lens of ARPES, we shed light on the electronic characteristics of the kagome superconductors $\textit{A}$V$_{3}$Sb$_{5}$, which will pave the way for exciting new research frontiers in kagome-related physics.
Submitted 13 November, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Automated Grading and Feedback Tools for Programming Education: A Systematic Review
Authors:
Marcus Messer,
Neil C. C. Brown,
Michael Kölling,
Miaojing Shi
Abstract:
We conducted a systematic literature review on automated grading and feedback tools for programming education.
We analysed 121 research papers from 2017 to 2021 inclusive and categorised them based on skills assessed, approach, language paradigm, degree of automation and evaluation techniques.
Most papers assess the correctness of assignments in object-oriented languages.
Typically, these tools use a dynamic technique, primarily unit testing, to provide grades and feedback to the students or static analysis techniques to compare a submission with a reference solution or with a set of correct student submissions.
However, these techniques' feedback is often limited to whether the unit tests have passed or failed, the expected and actual output, or how they differ from the reference solution.
Furthermore, few tools assess the maintainability, readability or documentation of the source code, with most using static analysis techniques, such as code quality metrics, in conjunction with grading correctness.
Additionally, we found that most tools offered fully automated assessment to allow for near-instantaneous feedback and multiple resubmissions, which can increase student satisfaction and provide them with more opportunities to succeed.
In terms of techniques used to evaluate the tools' performance, most papers primarily use student surveys or compare the automatic assessment tools to grades or feedback provided by human graders.
However, because the evaluation dataset is frequently unavailable, it is more difficult to reproduce results and compare tools to a collection of common assignments.
Submitted 5 December, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Learning-based sound speed estimation and aberration correction in linear-array photoacoustic imaging
Authors:
Mengjie Shi,
Tom Vercauteren,
Wenfeng Xia
Abstract:
Photoacoustic (PA) image reconstruction involves acoustic inversion that necessitates the specification of the speed of sound (SoS) within the medium of propagation. Due to the lack of information on the spatial distribution of the SoS within heterogeneous soft tissue, a homogeneous SoS distribution (such as 1540 m/s) is typically assumed in PA image reconstruction, similar to that of ultrasound (US) imaging. Failure to compensate for the SoS variations leads to aberration artefacts, deteriorating the image quality. Various methods have been proposed to address this issue, but they usually involve complex hardware and/or time-consuming algorithms, hindering clinical translation. In this work, we introduce a deep learning framework for SoS estimation and subsequent aberration correction in a dual-modal PA/US imaging system exploiting a clinical US probe. As the acquired PA and US images were inherently co-registered, the estimated SoS distribution from US channel data using a deep neural network was incorporated for accurate PA image reconstruction. The framework comprised an initial pre-training stage based on digital phantoms, which was further enhanced through transfer learning using physical phantom data and associated SoS maps obtained from measurements. This framework achieved a root mean square error of 10.2 m/s and 15.2 m/s for SoS estimation on digital and physical phantoms, respectively, and structural similarity index measures of up to 0.86 for PA reconstructions, compared with 0.69 for the conventional approach. A maximum of 1.2 times improvement in signal-to-noise ratio of PA images was further demonstrated with a human volunteer study. Our results show that the proposed framework could be valuable in various clinical and preclinical applications to enhance PA image reconstruction.
Submitted 5 March, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Harvard Glaucoma Fairness: A Retinal Nerve Disease Dataset for Fairness Learning and Fair Identity Normalization
Authors:
Yan Luo,
Yu Tian,
Min Shi,
Louis R. Pasquale,
Lucy Q. Shen,
Nazlee Zebardast,
Tobias Elze,
Mengyu Wang
Abstract:
Fairness (used interchangeably with equity) in machine learning is important for societal well-being, but limited public datasets hinder its progress. Currently, no dedicated public medical datasets with imaging data for fairness learning are available, though minority groups suffer from more health issues. To address this gap, we introduce Harvard Glaucoma Fairness (Harvard-GF), a retinal nerve disease dataset with both 2D and 3D imaging data and balanced racial groups for glaucoma detection. Glaucoma is the leading cause of irreversible blindness globally, with Blacks having twice the glaucoma prevalence of other races. We also propose a fair identity normalization (FIN) approach to equalize the feature importance between different identity groups. Our FIN approach is compared with various state-of-the-art fairness learning methods and shows superior performance in the racial, gender, and ethnicity fairness tasks with 2D and 3D imaging data, which demonstrates the utility of our dataset Harvard-GF for fairness learning. To facilitate fairness comparisons between different models, we propose an equity-scaled performance measure, which can be flexibly used to compare all kinds of performance metrics in the context of fairness. The dataset and code are publicly accessible via https://ophai.hms.harvard.edu/datasets/harvard-glaucoma-fairness-3300-samples/.
Submitted 10 March, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Text Promptable Surgical Instrument Segmentation with Vision-Language Models
Authors:
Zijian Zhou,
Oluwatosin Alabi,
Meng Wei,
Tom Vercauteren,
Miaojing Shi
Abstract:
In this paper, we propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments in minimally invasive surgeries. We redefine the task as text promptable, thereby enabling a more nuanced comprehension of surgical instruments and adaptability to new instrument types. Inspired by recent advancements in vision-language models, we leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder consisting of attention- and convolution-based prompting schemes for surgical instrument segmentation prediction. Our model leverages multiple text prompts for each surgical instrument through a new mixture of prompts mechanism, resulting in enhanced segmentation performance. Additionally, we introduce a hard instrument area reinforcement module to improve image feature comprehension and segmentation precision. Extensive experiments on several surgical instrument segmentation datasets demonstrate our model's superior performance and promising generalization capability. To our knowledge, this is the first implementation of a promptable approach to surgical instrument segmentation, offering significant potential for practical application in the field of robotic-assisted surgery. Code is available at https://github.com/franciszzj/TP-SIS.
Submitted 8 November, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Theoretical Hardness and Tractability of POMDPs in RL with Partial Online State Information
Authors:
Ming Shi,
Yingbin Liang,
Ness Shroff
Abstract:
Partially observable Markov decision processes (POMDPs) have been widely applied in various real-world applications. However, existing theoretical results have shown that learning in POMDPs is intractable in the worst case, where the main challenge lies in the lack of latent state information. A key fundamental question here is: how much online state information (OSI) is sufficient to achieve tractability? In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full OSI, we need an exponentially scaling sample complexity to obtain an $ε$-optimal policy solution for POMDPs. Nonetheless, inspired by the insights in our lower-bound design, we identify important tractable subclasses of POMDPs, even with only partial OSI. In particular, for two subclasses of POMDPs with partial OSI, we provide new algorithms that are proved to be near-optimal by establishing new regret upper and lower bounds. Both our algorithm design and regret analysis involve non-trivial developments for joint OSI query and action control.
Submitted 11 March, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Authors:
Linfeng Yuan,
Miaojing Shi,
Zijie Yue,
Qijun Chen
Abstract:
Referring video object segmentation (RVOS) aims to segment the target instance referred to by a given text expression in a video clip. The text expression normally contains a sophisticated description of the instance's appearance, action, and relation with others. It is therefore rather difficult for an RVOS model to capture all these attributes correspondingly in the video; in fact, the model often favours the action- and relation-related visual attributes of the instance. This can end up with partial or even incorrect mask prediction of the target instance. We tackle this problem by taking a subject-centric short text expression from the original long text expression. The short one retains only the appearance-related information of the target instance so that we can use it to focus the model's attention on the instance's appearance. We let the model make joint predictions using both long and short text expressions, and insert a long-short cross-attention module to let the joint features interact and a long-short predictions intersection loss to regulate the joint predictions. Besides the improvement on the linguistic part, we also introduce a forward-backward visual consistency loss, which utilizes optical flows to warp visual features between the annotated frames and their temporal neighbors for consistency. We build our method on top of two state-of-the-art pipelines. Extensive experiments on A2D-Sentences, Refer-YouTube-VOS, JHMDB-Sentences and Refer-DAVIS17 show impressive improvements of our method. Code is available at https://github.com/LinfengYuan1997/Losh.
Submitted 1 April, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Zero-Shot Automatic Pronunciation Assessment
Authors:
Hongfu Liu,
Mingqian Shi,
Ye Wang
Abstract:
Automatic Pronunciation Assessment (APA) is vital for computer-assisted language learning. Prior methods rely on annotated speech-text data to train Automatic Speech Recognition (ASR) models or speech-score data to train regression models. In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model, HuBERT. Our method involves encoding speech input and corrupting them via a masking module. We then employ the Transformer encoder and apply k-means clustering to obtain token sequences. Finally, a scoring module is designed to measure the number of wrongly recovered tokens. Experimental results on speechocean762 demonstrate that the proposed method achieves comparable performance to supervised regression baselines and outperforms non-regression baselines in terms of Pearson Correlation Coefficient (PCC). Additionally, we analyze how masking strategies affect the performance of APA.
Submitted 31 May, 2023;
originally announced May 2023.
-
Small-amplitude Compressible Magnetohydrodynamic Turbulence Modulated by Collisionless Damping in Earth's Magnetosheath: Observation Matches Theory
Authors:
Siqi Zhao,
Huirong Yan,
Terry Z. Liu,
Ka Ho Yuen,
Mijie Shi
Abstract:
Plasma turbulence is a ubiquitous dynamical process that transfers energy across many spatial and temporal scales and affects energetic particle transport. Recent advances in the understanding of compressible magnetohydrodynamic (MHD) turbulence demonstrate the important role of damping in shaping energy distributions on small scales, yet its observational evidence is still lacking. This study provides the first observational evidence of substantial collisionless damping (CD) modulation on small-amplitude compressible MHD turbulence cascade in Earth's magnetosheath using four Cluster spacecraft. Based on an improved compressible MHD decomposition algorithm, turbulence is decomposed into three eigenmodes: incompressible Alfvén modes, and compressible slow and fast (magnetosonic) modes. Our observations demonstrate that CD enhances the anisotropy of compressible MHD modes because CD has a strong dependence on wave propagation angle. The wavenumber distributions of slow modes are mainly stretched perpendicular to the background magnetic field ($\mathbf{B_0}$) and weakly modulated by CD. In contrast, fast modes are subjected to a more significant CD modulation. Fast modes exhibit a weak, scale-independent anisotropy above the CD truncation scale. Below the CD truncation scale, the anisotropy of fast modes enhances as wavenumbers increase. As a result, fast mode fractions in the total energy of compressible modes decrease with the increase of perpendicular wavenumber (to $\mathbf{B_0}$) or wave propagation angle. Our findings reveal how the turbulence cascade is shaped by CD and its consequences to anisotropies in the space environment.
Submitted 8 February, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
CASA-ASR: Context-Aware Speaker-Attributed ASR
Authors:
Mohan Shi,
Zhihao Du,
Qian Chen,
Fan Yu,
Yangze Li,
Shiliang Zhang,
Jie Zhang,
Li-Rong Dai
Abstract:
Recently, speaker-attributed automatic speech recognition (SA-ASR), which aims at answering the question ``who spoke what'', has attracted wide attention. Different from modular systems, end-to-end (E2E) SA-ASR minimizes the speaker-dependent recognition errors directly and shows promising applicability. In this paper, we propose a context-aware SA-ASR (CASA-ASR) model by enhancing the contextual modeling ability of E2E SA-ASR. Specifically, in CASA-ASR, a contextual text encoder is involved to aggregate the semantic information of the whole utterance, and a context-dependent scorer is employed to model the speaker discriminability by contrasting with speakers in the context. In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability, resulting in better recognition performance. Experimental results on the AliMeeting corpus show that the proposed CASA-ASR model outperforms the original E2E SA-ASR system with a relative improvement of 11.76% in terms of speaker-dependent character error rate.
Submitted 21 May, 2023;
originally announced May 2023.
-
Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction
Authors:
Mohan Shi,
Yuchun Shu,
Lingyun Zuo,
Qian Chen,
Shiliang Zhang,
Jie Zhang,
Li-Rong Dai
Abstract:
For speech interaction, voice activity detection (VAD) is often used as a front-end. However, traditional VAD algorithms usually need to wait for a continuous tail silence to reach a preset maximum duration before segmentation, resulting in a large latency that affects user experience. In this paper, we propose a novel semantic VAD for low-latency segmentation. Different from existing methods, a frame-level punctuation prediction task is added to the semantic VAD, and the artificial endpoint is included in the classification category in addition to the often-used speech presence and absence. To enhance the semantic information of the model, we also incorporate an automatic speech recognition (ASR) related semantic loss. Evaluations on an internal dataset show that the proposed method can reduce the average latency by 53.3% without significant deterioration of character error rate in the back-end ASR compared to the traditional VAD approach.
Submitted 21 May, 2023;
originally announced May 2023.
-
A new method for solving the equation $x^d+(x+1)^d=b$ in $\mathbb{F}_{q^4}$ where $d=q^3+q^2+q-1$
Authors:
Liqin Qian,
Minjia Shi,
Wei Lu
Abstract:
In this paper, we give a new method to answer a recent conjecture proposed by Budaghyan, Calderini, Carlet, Davidova and Kaleyski about the equation $x^d+(x+1)^d=b$ in $\mathbb{F}_{q^4}$, where $n$ is a positive integer, $q=2^n$ and $d=q^3+q^2+q-1$. In particular, we directly determine the differential spectrum of this power function $x^d$ using methods different from those in the literature. Compared with the methods in the literature, our method is more direct and simpler.
Submitted 17 May, 2023;
originally announced May 2023.
-
Characterization of Plotkin-optimal two-weight codes over finite chain rings and related applications
Authors:
Shitao Li,
Minjia Shi
Abstract:
Few-weight codes over finite chain rings are associated with combinatorial objects such as strongly regular graphs (SRGs), strongly walk-regular graphs (SWRGs) and finite geometries, and are also widely used in data storage systems and secret sharing schemes. The first objective of this paper is to characterize all possible parameters of Plotkin-optimal two-homogeneous weight regular projective codes over finite chain rings, as well as their weight distributions. We show the existence of codes with these parameters by constructing an infinite family of two-homogeneous weight codes. The parameters of their Gray images have the same weight distribution as that of the two-weight codes of type SU1 in the sense of Calderbank and Kantor (Bull Lond Math Soc 18: 97-122, 1986). Further, we also construct three-homogeneous weight regular projective codes over finite chain rings combined with some known results. Finally, we study applications of our constructed codes in secret sharing schemes and graph theory. In particular, infinite families of SRGs and SWRGs with non-trivial parameters are obtained.
Submitted 15 May, 2023;
originally announced May 2023.
-
Filtering higher-order datasets
Authors:
Nicholas W. Landry,
Ilya Amburg,
Mirah Shi,
Sinan G. Aksoy
Abstract:
Many complex systems often contain interactions between more than two nodes, known as higher-order interactions, which can change the structure of these systems in significant ways. Researchers often assume that all interactions paint a consistent picture of a higher-order dataset's structure. In contrast, the connection patterns of individuals or entities in empirical systems are often stratified by interaction size. Ignoring this fact can aggregate connection patterns that exist only at certain scales of interaction. To isolate these scale-dependent patterns, we present an approach for analyzing higher-order datasets by filtering interactions by their size. We apply this framework to several empirical datasets from three domains to demonstrate that data practitioners can gain valuable information from this approach.
Submitted 1 November, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
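The size-based filtering described in this abstract can be illustrated with a minimal Python sketch. It assumes a higher-order dataset stored as a plain list of node tuples; the helper names `filter_by_size` and `degree_by_node` and the toy data are hypothetical and are not taken from the paper or from any particular library.

```python
from collections import Counter


def filter_by_size(hyperedges, min_size=2, max_size=None):
    """Keep only interactions whose number of distinct nodes is in [min_size, max_size]."""
    kept = []
    for edge in hyperedges:
        size = len(set(edge))  # count distinct nodes only
        if size >= min_size and (max_size is None or size <= max_size):
            kept.append(frozenset(edge))
    return kept


def degree_by_node(hyperedges):
    """Number of interactions each node takes part in."""
    counts = Counter()
    for edge in hyperedges:
        counts.update(set(edge))
    return dict(counts)


# Hypothetical toy dataset: each tuple is one interaction.
interactions = [
    ("a", "b"),
    ("a", "b", "c"),
    ("b", "c"),
    ("a", "c", "d", "e"),
    ("d", "e"),
]

pairwise = filter_by_size(interactions, min_size=2, max_size=2)
larger = filter_by_size(interactions, min_size=3)
```

Comparing `degree_by_node(pairwise)` with `degree_by_node(larger)` shows how a node's apparent importance can differ between interaction scales, which is the kind of scale-dependent pattern the abstract refers to.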
-
Quasi-cyclic perfect codes in Doob graphs and special partitions of Galois rings
Authors:
Minjia Shi,
Xiaoxiao Li,
Denis S. Krotov,
Ferruh Özbudak
Abstract:
The Galois ring GR$(4^Δ)$ is the residue ring $Z_4[x]/(h(x))$, where $h(x)$ is a basic primitive polynomial of degree $Δ$ over $Z_4$. For any odd $Δ$ larger than $1$, we construct a partition of GR$(4^Δ) \backslash \{0\}$ into $6$-subsets of type $\{a,b,-a-b,-a,-b,a+b\}$ and $3$-subsets of type $\{c,-c,2c\}$ such that the partition is invariant under the multiplication by a nonzero element of the Teichmuller set in GR$(4^Δ)$ and, if $Δ$ is not a multiple of $3$, under the action of the automorphism group of GR$(4^Δ)$.
As a corollary, this implies the existence of quasi-cyclic additive $1$-perfect codes of index $(2^Δ-1)$ in $D((2^Δ-1)(2^Δ-2)/{6}, 2^Δ-1 )$ where $D(m,n)$ is the Doob metric scheme on $Z^{2m+n}$.
Submitted 4 May, 2023;
originally announced May 2023.
-
Fashionpedia-Ads: Do Your Favorite Advertisements Reveal Your Fashion Taste?
Authors:
Mengyun Shi,
Claire Cardie,
Serge Belongie
Abstract:
Consumers are exposed to advertisements across many different domains on the internet, such as fashion, beauty, car, food, and others. On the other hand, fashion represents the second-highest e-commerce shopping category. Does consumers' recorded digital behavior on various fashion ad images reveal their fashion taste? Do ads from other domains reveal their fashion taste as well? In this paper, we study the correlation between advertisements and fashion taste. Towards this goal, we introduce a new dataset, Fashionpedia-Ads, which asks subjects to provide their preferences on both ad (fashion, beauty, car, and dessert) and fashion product (social network and e-commerce style) images. Furthermore, we exhaustively collect and annotate the emotional, visual and textual information on the ad images from multiple perspectives (abstractive level, physical level, captions, and brands). We open-source Fashionpedia-Ads to enable future studies and encourage more approaches to interpretability research between advertisements and fashion taste.
Submitted 3 May, 2023;
originally announced May 2023.
-
Fashionpedia-Taste: A Dataset towards Explaining Human Fashion Taste
Authors:
Mengyun Shi,
Serge Belongie,
Claire Cardie
Abstract:
Existing fashion datasets do not consider the multiple factors that cause a consumer to like or dislike a fashion image. Even if two consumers like the same fashion image, they could like it for totally different reasons. In this paper, we study the reasons why a consumer likes a certain fashion image. Towards this goal, we introduce an interpretability dataset, Fashionpedia-taste, consisting of rich annotations to explain why a subject likes or dislikes a fashion image from the following 3 perspectives: 1) localized attributes; 2) human attention; 3) caption. Furthermore, subjects are asked to provide their personal attributes and preferences on fashion, such as personality and preferred fashion brands. Our dataset makes it possible for researchers to build computational models to fully understand and interpret human fashion taste from different humanistic perspectives and modalities.
Submitted 3 May, 2023;
originally announced May 2023.
-
Competing charge-density wave instabilities in the kagome metal ScV$_6$Sn$_6$
Authors:
Saizheng Cao,
Chenchao Xu,
Hiroshi Fukui,
Taishun Manjo,
Ming Shi,
Yang Liu,
Chao Cao,
Yu Song
Abstract:
Owing to its unique geometry, the kagome lattice hosts various many-body quantum states including frustrated magnetism, superconductivity, and charge-density waves (CDWs), with intense efforts focused on kagome metals exhibiting $2\times2$ CDWs associated with the nesting of van Hove saddle points. Recently, a $\sqrt{3}\times\sqrt{3}$ CDW was discovered in the kagome metal ScV$_6$Sn$_6$ below $T_{\rm CDW}\approx91$~K, whose underlying mechanism and formation process remain unclear. Using inelastic X-ray scattering, we discover a short-range $\sqrt{3}\times\sqrt{3}\times2$ CDW that is dominant in ScV$_6$Sn$_6$ well above $T_{\rm CDW}$, distinct from the $\sqrt{3}\times\sqrt{3}\times3$ CDW below $T_{\rm CDW}$. The short-range CDW grows upon cooling, and is accompanied by the softening of phonons, indicative of its dynamic nature. As the $\sqrt{3}\times\sqrt{3}\times3$ CDW appears, the short-range CDW becomes suppressed, revealing a competition between these CDW instabilities. Our first-principles calculations indicate that the $\sqrt{3}\times\sqrt{3}\times2$ CDW is energetically favored, consistent with experimental observations at high temperatures. However, the $\sqrt{3}\times\sqrt{3}\times3$ CDW is selected as the ground state likely due to a large wavevector-dependent electron-phonon coupling, which also accounts for the enhanced electron scattering above $T_{\rm CDW}$. The competing CDW instabilities in ScV$_6$Sn$_6$ lead to an unusual CDW formation process, with the most pronounced phonon softening and the static CDW occurring at different wavevectors.
Submitted 17 April, 2023;
originally announced April 2023.
-
Hidden magnetism uncovered in charge ordered bilayer kagome material ScV_6Sn_6
Authors:
Z. Guguchia,
D. J. Gawryluk,
Soohyeon Shin,
Z. Hao,
C. Mielke III,
D. Das,
I. Plokhikh,
L. Liborio,
K. Shenton,
Y. Hu,
V. Sazgari,
M. Medarde,
H. Deng,
Y. Cai,
C. Chen,
Y. Jiang,
A. Amato,
M. Shi,
M. Z. Hasan,
J. -X. Yin,
R. Khasanov,
E. Pomjakushina,
H. Luetkens
Abstract:
Charge ordered kagome lattices have been demonstrated to be intriguing platforms for studying the intertwining of topology, correlation, and magnetism. The recently discovered charge ordered kagome material ScV_6Sn_6 does not feature a magnetic groundstate or excitations, thus it is often regarded as a conventional paramagnet. Here, using advanced muon-spin rotation spectroscopy, we uncover an unexpected hidden magnetism of the charge order. We observe a striking enhancement of the internal field width sensed by the muon ensemble, which takes place within the charge ordered state. More remarkably, the muon spin relaxation rate below the charge ordering temperature is substantially enhanced by applying an external magnetic field. Taken together with the hidden magnetism found in AV_3Sb_5 (A = K, Rb, Cs) and FeGe kagome systems, our results suggest ubiquitous time-reversal symmetry-breaking in charge ordered kagome lattices.
Submitted 13 April, 2023;
originally announced April 2023.
-
Phonon promoted charge density wave in topological kagome metal ScV$_{6}$Sn$_{6}$
Authors:
Yong Hu,
Junzhang Ma,
Yinxiang Li,
Dariusz Jakub Gawryluk,
Tianchen Hu,
Jérémie Teyssier,
Volodymyr Multian,
Zhouyi Yin,
Yuxiao Jiang,
Shuxiang Xu,
Soohyeon Shin,
Igor Plokhikh,
Xinloong Han,
Nicholas Clark Plumb,
Yang Liu,
Jiaxin Yin,
Zurab Guguchia,
Yue Zhao,
Andreas P. Schnyder,
Xianxin Wu,
Ekaterina Pomjakushina,
M. Zahid Hasan,
Nanlin Wang,
Ming Shi
Abstract:
Charge density wave (CDW) orders in vanadium-based kagome metals have recently received tremendous attention due to their unique properties and intricate interplay with exotic correlated phenomena, topological and symmetry-breaking states. However, the origin of the CDW order remains a topic of debate. The discovery of ScV$_{6}$Sn$_{6}$, a vanadium-based bilayer kagome metal exhibiting an in-plane $\sqrt{3}$ x $\sqrt{3} $ $\textit{R}$30$°$ CDW order with time-reversal symmetry breaking, provides a novel platform to explore the underlying mechanism behind the unconventional CDW. Here, we combine high-resolution angle-resolved photoemission spectroscopy, Raman scattering measurements and density functional theory to investigate the electronic structures and phonon modes of ScV$_{6}$Sn$_{6}$ and their evolution with temperature. We identify topologically nontrivial Dirac surface states and multiple van Hove singularities (VHSs) in the vicinity of the Fermi level, with one VHS near the K point exhibiting nesting wave vectors in proximity to the $\sqrt{3}$ x $\sqrt{3}$ $\textit{R}$30$°$ CDW wave vector. Additionally, Raman measurements indicate a strong intrinsic electron-phonon coupling in ScV$_{6}$Sn$_{6}$, as evidenced by the presence of a two-phonon mode and a large frequency amplitude mode. Our findings highlight the fundamental role of lattice degrees of freedom in promoting the CDW in ScV$_{6}$Sn$_{6}$ and provide important insights into the fascinating correlation phenomena observed in kagome metals.
Submitted 13 April, 2023;
originally announced April 2023.
-
Self-training with dual uncertainty for semi-supervised medical image segmentation
Authors:
Zhanhong Qiu,
Haitao Gan,
Ming Shi,
Zhongwei Huang,
Zhi Yang
Abstract:
In the field of semi-supervised medical image segmentation, the shortage of labeled data is the fundamental problem. How to effectively learn image features from unlabeled images to improve segmentation accuracy is the main research direction in this field. Traditional self-training methods can partially solve the problem of insufficient labeled data by generating pseudo labels for iterative training. However, noise generated due to the model's uncertainty during training directly affects the segmentation results. Therefore, we added sample-level and pixel-level uncertainty to stabilize the training process based on the self-training framework. Specifically, we saved several moments of the model during pre-training, and used the difference between their predictions on unlabeled samples as the sample-level uncertainty estimate for that sample. Then, we gradually add unlabeled samples from easy to hard during training. At the same time, we added a decoder with different upsampling methods to the segmentation network and used the difference between the outputs of the two decoders as pixel-level uncertainty. In short, we selectively retrained unlabeled samples and assigned pixel-level uncertainty to pseudo labels to optimize the self-training process. We compared the segmentation results of our model with five semi-supervised approaches on the public 2017 ACDC dataset and 2018 Prostate dataset. Our proposed method achieves better segmentation performance on both datasets under the same settings, demonstrating its effectiveness, robustness, and potential transferability to other medical image segmentation tasks. Keywords: Medical image segmentation, semi-supervised learning, self-training, uncertainty estimation
Submitted 10 October, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
HD-GCN: A Hybrid Diffusion Graph Convolutional Network
Authors:
Zhi Yang,
Kang Li,
Haitao Gan,
Zhongwei Huang,
Ming Shi
Abstract:
The information diffusion performance of GCN and its variant models is limited by the adjacency matrix, which can lower their performance. Therefore, we introduce a new framework for graph convolutional networks called Hybrid Diffusion-based Graph Convolutional Network (HD-GCN) to address the limitations of information diffusion caused by the adjacency matrix. In the HD-GCN framework, we initially utilize diffusion maps to facilitate the diffusion of information among nodes that are adjacent to each other in the feature space. This allows for the diffusion of information between similar points that may not have an adjacent relationship. Next, we utilize graph convolution to further propagate information among adjacent nodes after the diffusion maps, thereby enabling the spread of information among similar nodes that are adjacent in the graph. Finally, we employ the diffusion distances obtained through the use of diffusion maps to regularize and constrain the predicted labels of training nodes. This regularization method is then applied to the HD-GCN training, resulting in a smoother classification surface. The model proposed in this paper effectively overcomes the limitations of information diffusion imposed only by the adjacency matrix. HD-GCN utilizes hybrid diffusion by combining information diffusion between neighborhood nodes in the feature space and adjacent nodes in the adjacency matrix. This method allows for more comprehensive information propagation among nodes, resulting in improved model performance. We evaluated the performance of DM-GCN on three well-known citation network datasets and the results showed that the proposed framework is more effective than several graph-based semi-supervised learning methods.
Submitted 31 March, 2023;
originally announced March 2023.
-
Binary self-orthogonal codes which meet the Griesmer bound or have optimal minimum distances
Authors:
Minjia Shi,
Shitao Li,
Tor Helleseth,
Jon-Lark Kim
Abstract:
The purpose of this paper is two-fold. First, we characterize the existence of binary self-orthogonal codes meeting the Griesmer bound by employing Solomon-Stiffler codes and some related residual codes. Second, using such a characterization, we determine the exact value of $d_{so}(n,7)$ except for five special cases and the exact value of $d_{so}(n,8)$ except for 41 special cases, where $d_{so}(n,k)$ denotes the largest minimum distance among all binary self-orthogonal $[n, k]$ codes. Currently, the exact value of $d_{so}(n,k)$ $(k \le 6)$ was determined by Shi et al. (2022). In addition, we develop a general method to prove the nonexistence of some binary self-orthogonal codes by considering the residual code of a binary self-orthogonal code.
Submitted 29 March, 2023;
originally announced March 2023.
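For readers unfamiliar with the Griesmer bound mentioned in this abstract, the following Python sketch evaluates it. The bound itself is standard: a $q$-ary linear $[n, k, d]$ code satisfies $n \ge \sum_{i=0}^{k-1} \lceil d/q^i \rceil$. The function names are hypothetical, and the sketch only gives the general upper bound on the minimum distance; it does not compute $d_{so}(n,k)$, which additionally requires self-orthogonality and is the subject of the paper.

```python
def griesmer_length(k: int, d: int, q: int = 2) -> int:
    """Smallest length n permitted by the Griesmer bound for a q-ary [n, k, d] linear code."""
    # sum_{i=0}^{k-1} ceil(d / q^i), with the ceiling done in integer arithmetic
    return sum(-(-d // q**i) for i in range(k))


def meets_griesmer(n: int, k: int, d: int, q: int = 2) -> bool:
    """True if the parameters [n, k, d] attain the Griesmer bound with equality."""
    return n == griesmer_length(k, d, q)


def griesmer_max_distance(n: int, k: int, q: int = 2) -> int:
    """Largest d with griesmer_length(k, d, q) <= n, or 0 if there is none."""
    d = 0
    while griesmer_length(k, d + 1, q) <= n:
        d += 1
    return d
```

For instance, the binary simplex code parameters $[7, 3, 4]$ satisfy `meets_griesmer(7, 3, 4)`, since $4 + 2 + 1 = 7$.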
-
Remarks on the derivatives of rational Bézier curves
Authors:
Mao Shi
Abstract:
By studying the existing higher order derivation formulas of rational Bézier curves, we find that they fail when the order of the derivative exceeds the degree of the curves. In this paper, we present a new derivation formula for rational Bézier curves that overcomes this drawback and show that the $k$th degree derivative of an $n$th degree rational Bézier curve can be written in terms of a $(2^kn)$th degree rational Bézier curve. We also consider the properties of the endpoints and the bounds of the derivatives.
Submitted 21 February, 2024; v1 submitted 27 March, 2023;
originally announced March 2023.
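As background for this abstract, a degree-$n$ rational Bézier curve with control points $P_i$ and weights $w_i$ is $C(t) = \sum_i w_i B_i^n(t) P_i \big/ \sum_i w_i B_i^n(t)$, where $B_i^n$ are the Bernstein polynomials. The Python sketch below only evaluates such a curve; it does not implement the derivative formula of the paper. The function name `rational_bezier` is hypothetical, and the example is the standard quadratic representation of a quarter circle.

```python
from math import comb, sqrt


def rational_bezier(points, weights, t):
    """Evaluate a rational Bezier curve at parameter t in [0, 1]."""
    n = len(points) - 1
    # Bernstein basis B_i^n(t) = C(n, i) * t^i * (1 - t)^(n - i)
    basis = [comb(n, i) * t**i * (1 - t) ** (n - i) for i in range(n + 1)]
    denom = sum(w * b for w, b in zip(weights, basis))
    dim = len(points[0])
    return tuple(
        sum(w * b * p[axis] for w, b, p in zip(weights, basis, points)) / denom
        for axis in range(dim)
    )


# Quarter of the unit circle as a quadratic rational Bezier curve.
quarter_circle_points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
quarter_circle_weights = [1.0, sqrt(2) / 2, 1.0]

x, y = rational_bezier(quarter_circle_points, quarter_circle_weights, 0.5)
```

With these weights every evaluated point lies on the unit circle, which a polynomial (unweighted) Bézier curve cannot achieve exactly.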