-
A class of bootstrap based residuals for compositional data
Authors:
Gustavo H. A. Pereira,
Jianwen Cai
Abstract:
Regression models for compositional data are common in several areas of knowledge. As in other classes of regression models, it is desirable to perform diagnostic analysis in these models using residuals that are approximately standard normally distributed. However, for regression models for compositional data, there has not been any multivariate residual that meets this requirement. In this work, we introduce a class of asymptotically standard normally distributed residuals for compositional data based on bootstrap. Monte Carlo simulation studies indicate that the distributions of the residuals of this class are well approximated by the standard normal distribution in small samples. An application to simulated data also suggests that one of the residuals of the proposed class identifies model misspecification better than its competitors. Finally, the usefulness of the best residual of the proposed class is illustrated through an application on sleep stages. The class of residuals proposed here can also be used in other classes of multivariate regression models.
Submitted 20 March, 2024;
originally announced March 2024.
-
Diversified and Personalized Multi-rater Medical Image Segmentation
Authors:
Yicheng Wu,
Xiangde Luo,
Zhe Xu,
Xiaoqing Guo,
Lie Ju,
Zongyuan Ge,
Wenjun Liao,
Jianfei Cai
Abstract:
Annotation ambiguity due to inherent data uncertainties such as blurred boundaries in medical scans and different observer expertise and preferences has become a major obstacle for training deep-learning based medical image segmentation models. To address it, the common practice is to gather multiple annotations from different experts, leading to the setting of multi-rater medical image segmentation. Existing works aim to either merge different annotations into the "groundtruth" that is often unattainable in numerous medical contexts, or generate diverse results, or produce personalized results corresponding to individual expert raters. Here, we bring up a more ambitious goal for multi-rater medical image segmentation, i.e., obtaining both diversified and personalized results. Specifically, we propose a two-stage framework named D-Persona (first Diversification and then Personalization). In Stage I, we exploit multiple given annotations to train a Probabilistic U-Net model, with a bound-constrained loss to improve the prediction diversity. In this way, a common latent space is constructed in Stage I, where different latent codes denote diversified expert opinions. Then, in Stage II, we design multiple attention-based projection heads to adaptively query the corresponding expert prompts from the shared latent space, and then perform the personalized medical image segmentation. We evaluated the proposed model on our in-house Nasopharyngeal Carcinoma dataset and the public lung nodule dataset (i.e., LIDC-IDRI). Extensive experiments demonstrated our D-Persona can provide diversified and personalized results at the same time, achieving new SOTA performance for multi-rater medical image segmentation. Our code will be released at https://github.com/ycwu1997/D-Persona.
Submitted 20 March, 2024;
originally announced March 2024.
-
OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation
Authors:
Junhao Cai,
Yisheng He,
Weihao Yuan,
Siyu Zhu,
Zilong Dong,
Liefeng Bo,
Qifeng Chen
Abstract:
This paper studies a new open-set problem, the open-vocabulary category-level object pose and size estimation. Given human text descriptions of arbitrary novel object categories, the robot agent seeks to predict the position, orientation, and size of the target object in the observed scene image. To enable such generalizability, we first introduce OO3D-9D, a large-scale photorealistic dataset for this task. Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation. It includes additional annotations for the symmetry axis of each category, which help resolve symmetric ambiguity. Apart from the large-scale dataset, we find another key to enabling such generalizability is leveraging the strong prior knowledge in pre-trained visual-language foundation models. We then propose a framework built on pre-trained DinoV2 and text-to-image stable diffusion models to infer the normalized object coordinate space (NOCS) maps of the target instances. This framework fully leverages the visual semantic prior from DinoV2 and the aligned visual and language knowledge within the text-to-image diffusion model, which enables generalization to various text descriptions of novel categories. Comprehensive quantitative and qualitative experiments demonstrate that the proposed open-vocabulary method, trained on our large-scale synthesized data, significantly outperforms the baseline and can effectively generalize to real-world images of unseen categories. The project page is at https://ov9d.github.io.
Submitted 18 March, 2024;
originally announced March 2024.
-
Revisiting Tensor Basis Neural Networks for Reynolds stress modeling: application to plane channel and square duct flows
Authors:
Jiayi Cai,
Pierre-Emmanuel Angeli,
Jean-Marc Martinez,
Guillaume Damblin,
Didier Lucor
Abstract:
Several Tensor Basis Neural Network (TBNN) frameworks aimed at enhancing turbulence RANS modeling have recently been proposed in the literature as data-driven constitutive models for systems with known invariance properties. However, persistent ambiguities remain regarding the physical adequacy of applying the General Eddy Viscosity Model (GEVM). This work aims at investigating this aspect in an a priori stage for better predictions of the Reynolds stress anisotropy tensor, while preserving the Galilean and rotational invariances. In particular, we propose a general framework providing optimal tensor basis models for two types of canonical flows: Plane Channel Flow (PCF) and Square Duct Flow (SDF). Subsequently, deep neural networks based on these optimal models are trained using state-of-the-art strategies to achieve a balanced and physically sound prediction of the full anisotropy tensor. A priori results obtained by the proposed framework are in very good agreement with the reference DNS data. Notably, our shallow network with three layers provides accurate predictions of the anisotropy tensor for PCF at unobserved friction Reynolds numbers, both in interpolation and extrapolation scenarios. Learning the SDF case is more challenging because of its physical nature and a lack of training data at various regimes. We propose to alleviate this problem based on Transfer Learning (TL). To more efficiently generalize to an unseen intermediate $\mathrm{Re}_τ$ regime, we take advantage of our prior knowledge acquired from a training with a larger and wider dataset. Our results indicate the potential of the developed network model, and demonstrate the feasibility and efficiency of the TL process in terms of training data size and training time. Based on these results, we believe there is a promising future by integrating these neural networks into an adapted in-house RANS solver.
Submitted 18 March, 2024;
originally announced March 2024.
-
RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model
Authors:
Junyi Fan,
Yuxuan Han,
Jialin Zeng,
Jian-Feng Cai,
Yang Wang,
Yang Xiang,
Jiheng Zhang
Abstract:
Efficiently learning equilibria with large state and action spaces in general-sum Markov games while overcoming the curse of multi-agency is a challenging problem. Recent works have attempted to solve this problem by employing independent linear function classes to approximate the marginal $Q$-value for each agent. However, existing sample complexity bounds under such a framework have a suboptimal dependency on the desired accuracy $\varepsilon$ or the action space. In this work, we introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator, i.e., one can interact with the underlying environment on the visited states. Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns $\varepsilon$-CCE with a provable optimal accuracy bound $O(\varepsilon^{-2})$ and gets rid of the linear dependency on the action space, while scaling polynomially with relevant problem parameters (such as the number of agents and time horizon). Moreover, our analysis of Lin-Confident-FTRL generalizes the virtual policy iteration technique in the single-agent local planning literature, which yields a new computationally efficient algorithm with a tighter sample complexity bound when assuming random access to the simulator.
Submitted 19 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Generative Region-Language Pretraining for Open-Ended Object Detection
Authors:
Chuang Lin,
Yi Jiang,
Lizhen Qu,
Zehuan Yuan,
Jianfei Cai
Abstract:
In recent research, significant attention has been devoted to the open-vocabulary object detection task, aiming to generalize beyond the limited number of classes labeled during training and detect objects described by arbitrary category names at inference. Compared with conventional object detection, open vocabulary object detection largely extends the object detection categories. However, it relies on calculating the similarity between image regions and a set of arbitrary category names with a pretrained vision-and-language model. This implies that, despite its open-set nature, the task still needs the predefined object categories during the inference stage. This raises the question: What if we do not have exact knowledge of object categories during inference? In this paper, we call this new setting generative open-ended object detection, which is a more general and practical problem. To address it, we formulate object detection as a generative problem and propose a simple framework named GenerateU, which can detect dense objects and generate their names in a free-form way. Particularly, we employ Deformable DETR as a region proposal generator with a language model translating visual regions to object names. To assess the free-form object detection task, we introduce an evaluation method designed to quantitatively measure the performance of generative outcomes. Extensive experiments demonstrate strong zero-shot detection performance of our GenerateU. For example, on the LVIS dataset, our GenerateU achieves comparable results to the open-vocabulary object detection method GLIP, even though the category names are not seen by GenerateU during inference. Code is available at: https://github.com/FoundationVision/GenerateU.
Submitted 15 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be $-2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is $-3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of $-0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Imaginary-time relaxation quantum critical dynamics in two-dimensional dimerized Heisenberg model
Authors:
Jia-Qi Cai,
Yu-Rong Shu,
Xue-Qing Rao,
Shuai Yin
Abstract:
We study the imaginary-time relaxation critical dynamics of the Néel-paramagnetic quantum phase transition in the two-dimensional (2D) dimerized S = 1/2 Heisenberg model. We focus on the scaling correction in the short-time region. A unified scaling form including both short-time and finite-size corrections is proposed. According to this full scaling form, improved short-imaginary-time scaling relations are obtained. We numerically verify the scaling form and the improved short-time scaling relations for different initial states using the projector quantum Monte Carlo algorithm.
Submitted 14 March, 2024;
originally announced March 2024.
-
Non-Hermitian sensing in the absence of exceptional points
Authors:
Lei Xiao,
Yaoming Chu,
Quan Lin,
Haiqing Lin,
Wei Yi,
Jianming Cai,
Peng Xue
Abstract:
Open systems possess unique potentials in high-precision sensing, yet the majority of previous studies rely on the spectral singularities known as exceptional points. Here we theoretically propose and experimentally demonstrate universal non-Hermitian sensing in the absence of exceptional points. The scheme makes use of the intrinsic sensitivity of a non-Hermitian probe to weak external fields, which can be understood as the direct consequence of non-Hermiticity. We confirm the basic mechanism by simulating the sensor-field dynamics using photon interferometry, and, as a concrete example, demonstrate the enhanced sensing of signals encoded in the setting angle of a wave plate. While the sensitivity of the probe is ultimately limited by the measurement noise, we find the non-Hermitian sensor showing superior performance under background noises that cannot be suppressed through repetitive measurements. Our experiment opens the avenue of enhanced sensing without exceptional points, complementing existing efforts aimed at harnessing the unique features of open systems.
Submitted 12 March, 2024;
originally announced March 2024.
-
Attacking Transformers with Feature Diversity Adversarial Perturbation
Authors:
Chenxing Gao,
Hang Zhou,
Junqing Yu,
YuTeng Ye,
Jiale Cai,
Junle Wang,
Wei Yang
Abstract:
Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing ViT adversarial attackers rely on labels to calculate the gradient for perturbation, and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models, including most ViT variants, CNNs, and MLPs, even for models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features, causing the features in middle-to-end layers to become increasingly similar and eventually collapse. We propose the feature diversity attacker to naturally accelerate this process and achieve remarkable performance and transferability.
Submitted 9 March, 2024;
originally announced March 2024.
-
A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation
Authors:
Yao Lyu,
He Zhang,
Shuo Niu,
Jie Cai
Abstract:
Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being applied, and the methodologies content creators employ with Gen-AI tools during the creation process. This study initially explores this emerging area through a qualitative analysis of 68 YouTube videos demonstrating Gen-AI usage. Our research focuses on identifying the content domains, the variety of tools used, the activities performed, and the nature of the final products generated by Gen-AI in the context of user-generated content.
Submitted 9 March, 2024;
originally announced March 2024.
-
Content Moderation Justice and Fairness on Social Media: Comparisons Across Different Contexts and Platforms
Authors:
Jie Cai,
Aashka Patel,
Azadeh Naderi,
Donghee Yvette Wohn
Abstract:
Social media users may perceive moderation decisions by the platform differently, which can lead to frustration and dropout. This study investigates users' perceived justice and fairness of online moderation decisions when they are exposed to various illegal versus legal scenarios, retributive versus restorative moderation strategies, and user-moderated versus commercially moderated platforms. We conduct an online experiment on 200 American social media users of Reddit and Twitter. Results show that retributive moderation delivers higher justice and fairness for commercially moderated than for user-moderated platforms in illegal violations; restorative moderation delivers higher fairness for legal violations than illegal ones. We discuss the opportunities for platform policymaking to improve moderation system design.
Submitted 9 March, 2024;
originally announced March 2024.
-
Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection
Authors:
Jianfeng He,
Hang Su,
Jason Cai,
Igor Shalyminov,
Hwanjun Song,
Saab Mansour
Abstract:
Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data and improve the performance of summarization models. While addressing label noise, previous works on semi-supervised learning primarily focus on natural language understanding tasks, assuming each sample has a unique label. However, these methods are not directly applicable to SSDS, as it is a generative task, and each dialogue can be summarized in different ways. In this work, we propose a novel scoring approach, SiCF, which encapsulates three primary dimensions of summarization model quality: Semantic invariance (indicative of model confidence), Coverage (factual recall), and Faithfulness (factual precision). Using the SiCF score, we select unlabeled dialogues with high-quality generated summaries to train summarization models. Comprehensive experiments on three public datasets demonstrate the effectiveness of SiCF scores in uncertainty estimation and semi-supervised learning for dialogue summarization tasks. Our code is available at https://github.com/amazon-science/summarization-sicf-score.
Submitted 6 March, 2024;
originally announced March 2024.
-
A component-splitting implicit time integration for multicomponent reacting flows simulations
Authors:
Jingchao Zhang,
Jinsheng Cai,
Shucheng Pan
Abstract:
A component-splitting method is proposed to improve convergence characteristics for implicit time integration of compressible multicomponent reactive flows. The characteristic decomposition of the flux Jacobian of multicomponent Navier-Stokes equations yields a large sparse eigensystem, presenting challenges of slow convergence and high computational costs for implicit methods. To address this issue, the component-splitting method segregates the implicit operator into two parts: one for the flow equations (density/momentum/energy) and the other for the component equations. Each part's implicit operator employs flux-vector splitting based on their respective spectral radii to achieve accelerated convergence. This approach improves the computational efficiency of implicit iteration, mitigating the quadratic increase in time cost with the number of species. Two consistency corrections are developed to reduce the introduced component-splitting error and ensure the numerical consistency of mass fraction. Importantly, the impact of the component-splitting method on accuracy is minimal as the residual approaches convergence. The accuracy, efficiency, and robustness of the component-splitting method are thoroughly investigated and compared with the coupled implicit scheme through several numerical cases involving thermo-chemical nonequilibrium hypersonic flows. The results demonstrate that the component-splitting method decreases the required number of iteration steps for convergence of residual and wall heat flux, decreases the computation time per iteration step, and diminishes the residual to lower magnitude. The acceleration efficiency is enhanced with increases in CFL number and number of species.
Submitted 5 March, 2024;
originally announced March 2024.
-
Energy-Efficient UAV Swarm Assisted MEC with Dynamic Clustering and Scheduling
Authors:
Jialiuyuan Li,
Jiayuan Chen,
Changyan Yi,
Tong Zhang,
Kun Zhu,
Jun Cai
Abstract:
In this paper, the energy-efficient unmanned aerial vehicle (UAV) swarm assisted mobile edge computing (MEC) with dynamic clustering and scheduling is studied. In the considered system model, UAVs are divided into multiple swarms, with each swarm consisting of a leader UAV and several follower UAVs to provide computing services to end-users. Unlike existing work, we allow UAVs to dynamically cluster into different swarms, i.e., each follower UAV can change its leader based on the time-varying spatial positions, updated application placement, etc. in a dynamic manner. Meanwhile, UAVs are required to dynamically schedule their energy replenishment, application placement, trajectory planning and task delegation. With the aim of maximizing the long-term energy efficiency of the UAV swarm assisted MEC system, a joint optimization problem of dynamic clustering and scheduling is formulated. Taking into account the underlying cooperation and competition among intelligent UAVs, we further reformulate this optimization problem as a combination of a series of strongly coupled multi-agent stochastic games, and then propose a novel reinforcement learning-based UAV swarm dynamic coordination (RLDC) algorithm for obtaining the equilibrium. Simulations are conducted to evaluate the performance of the RLDC algorithm and demonstrate its superiority over counterparts.
Submitted 29 February, 2024;
originally announced February 2024.
-
Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering
Authors:
Xiang Chen,
Wenjie Zhu,
Jiayuan Chen,
Tong Zhang,
Changyan Yi,
Jun Cai
Abstract:
This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interest module (ROIM). TAODM adaptively decides whether to process each video frame locally with a tracking algorithm or to offload it to the edge server for inference by an object detection model. ROIM determines each offloaded frame's resolution and detection model configuration to ensure that the analysis results can return in time. TAODM and ROIM interact jointly to filter the repetitive spatial-temporal semantic information to maximize the processing rate while ensuring high video analysis accuracy. Unlike most existing works, this paper investigates real-time video analysis systems where the intelligent visual device connects to the edge server through a wireless network with fluctuating network conditions. We decompose the real-time video analysis problem into the offloading decision and configuration selection sub-problems. To solve these two sub-problems, we introduce a double deep Q network (DDQN) based offloading approach and a contextual multi-armed bandit (CMAB) based adaptive configuration selection approach, respectively. A DDQN-CMAB reinforcement learning (DCRL) training framework is further developed to integrate these two approaches to improve the overall video analysis performance. Extensive simulations are conducted to evaluate the performance of the proposed solution, and demonstrate its superiority over counterparts.
Submitted 29 February, 2024;
originally announced February 2024.
-
Intrinsic supercurrent diode effect in NbSe2 nanobridge
Authors:
Yiwen Zhang,
Jiliang Cai,
Peng Dong,
Jiadian He,
Yifan Ding,
Jinghui Wang,
Xiang Zhou,
Kecheng Cao,
Yueshen Wu,
Jun Li
Abstract:
The significance of the superconducting diode effect lies in its potential application as a fundamental component in the development of next-generation superconducting circuit technology. The stringent operating conditions at low temperatures have posed challenges for the conventional semiconductor diode, primarily due to its exceptionally high resistivity. In response to this limitation, various approaches have emerged to achieve the superconducting diode effect, primarily involving the disruption of inversion symmetry in a two-dimensional superconductor through heterostructure fabrication. In this study, we present a direct observation of the supercurrent diode effect in a NbSe2 nanobridge with a length of approximately 15 nm, created using focused helium ion beam fabrication. Nonreciprocal supercurrents were identified, reaching a peak value of approximately 380 $μ$A for each bias polarity at $B_{z}^{max} =\pm 0.2$ mT. Notably, the nonreciprocal supercurrent can be toggled by altering the bias polarity. This discovery of the superconducting diode effect introduces a novel avenue and mechanism through nanofabrication on a superconducting flake, offering fresh perspectives for the development of superconducting devices and potential circuits.
Submitted 25 February, 2024;
originally announced February 2024.
-
Ultra-short lifetime isomer studies from photonuclear reactions using laser-driven ultra-intense γ-ray
Authors:
Di Wu,
Haoyang Lan,
Jiaxing Liu,
Huangang Lu,
Jianyao Zhang,
Jianfeng Lv,
Xuezhi Wu,
Hui Zhang,
Yadong Xia,
Qiangyou He,
Jie Cai,
Qianyi Ma,
Yuhui Xia,
Zhenan Wang,
Meizhi Wang,
Zhiyan Yang,
Xinlu Xu,
Yixing Geng,
Chen Lin,
Wenjun Ma,
Yanying Zhao,
Haoran Wang,
Fulong Liu,
Chuangye He,
Jinqing Yu
, et al. (7 additional authors not shown)
Abstract:
Isomers, ubiquitous populations of relatively long-lived nuclear excited states, play a crucial role in nuclear physics. However, experimental cross-section data are scarce for isomers with half-lives of several seconds or less, owing to the lack of a suitable measurement method. We report a method of online γ spectroscopy for ultra-short-lived isomers from photonuclear reactions using laser-driven ultra-intense γ-rays. The fastest time resolution can reach the sub-ps level with γ-ray intensities $>10^{19}$/s ($\geqslant 8$ MeV). The $^{115}$In(γ, n)$^{114m2}$In reaction ($T_{1/2} = 43.1$ ms) was measured for the first time in the high-energy region, which sheds light on nuclear structure studies of the element In. Simulations showed that the method would be an efficient way to study $^{229m}$Th ($T_{1/2} = 7$ μs), which is believed to be the basis of the next generation of nuclear clocks. This work offers a unique way of gaining insight into ultra-short lifetimes and promises an effective way to fill the gap in relevant experimental data.
Submitted 23 February, 2024;
originally announced February 2024.
-
Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening
Authors:
Zhenrong Shen,
Manman Fei,
Xin Wang,
Jiangdong Cai,
Sheng Wang,
Lichi Zhang,
Qian Wang
Abstract:
Automatic thin-prep cytologic test (TCT) screening can assist pathologists in finding cervical abnormality towards accurate and efficient cervical cancer diagnosis. Current automatic TCT screening systems mostly involve abnormal cervical cell detection, which generally requires large-scale and diverse training data with high-quality annotations to achieve promising performance. Pathological image synthesis is naturally raised to minimize the efforts in data collection and annotation. However, it is challenging to generate realistic large-size cytopathological images while simultaneously synthesizing visually plausible appearances for small-size abnormal cervical cells. In this paper, we propose a two-stage image synthesis framework to create synthetic data for augmenting cervical abnormality screening. In the first Global Image Generation stage, a Normal Image Generator is designed to generate cytopathological images full of normal cervical cells. In the second Local Cell Editing stage, normal cells are randomly selected from the generated images and then are converted to different types of abnormal cells using the proposed Abnormal Cell Synthesizer. Both Normal Image Generator and Abnormal Cell Synthesizer are built upon Stable Diffusion, a pre-trained foundation model for image synthesis, via parameter-efficient fine-tuning methods for customizing cytopathological image contents and extending spatial layout controllability, respectively. Our experiments demonstrate the synthetic image quality, diversity, and controllability of the proposed synthesis framework, and validate its data augmentation effectiveness in enhancing the performance of abnormal cervical cell detection.
Submitted 25 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Authors:
Zizheng Pan,
Bohan Zhuang,
De-An Huang,
Weili Nie,
Zhiding Yu,
Chaowei Xiao,
Jianfei Cai,
Anima Anandkumar
Abstract:
Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement for the larger DPM and switches to the larger DPM at a later stage. Our key insight is that different diffusion models learn similar encodings under the same training data distribution and that smaller models are capable of generating good global structures in the early steps. Extensive experiments demonstrate that T-Stitch is training-free, generally applicable to different architectures, and complements most existing fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without a performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained stable diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo. Code is released at https://github.com/NVlabs/T-Stitch
Submitted 21 February, 2024;
originally announced February 2024.
-
A Uniformly Random Solution to Algorithmic Redistricting
Authors:
Jin-Yi Cai,
Jacob Kruse,
Kenneth Mayer,
Daniel P. Szabo
Abstract:
The process of drawing electoral district boundaries is known as political redistricting. Within this context, gerrymandering is the practice of drawing these boundaries such that they unfairly favor a particular political party, often leading to unequal representation and skewed electoral outcomes. One of the few ways to detect gerrymandering is by algorithmically sampling redistricting plans. Previous methods mainly focus on sampling from some neighborhood of "realistic" districting plans, rather than a uniform sample of the entire space. We present a deterministic subexponential time algorithm to uniformly sample from the space of all possible $k$-partitions of a bounded degree planar graph, and with this construct a sample of the entire space of redistricting plans. We also give a way to restrict this sample space to plans that match certain compactness and population constraints at the cost of added complexity. The algorithm runs in $2^{O(\sqrt{n}\log n)}$ time, although we only give a heuristic implementation. Our method generalizes an algorithm to count self-avoiding walks on a square to count paths that split general planar graphs into $k$ regions, and uses this to sample from the space of all $k$-partitions of a planar graph.
Submitted 21 February, 2024;
originally announced February 2024.
-
FGAD: Self-boosted Knowledge Distillation for An Effective Federated Graph Anomaly Detection Framework
Authors:
Jinyu Cai,
Yunhe Zhang,
Zhoumin Lu,
Wenzhong Guo,
See-kiong Ng
Abstract:
Graph anomaly detection (GAD) aims to identify anomalous graphs that significantly deviate from other ones, which has raised growing attention due to the broad existence and complexity of graph-structured data in many real-world scenarios. However, existing GAD methods usually execute with centralized training, which may lead to privacy leakage risk in some sensitive cases, thereby impeding collaboration among organizations seeking to collectively develop robust GAD models. Although federated learning offers a promising solution, the prevalent non-IID problems and high communication costs present significant challenges, particularly pronounced in collaborations with graph data distributed among different participants. To tackle these challenges, we propose an effective federated graph anomaly detection framework (FGAD). We first introduce an anomaly generator to perturb the normal graphs to be anomalous, and train a powerful anomaly detector by distinguishing generated anomalous graphs from normal ones. Then, we leverage a student model to distill knowledge from the trained anomaly detector (teacher model), which aims to maintain the personality of local models and alleviate the adverse impact of non-IID problems. Moreover, we design an effective collaborative learning mechanism that facilitates the personalization preservation of local models and significantly reduces communication costs among clients. Empirical results of the GAD tasks on non-IID graphs compared with state-of-the-art baselines demonstrate the superiority and efficiency of the proposed FGAD method.
Submitted 20 February, 2024;
originally announced February 2024.
-
A Three-Party Repeated Coalition Formation Game for PLS in Wireless Communications with IRSs
Authors:
Haipeng Zhou,
Ruoyang Chen,
Changyan Yi,
Juan Li,
Jun Cai
Abstract:
In this paper, a repeated coalition formation game (RCFG) with dynamic decision-making for physical layer security (PLS) in wireless communications with intelligent reflecting surfaces (IRSs) is investigated. In the considered system, one central legitimate transmitter (LT) aims to transmit secret signals to a group of legitimate receivers (LRs) under the threat of a proactive eavesdropper (EV), while there exist a number of third-party IRSs (TIRSs) which can choose to form a coalition with either legitimate pairs (LPs) or the EV to improve their respective performances in exchange for potential benefits (e.g., payments). Unlike existing works, which are commonly restricted to friendly IRSs or malicious IRSs only, we study the complicated dynamic ally-adversary relationships among LPs, the EV, and TIRSs under unpredictable wireless channel conditions, and introduce an RCFG to model their long-term strategic interactions. In particular, we first analyze the existence of a Nash equilibrium (NE) in the formulated RCFG, and then propose a switch operations-based coalition selection along with a deep reinforcement learning (DRL)-based algorithm for obtaining such an equilibrium. Simulations examine the feasibility of the proposed algorithm and show its superiority over counterparts.
Submitted 18 February, 2024;
originally announced February 2024.
-
Convergence rate and exponential stability of backward Euler method for neutral stochastic delay differential equations under generalized monotonicity conditions
Authors:
Jingjing Cai,
Ziheng Chen,
Yuanling Niu
Abstract:
This work focuses on the numerical approximations of neutral stochastic delay differential equations with their drift and diffusion coefficients growing super-linearly with respect to both delay variables and state variables. Under generalized monotonicity conditions, we prove that the backward Euler method not only converges strongly in the mean square sense with order $1/2$, but also inherits the mean square exponential stability of the original equations. As a byproduct, we obtain the same results on convergence rate and exponential stability of the backward Euler method for stochastic delay differential equations with generalized monotonicity conditions. These theoretical results are finally supported by several numerical experiments.
Submitted 14 February, 2024;
originally announced February 2024.
-
Effects of Strong Capacitive Coupling Between Meta-Atoms in rf SQUID Metamaterials
Authors:
Jingnan Cai,
Robin Cantor,
Johanne Hizanidis,
Nikos Lazarides,
Steven M. Anlage
Abstract:
We consider, for the first time, the effects of strong capacitive and inductive coupling between radio frequency Superconducting Quantum Interference Devices (rf SQUIDs) in an overlapping metamaterial geometry when driven by rf flux at and near their self-resonant frequencies. The equations of motion for the gauge-invariant phases on the Josephson junctions in each SQUID are set up and solved. Our model accounts for the high-frequency displacement currents through capacitive overlap between the wiring of SQUID loops. We begin by modeling two overlapping SQUIDs and studying the response in both the linear and nonlinear high-frequency driving limits. By exploring a sequence of more and more complicated arrays, the formalism is eventually extended to the $N\times N \times 2$ overlapping metamaterial array, where we develop an understanding of the many ($8N^2-8N+3$) resulting resonant modes in terms of three classes of resonances. The capacitive coupling gives rise to qualitatively new self-resonant responses of rf SQUID metamaterials, and is demonstrated through analytical theory, numerical modeling, and experiment in the 10-30 GHz range on capacitively and inductively coupled rf SQUID metamaterials.
Submitted 31 May, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
MULTI: Multimodal Understanding Leaderboard with Text and Images
Authors:
Zichen Zhu,
Yang Xu,
Lu Chen,
Jingkai Yang,
Yichuan Ma,
Yiming Sun,
Hailin Wen,
Jiaqi Liu,
Jinyu Cai,
Yingzi Ma,
Situo Zhang,
Zihan Zhao,
Liangtai Sun,
Kai Yu
Abstract:
Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community, while existing benchmarks primarily focus on understanding simple natural images and short context. In this paper, we present MULTI as a cutting-edge benchmark for evaluating MLLMs on understanding complex tables and images, and reasoning with long context. MULTI provides multimodal inputs and requires responses that are either precise or open-ended, reflecting real-life examination styles. MULTI includes over 18,000 questions and challenges MLLMs with a variety of tasks, ranging from formula derivation to image detail analysis and cross-modality reasoning. We also introduce MULTI-Elite, a 500-question selected hard subset, and MULTI-Extend, with more than 4,500 external knowledge context pieces. Our evaluation indicates significant potential for MLLM advancement, with GPT-4V achieving a 63.7% accuracy rate on MULTI, in contrast to other MLLMs scoring between 28.5% and 55.3%. MULTI serves not only as a robust evaluation platform but also paves the way for the development of expert-level AI.
Submitted 20 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering
Authors:
Ziyu Ma,
Shutao Li,
Bin Sun,
Jianfei Cai,
Zuxiang Long,
Fuyan Ma
Abstract:
Knowledge-based visual question answering (VQA) requires world knowledge beyond the image for accurate answers. Recently, instead of extra knowledge bases, a large language model (LLM) like GPT-3 has been activated as an implicit knowledge engine to jointly acquire and reason over the knowledge necessary for answering, by converting images into textual information (e.g., captions and answer candidates). However, such conversion may introduce irrelevant information, which causes the LLM to misinterpret images and ignore visual details crucial for accurate knowledge. We argue that a multimodal large language model (MLLM) is a better implicit knowledge engine than the LLM because of its superior capability for visual understanding. Despite this, how to activate the capacity of an MLLM as the implicit knowledge engine has not yet been explored. Therefore, we propose GeReA, a generate-reason framework that prompts an MLLM like InstructBLIP with question-relevant vision and language information to generate knowledge-relevant descriptions and reasons over those descriptions for knowledge-based VQA. Specifically, the question-relevant image regions and question-specific manual prompts are encoded in the MLLM to generate the knowledge-relevant descriptions, referred to as question-aware prompt captions. After that, the question-aware prompt captions, the image-question pair, and similar samples are sent into the multi-modal reasoning model to learn a joint knowledge-image-question representation for answer prediction. GeReA unlocks the use of an MLLM as the implicit knowledge engine, surpassing all previous state-of-the-art methods on the OK-VQA and A-OKVQA datasets, with test accuracies of 66.5% and 63.3% respectively. Our code will be released at https://github.com/Upper9527/GeReA.
Submitted 4 February, 2024;
originally announced February 2024.
-
Identifying possible mechanism for quantum needle in chemical magnetoreception
Authors:
Xiaoyu Chen,
Haibin Liu,
Jianming Cai
Abstract:
The radical pair mechanism is an important model that may provide a basis for biological magnetoreception. To account for the high orientation precision of the real avian compass, P. J. Hore et al. proposed an intriguing phenomenon called quantum needle [Proc. Natl. Acad. Sci. 113, 4634 (2016)], where a spike-like feature emerges in the fractional yield signal. However, it is believed that quantum needle requires the radical pair lifetime to be longer than a few microseconds and thus poses stern challenges in realistic biological systems. Here, we exploit the optimization techniques and find a novel class of model system, which sustains much more prominent features of quantum needle and significantly relaxes the requirement for radical pair lifetime. Even more surprisingly, we find that the characteristics of quantum needle retain a narrow functional window around the geomagnetic field, which is absent in the previous model systems. Therefore, our work provides essential evidence for identifying the possible physical mechanism for quantum needle in chemical magnetoreception.
Submitted 29 January, 2024;
originally announced January 2024.
-
Dynamic Human Digital Twin Deployment at the Edge for Task Execution: A Two-Timescale Accuracy-Aware Online Optimization
Authors:
Yuye Yang,
You Shi,
Changyan Yi,
Jun Cai,
Jiawen Kang,
Dusit Niyato,
Xuemin Shen
Abstract:
Human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) for assisting complex task executions in human-centric services. In this paper, we study a two-timescale online optimization for building HDT under an end-edge-cloud collaborative framework. As a unique feature of HDT, we consider that PTs' corresponding VTs are deployed on edge servers, consisting of not only generic models placed by downloading experiential knowledge from the cloud but also customized models updated by collecting personalized data from end devices. To maximize task execution accuracy with stringent energy and delay constraints, and by taking into account HDT's inherent mobility and status variation uncertainties, we jointly and dynamically optimize VTs' construction and PTs' task offloading, along with communication and computation resource allocations. Observing that decision variables are asynchronous with different triggers, we propose a novel two-timescale accuracy-aware online optimization approach (TACO). Specifically, TACO utilizes an improved Lyapunov method to decompose the problem into multiple instant ones, and then leverages piecewise McCormick envelopes and block coordinate descent based algorithms, addressing two timescales alternately. Theoretical analyses and simulations show that the proposed approach can reach asymptotic optimum within a polynomial-time complexity, and demonstrate its superiority over counterparts.
Submitted 29 January, 2024;
originally announced January 2024.
-
Spectral conditions for graphs in which every edge belongs to a factor
Authors:
Jin Cai,
Bo Zhou
Abstract:
A factor of a graph is a spanning subgraph. Spectral sufficient conditions are provided via spectral radius and signless Laplacian spectral radius for graphs with (i) a matching of given size (particularly, $1$-factor) containing any given edge, and (ii) a star factor with a component isomorphic to stars of order two or three containing any given edge, respectively.
Submitted 15 January, 2024;
originally announced January 2024.
-
"I Got Flagged for Supposed Bullying, Even Though It Was in Response to Someone Harassing Me About My Disability.": A Study of Blind TikTokers' Content Moderation Experiences
Authors:
Yao Lyu,
Jie Cai,
Anisa Callis,
Kelley Cotter,
John M. Carroll
Abstract:
The Human-Computer Interaction (HCI) community has consistently focused on the experiences of users moderated by social media platforms. Recently, scholars have noticed that moderation practices could perpetuate biases, resulting in the marginalization of user groups undergoing moderation. However, most studies have primarily addressed marginalization related to issues such as racism or sexism, with little attention given to the experiences of people with disabilities. In this paper, we present a study on the moderation experiences of blind users on TikTok, also known as "BlindToker," to address this gap. We conducted semi-structured interviews with 20 BlindTokers and used thematic analysis to analyze the data. Two main themes emerged: BlindTokers' situated content moderation experiences and their reactions to content moderation. We reported on the lack of accessibility on TikTok's platform, contributing to the moderation and marginalization of BlindTokers. Additionally, we discovered instances of harassment from trolls that prompted BlindTokers to respond with harsh language, triggering further moderation. We discussed these findings in the context of the literature on moderation, marginalization, and transformative justice, seeking solutions to address such issues.
Submitted 21 January, 2024;
originally announced January 2024.
-
Third-Party Developers and Tool Development For Community Management on Live Streaming Platform Twitch
Authors:
Jie Cai,
Ya-Fang Lin,
He Zhang,
John M. Carroll
Abstract:
Community management is critical for stakeholders to collaboratively build and sustain communities with socio-technical support. However, most existing research has focused on the community members and the platform, with little attention given to the developers who act as intermediaries between the platform and community members and develop tools to support community management. This study focuses on third-party developers (TPDs) for the live streaming platform Twitch and explores their tool development practices. Using a mixed method with in-depth qualitative analysis, we found that TPDs maintain complex relationships with different stakeholders (streamers, viewers, platform, professional developers), and that the multi-layered policy restricts their agency regarding idea innovation and tool development. We argue that HCI research should shift its focus from tool users to tool developers with regard to community management. We propose designs to support closer collaboration between TPDs and both the platform and professional developers, and to streamline TPDs' development process with unified toolkits and policy documentation.
Submitted 17 March, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Low-Complexity Integer Divider Architecture for Homomorphic Encryption
Authors:
Sajjad Akherati,
Jiaxuan Cai,
Xinmiao Zhang
Abstract:
Homomorphic encryption (HE) allows computations to be directly carried out on ciphertexts and enables privacy-preserving cloud computing. The computations on the coefficients of the polynomials involved in HE are always followed by modular reduction, and the overall complexity of ciphertext multiplication can be reduced by utilizing the quotient. Our previous design considered the cases in which the dividend is an integer multiple of the modulus and the modulus is in the format of $2^w-2^u\pm1$, where $u<w/2$. In this paper, the division is generalized to larger $u$ and to dividends that are not an integer multiple of the modulus. An algorithm is proposed to compute the quotient, and rigorous mathematical proofs are provided. Moreover, an efficient hardware architecture is developed for implementing the proposed algorithm. Compared to alternative division approaches that utilize the inverse of the divisor, for $w=32$, the proposed design achieves at least 9% shorter latency and 79% area reduction for 75% of the possible values of $u$.
Submitted 19 January, 2024;
originally announced January 2024.
-
Learning Backdoors for Mixed Integer Programs with Contrastive Learning
Authors:
Junyang Cai,
Taoan Huang,
Bistra Dilkina
Abstract:
Many real-world problems can be efficiently modeled as Mixed Integer Programs (MIPs) and solved with the Branch-and-Bound method. Prior work has shown the existence of MIP backdoors, small sets of variables such that prioritizing branching on them when possible leads to faster running times. However, finding high-quality backdoors that improve running times remains an open question. Previous work learns to estimate the relative solver speed of randomly sampled backdoors through ranking and then decides whether to use them. In this paper, we utilize the Monte-Carlo tree search method to collect backdoors for training, rather than relying on random sampling, and adapt a contrastive learning framework to train a Graph Attention Network model to predict backdoors. Our method, evaluated on four common MIP problem domains, demonstrates performance improvements over both Gurobi and previous models.
Submitted 18 January, 2024;
originally announced January 2024.
-
Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading
Authors:
Dunyuan Xu,
Xi Wang,
Jinyue Cai,
Pheng-Ann Heng
Abstract:
Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly. Accurate identification of the type and grade of tumor in the early stages plays an important role in choosing a precise treatment plan. The Magnetic Resonance Imaging (MRI) protocols of different sequences provide clinicians with important complementary information to identify tumor regions. However, manual assessment is time-consuming and error-prone due to the large amount of data and the diversity of brain tumor types. Hence, there is an unmet need for automated MRI brain tumor diagnosis. We observe that the predictive capability of uni-modality models is limited and their performance varies widely across modalities, and that commonly used modality fusion methods can introduce noise, which results in significant performance degradation. To overcome these challenges, we propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading. To balance the tradeoff between model efficiency and efficacy, we employ ResNet Mix Convolution as the backbone network for feature extraction. Besides, dual attention is applied to capture the semantic interdependencies in spatial and slice dimensions respectively. To facilitate information interaction among modalities, we design a cross-modality guidance-aided module where the primary modality guides the other secondary modalities during the process of training, which can effectively leverage the complementary information of different MRI modalities and meanwhile alleviate the impact of the possible noise.
Submitted 17 January, 2024;
originally announced January 2024.
-
Enhancing Campus Mobility: Achievements and Challenges of Autonomous Shuttle "Snow Lion"
Authors:
Yingbing Chen,
Jie Cheng,
Sheng Wang,
Hongji Liu,
Xiaodong Mei,
Xiaoyang Yan,
Mingkai Tang,
Ge Sun,
Ya Wen,
Junwei Cai,
Xupeng Xie,
Lu Gan,
Mandan Chao,
Ren Xin,
Ming Liu,
Jianhao Jiao,
Kangcheng Liu,
Lujia Wang
Abstract:
The rapid evolution of autonomous vehicles (AVs) has significantly influenced global transportation systems. In this context, we present "Snow Lion", an autonomous shuttle meticulously designed to revolutionize on-campus transportation, offering a safer and more efficient mobility solution for students, faculty, and visitors. The primary objective of this research is to enhance campus mobility by providing a reliable, efficient, and eco-friendly transportation solution that seamlessly integrates with existing infrastructure and meets the diverse needs of a university setting. To achieve this goal, we delve into the intricacies of the system design, encompassing sensing, perception, localization, planning, and control aspects. We evaluate the autonomous shuttle's performance in real-world scenarios, involving a 1146-kilometer road haul and the transportation of 442 passengers over a two-month period. These experiments demonstrate the effectiveness of our system and offer valuable insights into the intricate process of integrating an autonomous vehicle within campus shuttle operations. Furthermore, a thorough analysis of the lessons derived from this experience furnishes a valuable real-world case study, accompanied by recommendations for future research and development in the field of autonomous driving.
Submitted 16 January, 2024;
originally announced January 2024.
-
Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach
Authors:
Huanyu Liu,
Jianfeng Cai,
Tingjia Zhang,
Hongsheng Li,
Siyuan Wang,
Guangming Zhu,
Syed Afaq Ali Shah,
Mohammed Bennamoun,
Liang Zhang
Abstract:
Flowcharts and mind maps, collectively known as flowmind, are vital in daily activities, with hand-drawn versions facilitating real-time collaboration. However, there's a growing need to digitize them for efficient processing. Automated conversion methods are essential to overcome manual conversion challenges. Existing sketch recognition methods face limitations in practical situations, being field-specific and lacking digital conversion steps. Our paper introduces the Flowmind2digital method and hdFlowmind dataset to address these challenges. Flowmind2digital, utilizing neural networks and keypoint detection, achieves a record 87.3% accuracy on our dataset, surpassing previous methods by 11.9%. The hdFlowmind dataset, comprising 1,776 annotated flowminds across 22 scenarios, outperforms existing datasets. Additionally, our experiments emphasize the importance of simple graphics, enhancing accuracy by 9.3%.
Submitted 8 January, 2024;
originally announced January 2024.
-
Optimization Over Trained Neural Networks: Taking a Relaxing Walk
Authors:
Jiatai Tong,
Junyang Cai,
Thiago Serra
Abstract:
Besides training, mathematical optimization is also used in deep learning to model and solve formulations over trained neural networks for purposes such as verification, compression, and optimization with learned constraints. However, solving these formulations soon becomes difficult as the network size grows due to the weak linear relaxation and dense constraint matrix. We have seen improvements in recent years with cutting plane algorithms, reformulations, and a heuristic based on Mixed-Integer Linear Programming (MILP). In this work, we propose a more scalable heuristic based on exploring global and local linear relaxations of the neural network model. Our heuristic is competitive with a state-of-the-art MILP solver and the prior heuristic while producing better solutions with increases in input, depth, and number of neurons.
Submitted 28 January, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
Spectral conditions for factor-criticality of graphs
Authors:
Jin Cai,
Bo Zhou
Abstract:
A graph $G$ is $k$-factor-critical if $G-S$ has a perfect matching for any $k$-subset $S$ of the vertex set of $G$. In this paper, we investigate the factor-criticality of graphs with fixed minimum degree and provide sufficient conditions for such graphs to be $k$-factor-critical in terms of spectral radius and signless Laplacian spectral radius.
Submitted 1 January, 2024;
originally announced January 2024.
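The definition in the abstract above ($G-S$ has a perfect matching for every $k$-subset $S$ of vertices) can be checked by brute force on small graphs. The sketch below does exactly that and nothing more; it does not implement the spectral sufficient conditions from the paper. The function names and the example graphs are illustrative assumptions, and the running time is exponential, so this is only meant for tiny instances.

```python
from itertools import combinations


def has_perfect_matching(vertices, edges):
    """Brute-force test: can the given vertices be paired up using edges?"""
    vs = sorted(vertices)
    if not vs:
        return True            # the empty graph is trivially perfectly matched
    if len(vs) % 2 == 1:
        return False           # an odd number of vertices cannot be paired
    first = vs[0]
    for other in vs[1:]:
        if frozenset((first, other)) in edges:
            rest = [v for v in vs if v != first and v != other]
            if has_perfect_matching(rest, edges):
                return True
    return False


def is_k_factor_critical(n, edge_list, k):
    """Check the definition directly: every k-subset removal leaves a perfect matching."""
    edges = {frozenset(e) for e in edge_list}
    vertices = range(n)
    return all(
        has_perfect_matching([v for v in vertices if v not in removed], edges)
        for removed in combinations(vertices, k)
    )


# Illustrative graphs: the complete graph K4 and the 4-cycle C4.
k4 = [(a, b) for a, b in combinations(range(4), 2)]
c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
k4_is_2fc = is_k_factor_critical(4, k4, 2)   # any two remaining vertices are adjacent
c4_is_2fc = is_k_factor_critical(4, c4, 2)   # removing 0 and 2 leaves 1 and 3, not adjacent
```

Note that $k$ must have the same parity as the number of vertices for the property to be possible at all, since $G-S$ needs an even number of vertices.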
-
Absence of Weyl nodes in EuCd$_2$As$_2$ revealed by the carrier density dependence of the anomalous Hall effect
Authors:
Yue Shi,
Zhaoyu Liu,
Logan A. Burnett,
Seokhyeong Lee,
Chaowei Hu,
Qianni Jiang,
Jiaqi Cai,
Xiaodong Xu,
Mo Li,
Cheng-Chien Chen,
Jiun-Haw Chu
Abstract:
The antiferromagnetic layered compound EuCd$_2$As$_2$ is widely considered a leading candidate for an ideal Weyl semimetal, featuring a single pair of Weyl nodes in its field-induced ferromagnetic (FM) state. Nevertheless, this view has recently been challenged by an optical spectroscopy study, which suggests that it is a magnetic semiconductor. In this study, we have successfully synthesized highly insulating EuCd$_2$As$_2$ crystals with carrier density reaching as low as $2\times 10^{15}$ $\text{cm}^{-3}$. The magneto-transport measurements revealed a progressive decrease of the anomalous Hall conductivity (AHC) by several orders of magnitude as the carrier density decreases. This behavior contradicts what is expected from the intrinsic AHC generated by the Weyl points, which is independent of carrier density as the Fermi level approaches the charge neutrality point. In contrast, the scaling relationship between AHC and longitudinal conductivity aligns with the characteristics of variable range hopping insulators. Our results suggest that EuCd$_2$As$_2$ is a magnetic semiconductor rather than a topological Weyl semimetal.
Submitted 27 February, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators
Authors:
Jingwei Cai,
Zuotong Wu,
Sen Peng,
Yuchen Wei,
Zhanhong Tan,
Guiming Shi,
Mingyu Gao,
Kaisheng Ma
Abstract:
Chiplet technology enables the integration of an increasing number of transistors on a single accelerator with higher yield in the post-Moore era, addressing the immense computational demands arising from rapid AI advancements. However, it also introduces more expensive packaging costs and costly Die-to-Die (D2D) interfaces, which require more area, consume higher power, and offer lower bandwidth than on-chip interconnects. Maximizing the benefits and minimizing the drawbacks of chiplet technology is crucial for developing large-scale DNN chiplet accelerators, which poses challenges to both architecture and mapping. Despite its importance in the post-Moore era, methods to address these challenges remain scarce.
Submitted 27 December, 2023;
originally announced December 2023.
-
Satellite Impact on Astronomical Observations Based on Elliptical Orbit Model
Authors:
Tianzhu Hu,
Yong Zhang,
Xiangqun Cui,
Zihuang Cao,
Kang Huang,
Jingyi Cai,
Jun Li,
Tong Zhou
Abstract:
Space-based and ground-based telescopes have extensively documented the impact of satellites on astronomical observations. With the proliferation of satellite mega-constellation programs, their influence on astronomical observations has become undeniable. It is crucial to quantify the impact of satellites on telescopes. To address this need, we have enhanced the circular orbit model for satellites and introduced a methodology based on two-line element (TLE) orbit data. This involves constructing a satellite probability distribution model to evaluate the impact of satellites on telescopes. Using our method, we assessed the satellite impact on global observatories. The results indicate that the regions most severely affected by satellite interference currently are those near the equator, with latitudes around 50 and 80 degrees experiencing the most significant impact from low Earth orbit satellites. Furthermore, we validated the reliability of our method using imaging data obtained from the focal surface acquisition camera of the LAMOST telescope.
Submitted 21 December, 2023;
originally announced December 2023.
-
Compressible Navier-Stokes equations without heat conduction in $L^p$-framework
Authors:
Juanzi Cai,
Zhigang Wu,
Mengqian Liu
Abstract:
In this paper, we mainly consider global well-posedness and long time behavior of compressible Navier-Stokes equations without heat conduction in $L^p$-framework. This is a generalization of Peng and Zhai (SIMA, 55 (2023), no. 2, 1439-1463), where they obtained the corresponding result in $L^2$-framework. Based on the key observation that the regularity required of the non-dissipative entropy $S$ at high frequency in Peng and Zhai's work can be relaxed, we ultimately achieve the desired $L^p$ estimate at high frequency via complicated calculations on the nonlinear terms. In addition, we get the $L^p$-decay rate of the solution.
Submitted 14 December, 2023;
originally announced December 2023.
-
Interface-Induced Superconductivity in Magnetic Topological Insulator-Iron Chalcogenide Heterostructures
Authors:
Hemian Yi,
Yi-Fan Zhao,
Ying-Ting Chan,
Jiaqi Cai,
Ruobing Mei,
Xianxin Wu,
Zi-Jie Yan,
Ling-Jie Zhou,
Ruoxi Zhang,
Zihao Wang,
Stephen Paolini,
Run Xiao,
Ke Wang,
Anthony R. Richardella,
John Singleton,
Laurel E. Winter,
Thomas Prokscha,
Zaher Salman,
Andreas Suter,
Purnima P. Balakrishnan,
Alexander J. Grutter,
Moses H. W. Chan,
Nitin Samarth,
Xiaodong Xu,
Weida Wu
, et al. (2 additional authors not shown)
Abstract:
When two different electronic materials are brought together, the resultant interface often shows unexpected quantum phenomena, including interfacial superconductivity and Fu-Kane topological superconductivity (TSC). Here, we use molecular beam epitaxy (MBE) to synthesize heterostructures formed by stacking together two magnetic materials, a ferromagnetic topological insulator (TI) and an antiferromagnetic iron chalcogenide (FeTe). We discover emergent interface-induced superconductivity in these heterostructures and demonstrate the trifecta occurrence of superconductivity, ferromagnetism, and topological band structure in the magnetic TI layer, the three essential ingredients of chiral TSC. The unusual coexistence of ferromagnetism and superconductivity can be attributed to the high upper critical magnetic field that exceeds the Pauli paramagnetic limit for conventional superconductors at low temperatures. The magnetic TI/FeTe heterostructures with robust superconductivity and atomically sharp interfaces provide an ideal wafer-scale platform for the exploration of chiral TSC and Majorana physics, constituting an important step toward scalable topological quantum computation.
Submitted 7 December, 2023;
originally announced December 2023.
-
Protecting Quantum Information via Destructive Interference of Correlated Noise
Authors:
Alon Salhov,
Qingyun Cao,
Jianming Cai,
Alex Retzker,
Fedor Jelezko,
Genko Genov
Abstract:
Decoherence and imperfect control are crucial challenges for quantum technologies. Common protection strategies rely on noise temporal autocorrelation, which is not optimal if other correlations are present. We develop and demonstrate experimentally a strategy that utilizes the cross-correlation of two noise sources. We achieve a tenfold coherence time extension by destructive interference of cross-correlated noise, improve control fidelity, and surpass the state-of-the-art sensitivity for high frequency quantum sensing, significantly expanding the applicability of noise protection strategies.
Submitted 4 December, 2023;
originally announced December 2023.
-
Interlacing Polynomial Method for Matrix Approximation via Generalized Column and Row Selection
Authors:
Jian-Feng Cai,
Zhiqiang Xu,
Zili Xu
Abstract:
This paper delves into the spectral norm aspect of the Generalized Column and Row Subset Selection (GCRSS) problem. Given a target matrix $\mathbf{A}$, the objective of GCRSS is to select a column submatrix $\mathbf{B}_{:,S}$ from the source matrix $\mathbf{B}$ and a row submatrix $\mathbf{C}_{R,:}$ from the source matrix $\mathbf{C}$, with the aim of minimizing the spectral norm of the residual matrix $(\mathbf{I}_n-\mathbf{B}_{:,S}\mathbf{B}_{:,S}^{\dagger})\mathbf{A}(\mathbf{I}_d-\mathbf{C}_{R,:}^{\dagger} \mathbf{C}_{R,:})$. By employing the interlacing polynomials method, we show that the largest root of the expected characteristic polynomial of the residual matrix serves as an upper bound on the smallest spectral norm of the residual matrix. We estimate this root for two specific GCRSS scenarios, one where $r=0$, simplifying the problem to the Generalized Column Subset Selection (GCSS) problem, and the other where $\mathbf{B}=\mathbf{C}=\mathbf{I}_d$, reducing the problem to the submatrix selection problem. In the GCSS scenario, we connect the expected characteristic polynomials to the convolution of multi-affine polynomials, leading to the derivation of the first provable reconstruction bound on the spectral norm of the residual matrix for the GCSS problem. In the submatrix selection scenario, we show that for any sufficiently small $\varepsilon>0$ and any square matrix $\mathbf{A}\in\mathbb{R}^{d\times d}$, there exist two subsets $S\subset [d]$ and $R\subset [d]$ of sizes $O(d\cdot \varepsilon^2)$ such that $\Vert\mathbf{A}_{S,R}\Vert_2\leq \varepsilon\cdot \Vert\mathbf{A}\Vert_2$. Unlike previous studies that have produced comparable results for very special cases where the matrix is either a zero-diagonal or a positive semidefinite matrix, our results apply universally to any matrix $\mathbf{A}$.
Submitted 4 December, 2023;
originally announced December 2023.
-
A large family of strongly regular graphs with small Weisfeiler-Leman dimension
Authors:
Jinzhuan Cai,
Jin Guo,
Alexander L. Gavrilyuk,
Ilia Ponomarenko
Abstract:
In 2002, D. Fon-Der-Flaass constructed a prolific family of strongly regular graphs. In this paper, we prove that for infinitely many natural numbers $n$, this family contains $n^{\Omega(n^{2/3})}$ strongly regular $n$-vertex graphs $X$ with the same parameters, which satisfy the following condition: an isomorphism between $X$ and any other graph can be verified by the $4$-dimensional Weisfeiler-Leman algorithm.
Submitted 1 December, 2023;
originally announced December 2023.
-
Efficient Stitchable Task Adaptation
Authors:
Haoyu He,
Zizheng Pan,
Jing Liu,
Jianfei Cai,
Bohan Zhuang
Abstract:
The paradigm of pre-training and fine-tuning has laid the foundation for deploying deep learning models. However, most fine-tuning methods are designed to meet a specific resource budget. Recently, considering diverse deployment scenarios with various resource budgets, stitchable neural network (SN-Net) is introduced to quickly obtain numerous new networks (stitches) from the pre-trained models (anchors) in a model family via model stitching. Although promising, SN-Net confronts new challenges when adapting it to new target domains, including huge memory and storage requirements and a long and sub-optimal multistage adaptation process. In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints. Specifically, we first tailor parameter-efficient fine-tuning to share low-rank updates among the stitches while maintaining independent bias terms. In this way, we largely reduce fine-tuning memory burdens and mitigate the interference among stitches that arises in task adaptation. Furthermore, we streamline a simple yet effective one-stage deployment pipeline, which estimates the important stitches to deploy with training-time gradient statistics. By assigning higher sampling probabilities to important stitches, we also get a boosted Pareto frontier. Extensive experiments on 25 downstream visual recognition tasks demonstrate that our ESTA is capable of generating stitches with smooth accuracy-efficiency trade-offs and surpasses the direct SN-Net adaptation by remarkable margins with significantly lower training time and fewer trainable parameters. Furthermore, we demonstrate the flexibility and scalability of our ESTA framework by stitching LLMs from LLaMA family, obtaining chatbot stitches of assorted sizes.
Submitted 28 November, 2023;
originally announced November 2023.
-
Stochastic Three-Operator Splitting Algorithms for Nonconvex and Nonsmooth Optimization Arising from FLASH Radiotherapy
Authors:
Fengmiao Bian,
Jiulong Liu,
Xiaoqun Zhang,
Hao Gao,
Jian-Feng Cai
Abstract:
Radiation therapy (RT) aims to deliver tumoricidal doses with minimal radiation-induced normal-tissue toxicity. Compared to conventional RT (of conventional dose rate), FLASH-RT (of ultra-high dose rate) can provide additional normal tissue sparing, which however has created a new nonconvex and nonsmooth optimization problem that is highly challenging to solve. In this paper, we propose a stochastic three-operator splitting (STOS) algorithm to address the FLASH optimization problem. We establish the convergence and convergence rates of the STOS algorithm under the nonconvex framework for both unbiased gradient estimators and variance-reduced gradient estimators. These stochastic gradient estimators include the most popular ones, such as SGD, SAGA, SARAH, and SVRG, among others. The effectiveness of the STOS algorithm is validated using FLASH radiotherapy planning for patients.
Submitted 24 November, 2023;
originally announced November 2023.
-
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Authors:
Honglin Li,
Yunlong Zhang,
Chenglu Zhu,
Jiatong Cai,
Sunyi Zheng,
Lin Yang
Abstract:
Histopathology image analysis is the gold standard of clinical diagnosis for cancers. In doctors' daily routine and in computer-aided diagnosis, the Whole Slide Image (WSI) of histopathology tissue is used for analysis. Because of the extremely large resolution, previous methods for developing computer-aided diagnosis tools generally divide the WSI into a large number of patches and then aggregate all patches within a WSI by Multi-Instance Learning (MIL) to make the slide-level prediction. However, most previous WSI-MIL models, which use either global attention without pairwise interaction or positional information, or self-attention with absolute position embedding, cannot handle shape-varying large WSIs well; e.g., testing WSIs after model deployment may be larger than training WSIs, since the model development set is always limited due to the difficulty of collecting histopathology WSIs. To deal with this problem, we propose to amend the position embedding for shape-varying long-contextual WSIs by introducing Linear Bias into Attention and adapting it from 1-d long sequences to 2-d long-contextual WSIs, which helps the model extrapolate position embedding to unseen or under-fitted positions. We further utilize the Flash-Attention module to tackle the computational complexity of the Transformer, which also keeps full self-attention performance compared to previous attention approximation work. Our method, Long-contextual MIL (Long-MIL), is evaluated in extensive experiments on 4 datasets, covering WSI classification and survival prediction tasks, to validate its superiority on shape-varying WSIs. The source code will be open-accessed soon.
Submitted 20 November, 2023;
originally announced November 2023.
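The Long-MIL abstract above describes adding a linear bias to attention and extending it from 1-d sequences to 2-d patch grids. A minimal sketch of that general idea follows, assuming the bias is the negative Euclidean distance between patch grid coordinates scaled by a single slope. The distance metric, the slope value, and all function names are assumptions for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def linear_bias_2d(coords, slope):
    """Pairwise attention bias: -slope * distance between 2-d patch positions."""
    return [[-slope * math.dist(p, q) for q in coords] for p in coords]


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(scores, bias):
    """Add the positional bias to raw attention scores, then normalize each row."""
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


# Three patches at (row, col) grid positions; the third is far from the first two.
coords = [(0, 0), (0, 1), (3, 4)]
bias = linear_bias_2d(coords, slope=0.5)
uniform_scores = [[0.0] * 3 for _ in range(3)]
weights = biased_attention_weights(uniform_scores, bias)
```

Because the bias depends only on relative distance, it is defined for any grid size, which is what lets such a scheme apply to slides larger than those seen in training.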