-
Spatially Randomized Designs Can Enhance Policy Evaluation
Authors:
Ying Yang,
Chengchun Shi,
Fang Yao,
Shouyang Wang,
Hongtu Zhu
Abstract:
This article studies the benefits of using spatially randomized experimental designs, which partition the experimental area into distinct, non-overlapping units and assign treatments to them at random. Such designs improve policy evaluation in online experiments, yielding more precise policy value estimators and more effective A/B testing algorithms than traditional global designs, which apply the same treatment across all units simultaneously. We examine both parametric and nonparametric methods for estimating and inferring policy values under this randomized approach. Our analysis covers the mean squared error of the treatment effect estimator and the statistical power of the associated tests. We further extend our findings to experiments with spatio-temporal dependencies, where treatments are allocated sequentially over time, and account for potential temporal carryover effects. Our theoretical insights are supported by comprehensive numerical experiments.
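To make the design concrete, here is a minimal Python sketch of the simplest spatially randomized experiment: each non-overlapping unit independently receives treatment or control, and the treatment effect is estimated by a difference in means. The data-generating process (unit baselines plus an additive effect plus Gaussian noise) and the function names are hypothetical, and the difference-in-means estimator is only a stand-in for the parametric and nonparametric estimators studied in the article.

```python
import random
from statistics import mean


def spatial_randomized_assignment(n_units, seed=0):
    """Assign treatment (1) or control (0) independently to each
    non-overlapping spatial unit with probability 1/2."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_units)]


def difference_in_means(outcomes, assignment):
    """Estimate the average treatment effect as the mean outcome of
    treated units minus the mean outcome of control units."""
    treated = [y for y, a in zip(outcomes, assignment) if a == 1]
    control = [y for y, a in zip(outcomes, assignment) if a == 0]
    if not treated or not control:
        raise ValueError("both arms need at least one unit")
    return mean(treated) - mean(control)


def simulate(n_units=200, effect=0.5, noise=1.0, seed=0):
    """Hypothetical data-generating process: a random baseline per unit,
    plus an additive treatment effect, plus Gaussian noise."""
    rng = random.Random(seed + 1)
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    assignment = spatial_randomized_assignment(n_units, seed)
    outcomes = [b + effect * a + rng.gauss(0.0, noise)
                for b, a in zip(baseline, assignment)]
    return difference_in_means(outcomes, assignment)


if __name__ == "__main__":
    print(round(simulate(), 3))
```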
Submitted 17 March, 2024;
originally announced March 2024.
-
Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate, in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is determined to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combined with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, this yields a ratio of branching fractions $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming the single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.
Submitted 16 March, 2024;
originally announced March 2024.
-
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
Authors:
Mude Hui,
Zihao Wei,
Hongru Zhu,
Fei Xia,
Yuyin Zhou
Abstract:
Volumetric optical microscopy using non-diffracting beams enables rapid imaging of 3D volumes by projecting them axially to 2D images but lacks crucial depth information. Addressing this, we introduce MicroDiffusion, a pioneering tool facilitating high-quality, depth-resolved 3D volume reconstruction from limited 2D projections. While existing Implicit Neural Representation (INR) models often yield incomplete outputs and Denoising Diffusion Probabilistic Models (DDPM) excel at capturing details, our method integrates INR's structural coherence with DDPM's fine-detail enhancement capabilities. We pretrain an INR model to transform 2D axially-projected images into a preliminary 3D volume. This pretrained INR acts as a global prior guiding DDPM's generative process through a linear interpolation between INR outputs and noise inputs. This strategy enriches the diffusion process with structured 3D information, enhancing detail and reducing noise in localized 2D images. By conditioning the diffusion model on the closest 2D projection, MicroDiffusion substantially enhances fidelity in resulting 3D reconstructions, surpassing INR and standard DDPM outputs with unparalleled image quality and structural fidelity. Our code and dataset are available at https://github.com/UCSC-VLAA/MicroDiffusion.
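The interpolation step can be pictured with a small Python sketch, assuming a single blending weight `w` applied pixel-wise to an INR-predicted 2D slice and standard Gaussian noise. The function name and the weight are hypothetical; the abstract does not specify how the interpolation coefficient is chosen or scheduled.

```python
import random


def inr_guided_start(inr_slice, w, seed=0):
    """Linearly interpolate between an INR-predicted 2D slice and Gaussian noise.

    Hypothetical illustration: w = 1 returns the INR output unchanged,
    w = 0 returns pure noise; the real weighting in MicroDiffusion may differ.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    rng = random.Random(seed)
    return [[w * v + (1.0 - w) * rng.gauss(0.0, 1.0) for v in row]
            for row in inr_slice]
```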
Submitted 16 March, 2024;
originally announced March 2024.
-
Surveyor: Facilitating Discovery Within Video Games for Blind and Low Vision Players
Authors:
Vishnu Nair,
Hanxiu 'Hazel' Zhu,
Peize Song,
Jizhong Wang,
Brian A. Smith
Abstract:
Video games are increasingly accessible to blind and low vision (BLV) players, yet many aspects remain inaccessible. One aspect is the joy players feel when they explore environments and make new discoveries, which is integral to many games. Sighted players experience discovery by surveying environments and identifying unexplored areas. Current accessibility tools, however, guide BLV players directly to items and places, robbing them of that experience. Thus, a crucial challenge is to develop navigation assistance tools that also foster exploration and discovery. To address this challenge, we propose the concept of exploration assistance in games and design Surveyor, an in-game exploration assistance tool that enhances discovery by tracking where BLV players look and highlighting unexplored areas. We designed Surveyor using insights from a formative study and compared Surveyor's effectiveness to approaches found in existing accessible games. Our findings reveal implications for facilitating richer play experiences for BLV users within games.
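As a rough illustration of tracking where a player looks, the Python sketch below marks grid cells inside a view cone as surveyed and reports the rest as unexplored. The grid representation, field of view, range, and class names are all hypothetical and are not taken from Surveyor.

```python
import math


def cells_in_view(player, facing_deg, cells, fov_deg=60.0, max_range=10.0):
    """Return the grid cells inside the player's view cone (hypothetical model)."""
    px, py = player
    seen = set()
    for cx, cy in cells:
        dx, dy = cx - px, cy - py
        dist = math.hypot(dx, dy)
        if dist > max_range:
            continue
        if dist == 0.0:
            seen.add((cx, cy))  # the player's own cell counts as surveyed
            continue
        bearing = math.degrees(math.atan2(dy, dx))
        # Signed angle between the look direction and the cell, wrapped to [-180, 180).
        offset = (bearing - facing_deg + 180.0) % 360.0 - 180.0
        if abs(offset) <= fov_deg / 2.0:
            seen.add((cx, cy))
    return seen


class ExplorationTracker:
    """Accumulates surveyed cells so that unexplored ones can be announced."""

    def __init__(self, cells):
        self.cells = set(cells)
        self.surveyed = set()

    def look(self, player, facing_deg):
        self.surveyed |= cells_in_view(player, facing_deg, self.cells)

    def unexplored(self):
        return self.cells - self.surveyed
```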
Submitted 15 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present measurements of the all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range 0.3-30 PeV using data collected by LHAASO-KM2A between September 2021 and December 2022. The analysis is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis places the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be $-2.7413 \pm 0.0004 \pm 0.0050$, while above the knee it is $-3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is heavier than that of helium over almost the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, a 24% decline following a power law with an index of $-0.1200 \pm 0.0003 \pm 0.0341$, which is equivalent to an increase in the abundance of light components. Above the knee, the mean logarithmic mass follows a power-law trend towards heavier components, the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same in both. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than that of the medium-heavy components.
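The quoted numbers can be related with a short Python sketch. It assumes the commonly used smoothly broken power-law form for the all-particle spectrum, with a placeholder sharpness `w = 5` (the fitted value is not given in the abstract), and models the mean logarithmic mass as $\ln A \propto E^{-0.12}$, normalized to 1.7 at 0.3 PeV.

```python
GAMMA_BELOW = -2.7413   # spectral index below the knee (from the abstract)
GAMMA_ABOVE = -3.128    # spectral index above the knee (from the abstract)
E_KNEE_PEV = 3.67       # knee position in PeV (from the abstract)


def local_index(e_pev, sharpness=5.0):
    """Local slope d(ln F)/d(ln E) of the smoothly broken power law
    F(E) = A * E**g1 * (1 + (E/Ek)**w) ** ((g2 - g1) / w).
    The sharpness w = 5 is a placeholder, not the fitted LHAASO value."""
    x = (e_pev / E_KNEE_PEV) ** sharpness
    return GAMMA_BELOW + (GAMMA_ABOVE - GAMMA_BELOW) * x / (1.0 + x)


def ln_a(e_pev, ln_a_ref=1.7, e_ref_pev=0.3, index=-0.1200):
    """Mean logarithmic mass as a power law in energy, normalized at 0.3 PeV."""
    return ln_a_ref * (e_pev / e_ref_pev) ** index


decline = 1.0 - ln_a(3.0) / ln_a(0.3)
print(f"lnA(3 PeV) = {ln_a(3.0):.2f}, decline = {decline:.1%}")
# prints: lnA(3 PeV) = 1.29, decline = 24.1%
```

With the abstract's index, $10^{-0.12} \approx 0.759$, which reproduces both the drop to about 1.3 at 3 PeV and the quoted 24% decline; at the knee itself the local index sits midway between the two quoted spectral indices.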
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Field Line Curvature Scattering in the Dayside Off-Equatorial Minima Regions
Authors:
Bin Cai,
Hui Zhu
Abstract:
Magnetic field line curvature (FLC) scattering is an effective mechanism for collisionless particle scattering. In the terrestrial magnetosphere, FLC scattering plays an essential role in shaping the outer boundary of the proton radiation belt, the rapid decay of the ring current, and the formation of the proton isotropic boundary (IB). However, previous studies have not adequately investigated the influence of FLC scattering on charged particles in the Earth's dayside magnetosphere, particularly in the off-equatorial magnetic minima regions. This study employs the T89 magnetic field model to investigate the impacts of FLC scattering on ring current protons in the dayside magnetosphere, with a specific focus on the off-equatorial minima regions. We analyze the spatial distributions of single and dual magnetic minima regions, the adiabatic parameter, and pitch angle diffusion coefficients due to FLC scattering as functions of $Kp$. The results show that the effects of FLC scattering are significant not only on the dusk and dawn sides but also in the off-equatorial minima regions near noon. Additionally, we investigate the role of the dipole tilt angle in the hemispheric asymmetry of FLC scattering effects. The dipole tilt angle controls the overall displacement of the dayside magnetosphere, resulting in different FLC scattering effects in the two hemispheres. Our study is significant for understanding FLC scattering effects in the off-equatorial region of Earth's dayside magnetosphere and for constructing a more accurate dynamic model of particles.
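For orientation, the adiabatic parameter in FLC studies is commonly defined as $\kappa^2 = R_c/\rho$, the ratio of the field line curvature radius to the particle gyroradius, with strong scattering when $\kappa^2$ is small (a threshold of order 8 is often quoted). The Python sketch below evaluates it for a proton at 90° pitch angle; the definition is the conventional one and is assumed here, and the numbers in the usage note are illustrative rather than T89 output.

```python
import math

PROTON_REST_ENERGY_MEV = 938.272   # proton rest energy, MeV
SPEED_OF_LIGHT = 2.99792458e8      # m/s


def proton_gyroradius_m(kinetic_mev, b_tesla):
    """Gyroradius of a proton with all momentum perpendicular to B
    (90-degree pitch angle), using relativistic momentum.
    rho = p / (e B) = (pc in eV) / (c B)."""
    pc_mev = math.sqrt(kinetic_mev ** 2 + 2.0 * kinetic_mev * PROTON_REST_ENERGY_MEV)
    return pc_mev * 1e6 / (SPEED_OF_LIGHT * b_tesla)


def adiabatic_parameter(curvature_radius_m, kinetic_mev, b_tesla):
    """kappa^2 = R_c / rho (assumed conventional definition);
    small values indicate strong FLC scattering."""
    return curvature_radius_m / proton_gyroradius_m(kinetic_mev, b_tesla)
```

For example, a 100 keV proton in a 100 nT field has a gyroradius of roughly 457 km, so a curvature radius of $10^4$ km gives $\kappa^2 \approx 22$.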
Submitted 14 March, 2024;
originally announced March 2024.
-
A Semantic Mention Graph Augmented Model for Document-Level Event Argument Extraction
Authors:
Jian Zhang,
Changlin Yang,
Hai** Zhu,
Qika Lin,
Fangzhi Xu,
Jun Liu
Abstract:
Document-level Event Argument Extraction (DEAE) aims to identify arguments and their specific roles in an unstructured document. Advanced approaches to DEAE use prompt-based methods to guide pre-trained language models (PLMs) in extracting arguments from input documents. They mainly concentrate on establishing relations between triggers and entity mentions within documents, leaving two problems unresolved: a) independent modeling of entity mentions; b) document-prompt isolation. To this end, we propose a semantic mention Graph Augmented Model (GAM) to address these two problems. First, GAM constructs a semantic mention graph that captures relations within and between documents and prompts, encompassing co-existence, co-reference and co-type relations. Furthermore, we introduce an ensembled graph transformer module to handle mentions and their three semantic relations effectively. Finally, the graph-augmented encoder-decoder module incorporates the relation-specific graph into the input embedding of PLMs and optimizes the encoder section with topology information, enhancing the relations comprehensively. Extensive experiments on the RAMS and WikiEvents datasets demonstrate the effectiveness of our approach, which surpasses baseline methods and achieves new state-of-the-art performance.
Submitted 12 March, 2024;
originally announced March 2024.
-
Field test of mode-pairing quantum key distribution
Authors:
Hao-Tao Zhu,
Yizhi Huang,
Wen-Xin Pan,
Chao-Wu Zhou,
Jianjun Tang,
Hong He,
Ming Cheng,
Xiandu **,
Mi Zou,
Shibiao Tang,
Xiongfeng Ma,
Teng-Yun Chen,
Jian-Wei Pan
Abstract:
Quantum key distribution is a cornerstone of quantum technology, offering information-theoretically secure keys for remote parties. With many quantum communication networks established globally, the mode-pairing protocol stands out for its efficacy over inter-city distances using simple setups, emerging as a promising solution. In this study, we deploy the mode-pairing scheme on existing inter-city fiber links, conducting field tests across distances ranging from tens of kilometers to about a hundred kilometers. Our system achieves a key rate of $1.217$ kbit/s over a $195.85$ km symmetric link and $3.089$ kbit/s over a $127.92$ km asymmetric link without global phase locking. The results demonstrate that the mode-pairing protocol can achieve key rates comparable to those of a single quantum link between two trusted nodes on the Beijing-Shanghai backbone line, effectively removing the need for half of the trusted nodes. These field tests confirm the mode-pairing scheme's adaptability, efficiency, and practicality, positioning it as a highly suitable protocol for quantum networks.
Submitted 14 March, 2024;
originally announced March 2024.
-
TMDs from Semi-inclusive Energy Correlators
Authors:
Xiaohui Liu,
Hua Xing Zhu
Abstract:
We introduce a novel category of observables known as the Semi-Inclusive Energy Correlators (SIECs), an extension of the recently proposed nucleon energy correlator to integrate a new element, the fragmenting energy correlation function. These SIECs gauge the correlation between the examined hadron and the surrounding radiations, providing a comprehensive tomography of the radiative patterns originating from initial state radiation or parton fragmentation. As such, they could function as the generating functions for numerous kinematic distributions. To illustrate, we find a direct relation between the transverse momentum moments (TMMs) of the transverse momentum-dependent (TMD) distributions and the SIECs. We demonstrate how the TMMs of the TMD parton distributions and the fragmentation functions can be distinctively derived from the nucleon energy correlator and the fragmenting energy correlator, respectively, without enforcing the back-to-back kinematics.
Submitted 13 March, 2024;
originally announced March 2024.
-
SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents
Authors:
Ruiyi Wang,
Haofei Yu,
Wenxin Zhang,
Zhengyang Qi,
Maarten Sap,
Graham Neubig,
Yonatan Bisk,
Hao Zhu
Abstract:
Humans learn social skills through both imitation and social interaction. This social learning process is largely understudied by existing research on building language agents. Motivated by this gap, we propose an interactive learning method, SOTOPIA-$π$, improving the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on filtered social interaction data according to large language model (LLM) ratings. We show that our training method allows a 7B LLM to reach the social goal completion ability of an expert model (GPT-4-based agent), while improving the safety of language agents and maintaining general QA ability on the MMLU benchmark. We also find that this training paradigm uncovers some difficulties in LLM-based evaluation of social intelligence: LLM-based evaluators overestimate the abilities of the language agents trained specifically for social interaction.
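The filtering step can be sketched in Python as follows, assuming each episode carries a single numeric goal-completion rating from an LLM judge and that only the trained agent's turns become behavior-cloning targets. The `Episode` type, the rating scale, and the threshold are hypothetical; the paper defines the actual filtering criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Episode:
    agent_turns: list[str]   # utterances produced by the agent being trained
    llm_rating: float        # hypothetical goal-completion score from an LLM judge


def filter_for_training(episodes, min_rating):
    """Keep only episodes whose LLM rating meets the threshold and flatten
    their agent turns into supervised (behavior-cloning) targets."""
    kept = [ep for ep in episodes if ep.llm_rating >= min_rating]
    return [turn for ep in kept for turn in ep.agent_turns]
```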
Submitted 25 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT
Authors:
Hongyang Zhu,
Xin Lu,
Yanwei Qin,
Xinran Yu,
Tianjiao Sun,
Yunsong Zhao
Abstract:
Ring artifacts in computed tomography images, arising from the undesirable responses of detector units, significantly degrade image quality and diagnostic reliability. To address this challenge, we propose a dual-domain regularization model to effectively remove ring artifacts while maintaining the integrity of the original CT image. The proposed model corrects the vertical stripe artifacts on the sinogram by innovatively updating the response inconsistency compensation coefficients of detector units, which is achieved by employing the group sparse constraint and the projection-view direction sparse constraint on the stripe artifacts. Simultaneously, we apply the sparse constraint on the reconstructed image to further rectify ring artifacts in the image domain. The key advantage of the proposed method lies in considering the relationship between the response inconsistency compensation coefficients of the detector units and the projection views, which enables a more accurate correction of the response of the detector units. An alternating minimization method is designed to solve the model. Comparative experiments on real photon counting detector data demonstrate that the proposed method not only surpasses existing methods in removing ring artifacts but also excels in preserving structural details and image fidelity.
Submitted 14 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
MolBind: Multimodal Alignment of Language, Molecules, and Proteins
Authors:
Teng Xiao,
Chao Cui,
Huaisheng Zhu,
Vasant G. Honavar
Abstract:
Recent advancements in biology and chemistry have leveraged multi-modal learning, integrating molecules and their natural language descriptions to enhance drug discovery. However, current pre-training frameworks are limited to two modalities, and designing a unified network to process different modalities (e.g., natural language, 2D molecular graphs, 3D molecular conformations, and 3D proteins) remains challenging due to inherent gaps among them. In this work, we propose MolBind, a framework that trains encoders for multiple modalities through contrastive learning, mapping all modalities to a shared feature space for multi-modal semantic alignment. To facilitate effective pre-training of MolBind on multiple modalities, we also build and collect a high-quality dataset with four modalities, MolBind-M4, including graph-language, conformation-language, graph-conformation, and conformation-protein paired data. MolBind shows superior zero-shot learning performance across a wide range of tasks, demonstrating its strong capability of capturing the underlying semantics of multiple modalities.
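Contrastive alignment of two modalities is often implemented with a symmetric InfoNCE objective, as in CLIP. The Python sketch below computes that loss for paired embeddings (for example, a molecular graph and its text description). That MolBind uses exactly this form, and the temperature of 0.1, are assumptions made for illustration.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def info_nce(emb_a, emb_b, temperature=0.1):
    """Symmetric InfoNCE loss for paired embeddings (assumed CLIP-style form):
    item i of modality A should score higher with item i of modality B
    than with any other item j, and vice versa."""
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    n = len(a)
    logits = [[_dot(a[i], b[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # Row i is scored against target index i, using a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]  # B-to-A direction
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))
```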
Submitted 2 April, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
3M-Diffusion: Latent Multi-Modal Diffusion for Text-Guided Generation of Molecular Graphs
Authors:
Huaisheng Zhu,
Teng Xiao,
Vasant G Honavar
Abstract:
Generating molecules with desired properties is a critical task with broad applications in drug discovery and materials design. Inspired by recent advances in large language models, there is a growing interest in using natural language descriptions of molecules to generate molecules with the desired properties. Most existing methods focus on generating molecules that precisely match the text description. However, practical applications call for methods that generate diverse, and ideally novel, molecules with the desired properties. We propose 3M-Diffusion, a novel multi-modal molecular graph generation method, to address this challenge. 3M-Diffusion first encodes molecular graphs into a graph latent space aligned with text descriptions. It then reconstructs the molecular structure and atomic attributes based on the given text descriptions using the molecule decoder. It then learns a probabilistic mapping from the text space to the latent molecular graph space using a diffusion model. The results of our extensive experiments on several datasets demonstrate that 3M-Diffusion can generate high-quality, novel and diverse molecular graphs that semantically match the textual description provided.
Submitted 11 March, 2024;
originally announced March 2024.
-
Determination of the number of $ψ(3686)$ events taken at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.
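The three quoted numbers can be checked against the total with a few lines of Python. The central values sum to 2712.4, and the quoted uncertainty of 14.3 equals the linear sum of the individual uncertainties rather than their quadrature sum (about 11.4); reading this as the systematic uncertainties being treated as fully correlated across run periods is an inference, not something the abstract states.

```python
import math

# (count, uncertainty) in units of 10^6 events, as quoted in the abstract
samples = {
    "2009": (107.7, 0.6),
    "2012": (345.4, 2.6),
    "2021": (2259.3, 11.1),
}

total = sum(n for n, _ in samples.values())
linear_unc = sum(u for _, u in samples.values())                      # fully correlated
quadrature_unc = math.sqrt(sum(u * u for _, u in samples.values()))  # uncorrelated

print(f"{total:.1f} {linear_unc:.1f} {quadrature_unc:.1f}")
# prints: 2712.4 14.3 11.4
```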
Submitted 28 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Distribution-Aware Data Expansion with Diffusion Models
Authors:
Haowei Zhu,
Ling Yang,
Jun-Hai Yong,
Hongzhi Yin,
Jiawei Jiang,
Meng Xiao,
Wentao Zhang,
Bin Wang
Abstract:
The scale and quality of a dataset significantly impact the performance of deep models. However, acquiring large-scale annotated datasets is both a costly and time-consuming endeavor. To address this challenge, dataset expansion technologies aim to automatically augment datasets, unlocking the full potential of deep models. Current data expansion techniques include image transformation and image synthesis methods. Transformation-based methods introduce only local variations, leading to limited diversity. In contrast, synthesis-based methods generate entirely new content, greatly enhancing informativeness. However, existing synthesis methods carry the risk of distribution deviations, potentially degrading model performance with out-of-distribution samples. In this paper, we propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. DistDiff constructs hierarchical prototypes to approximate the real data distribution, optimizing latent data points within diffusion models with hierarchical energy guidance. We demonstrate its capability to generate distribution-consistent samples, significantly improving data expansion tasks. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data. Furthermore, our approach consistently outperforms existing synthesis-based techniques and demonstrates compatibility with widely adopted transformation-based augmentation methods. Additionally, the expanded dataset exhibits robustness across various architectural frameworks. Our code is available at https://github.com/haoweiz23/DistDiff
Submitted 5 June, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Authors:
Hanxin Zhu,
Tianyu He,
Xin Li,
Bingchen Li,
Zhibo Chen
Abstract:
Neural Radiance Field (NeRF) has achieved superior performance for novel view synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a volume rendering procedure. However, when fewer known views are given (i.e., few-shot view synthesis), the model is prone to overfitting the given views. To handle this issue, previous efforts have leveraged learned priors or introduced additional regularizations. In contrast, in this paper, we for the first time provide an orthogonal method from the perspective of network structure. Given the observation that simply reducing the number of model parameters alleviates overfitting, but at the cost of missing details, we propose the multi-input MLP (mi-MLP), which incorporates the inputs (i.e., location and viewing direction) of the vanilla MLP into each layer to prevent overfitting without harming detailed synthesis. To further reduce artifacts, we propose to model colors and volume density separately and present two regularization terms. Extensive experiments on multiple datasets demonstrate that: 1) although the proposed mi-MLP is easy to implement, it is surprisingly effective, boosting the PSNR of the baseline from $14.73$ to $24.23$; and 2) the overall framework achieves state-of-the-art results on a wide range of benchmarks. We will release the code upon publication.
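A minimal Python sketch of the multi-input idea follows: every hidden layer after the first, and the output head, receives the previous activations concatenated with the raw input. The layer sizes, initialization, and a 5-dimensional input (3D location plus two viewing angles) are hypothetical, and positional encoding, the separate color and density branches, and the two regularization terms are omitted.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    """Dense layer: `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mi_mlp(n_input, hidden, n_layers, n_output, seed=0):
    """Hypothetical mi-MLP: layers after the first see [hidden state, raw input]."""
    rng = random.Random(seed)
    layers = [init_layer(n_input, hidden, rng)]
    for _ in range(n_layers - 1):
        layers.append(init_layer(hidden + n_input, hidden, rng))
    head = init_layer(hidden + n_input, n_output, rng)
    return layers, head


def mi_mlp_forward(model, x):
    layers, head = model
    h = relu(linear(*layers[0], x))
    for weights, bias in layers[1:]:
        h = relu(linear(weights, bias, h + x))  # re-inject the raw input
    return linear(*head, h + x)
```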
Submitted 9 March, 2024;
originally announced March 2024.
-
Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation
Authors:
Hairong Shi,
Songhao Han,
Shaofei Huang,
Yue Liao,
Guanbin Li,
Xiangxing Kong,
Hua Zhu,
Xiaomu Wang,
Si Liu
Abstract:
Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent studies have attempted to enhance SAM with medical expertise by pre-training on large-scale medical segmentation datasets. However, challenges still exist in 3D tumor lesion segmentation owing to tumor complexity and the imbalance in foreground and background regions. Therefore, we introduce Mask-Enhanced SAM (M-SAM), an innovative architecture tailored for 3D tumor lesion segmentation. We propose a novel Mask-Enhanced Adapter (MEA) within M-SAM that enriches the semantic information of medical images with positional data from coarse segmentation masks, facilitating the generation of more precise segmentation masks. Furthermore, an iterative refinement scheme is implemented in M-SAM to refine the segmentation masks progressively, leading to improved performance. Extensive experiments on seven tumor lesion segmentation datasets indicate that our M-SAM not only achieves high segmentation accuracy but also exhibits robust generalization.
Submitted 9 March, 2024;
originally announced March 2024.
-
Asymptotic Variation of Elementary Abelian p-Extensions over $P^1$
Authors:
Hui June Zhu
Abstract:
In this paper, we prove that there exists a Zariski dense open subset $U$ in the parameter space of all elementary $p$-covers of the projective line that are ramified at exactly one point, defined over the rationals, such that for every curve $X$ in $U(\overline{Q})$ and for any prime $p$ large enough, the reduction of $X$ at all primes lying over $p$ achieves its generic Newton slopes.
Submitted 8 March, 2024;
originally announced March 2024.
-
Embodied Understanding of Driving Scenarios
Authors:
Yunsong Zhou,
Linyan Huang,
Qingwen Bu,
Jia Zeng,
Tianyu Li,
Hang Qiu,
Hongzi Zhu,
Minyi Guo,
Yu Qiao,
Hongyang Li
Abstract:
Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing VLMs are restricted to the 2D domain and lack spatial awareness and long-horizon extrapolation capabilities. We revisit the key aspects of autonomous driving and formulate appropriate rubrics. Here, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans. ELM incorporates space-aware pre-training to endow the agent with robust spatial localization capabilities. In addition, the model employs time-aware token selection to accurately query temporal cues. We instantiate ELM on the reformulated multi-faceted benchmark, and it surpasses previous state-of-the-art approaches in all aspects. All code, data, and models will be publicly shared.
Submitted 7 March, 2024;
originally announced March 2024.
-
An Adaptable, Safe, and Portable Robot-Assisted Feeding System
Authors:
Ethan Kroll Gordon,
Rajat Kumar Jenamani,
Amal Nanavati,
Ziang Liu,
Haya Bolotski,
Raida Karim,
Daniel Stabile,
Atharva Kashyap,
Bernie Hao Zhu,
Xilai Dai,
Tyler Schrenk,
Jonathan Ko,
Taylor Kessler Faulkner,
Tapomayukh Bhattacharjee,
Siddhartha Srinivasa
Abstract:
We demonstrate a robot-assisted feeding system that enables people with mobility impairments to feed themselves. Our system design embodies Safety, Portability, and User Control, with comprehensive full-stack safety checks, the ability to be mounted on and powered by any powered wheelchair, and a custom web-app allowing care-recipients to leverage their own assistive devices for robot control. For bite acquisition, we leverage multi-modal online learning to tractably adapt to unseen food types. For bite transfer, we leverage real-time mouth perception and interaction-aware control. Co-designed with community researchers, our system has been validated through multiple end-user studies.
Submitted 6 March, 2024;
originally announced March 2024.
-
Efficient Search and Learning for Agile Locomotion on Stepping Stones
Authors:
Adithya Kumar Chinnakkonda Ravi,
Victor Dhédin,
Armand Jordana,
Huaijiang Zhu,
Avadesh Meduri,
Ludovic Righetti,
Bernhard Schölkopf,
Majid Khadiv
Abstract:
Legged robots have become capable of performing highly dynamic maneuvers in the past few years. However, agile locomotion in highly constrained environments such as stepping stones is still a challenge. In this paper, we propose a combination of model-based control, search, and learning to design efficient control policies for agile locomotion on stepping stones. In our framework, we use nonlinear model predictive control (NMPC) to generate whole-body motions for a given contact plan. To efficiently search for an optimal contact plan, we propose to use Monte Carlo tree search (MCTS). While the combination of MCTS and NMPC can quickly find a feasible plan for a given environment (within a few seconds), it is not yet suitable for use as a reactive policy. Hence, we generate a dataset for an optimal goal-conditioned policy for a given scene and learn this policy through supervised learning. In particular, we leverage the power of diffusion models in handling multi-modality in the dataset. We test our proposed framework on a scenario where our quadruped robot Solo12 successfully jumps to different goals in a highly constrained environment.
Submitted 6 March, 2024;
originally announced March 2024.
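The abstract above uses Monte Carlo tree search (MCTS) to find a contact plan, but it does not specify the tree policy. The sketch below assumes the common UCT rule (UCB1 applied to trees) and shows selection and backpropagation. The `Node` class, the exploration constant `c` and the reward passed to `backpropagate` are illustrative assumptions, not the authors' implementation. In the paper's setting, a reward would presumably reflect whether NMPC can execute the candidate contact plan.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical search-tree node, e.g. one partial contact plan."""
    visits: int = 0
    value_sum: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean reward (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always expand unvisited children first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = math.sqrt(2)) -> Node:
    """Return the child with the highest UCT score (assumes node.visits >= 1)."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(path: List[Node], reward: float) -> None:
    """Add one visit and the rollout reward to every node on the selected path."""
    for n in path:
        n.visits += 1
        n.value_sum += reward
```

Consider a root with 10 visits and two children that each have 5 visits, with summed rewards of 4.0 and 2.0. Here `select_child` picks the first child. A newly added, unvisited child would be chosen before either of them.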
-
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
Authors:
Quan Liu,
Hongzi Zhu,
Zhenxi Wang,
Yunsong Zhou,
Shan Chang,
Minyi Guo
Abstract:
Registration of point clouds collected from a pair of distant vehicles provides a comprehensive and accurate 3D view of the driving scenario, which is vital for safety-related driving applications, yet existing literature suffers from expensive pose label acquisition and a limited ability to generalize to new data distributions. In this paper, we propose EYOC, an unsupervised distant point cloud registration method that adapts to new point cloud distributions on the fly, requiring no global pose labels. The core idea of EYOC is to train a feature extractor in a progressive fashion: in each round, the feature extractor, trained with near point cloud pairs, labels slightly farther point cloud pairs, enabling self-supervision on those farther pairs. This process continues until the derived extractor can be used to register distant point clouds. In particular, to enable high-fidelity correspondence label generation, we devise an effective spatial filtering scheme to select the most representative correspondences to register a point cloud pair, and then use the aligned point clouds to discover more correct correspondences. Experiments show that EYOC achieves performance comparable to state-of-the-art supervised methods at a lower training cost. Moreover, it outperforms supervised methods in generalization to new data distributions.
Submitted 27 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.
Submitted 6 March, 2024;
originally announced March 2024.
-
Comprehensive evaluation of Mal-API-2019 dataset by machine learning in malware detection
Authors:
Zhenglin Li,
Haibei Zhu,
Houze Liu,
**tong Song,
Qishuo Cheng
Abstract:
This study conducts a thorough examination of malware detection using machine learning techniques, focusing on the evaluation of various classification models on the Mal-API-2019 dataset. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively. Both ensemble and non-ensemble machine learning methods, such as Random Forest, XGBoost, K Nearest Neighbor (KNN), and Neural Networks, are explored. Special emphasis is placed on the importance of data pre-processing techniques, particularly TF-IDF representation and Principal Component Analysis, in improving model performance. Results indicate that ensemble methods, particularly Random Forest and XGBoost, exhibit superior accuracy, precision, and recall compared with the other models, highlighting their effectiveness in malware detection. The paper also discusses limitations and potential future directions, emphasizing the need for continuous adaptation to address the evolving nature of malware. This research contributes to ongoing discussions in cybersecurity and provides practical insights for developing more robust malware detection systems in the digital era.
Submitted 25 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
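The Mal-API-2019 abstract above stresses TF-IDF representation as a preprocessing step. The paper does not state which TF-IDF variant it uses (smoothing, sublinear tf, normalization), so the sketch below uses the plain textbook form, tf(t, d) · log(N / df(t)). It treats each API-call trace as one document. The API names in the usage example are a hypothetical toy input. The PCA step and the classifiers are omitted.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Turn API-call sequences into TF-IDF vectors over a shared vocabulary.

    tf(t, d) = count of t in d / length of d
    idf(t)   = log(N / df(t)), N documents, df(t) documents containing t
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for an empty trace
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, vectors
```

For the toy traces `[["CreateFileW", "WriteFile", "CreateFileW"], ["RegOpenKeyExW", "WriteFile"]]`, `WriteFile` appears in every trace. Its idf is therefore 0, and it contributes nothing to either vector. Calls that are unique to one trace dominate that trace's vector.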
-
Observation of $ψ(3686)\to 3φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.
Submitted 4 March, 2024;
originally announced March 2024.
-
Uncovering the Sino-US dynamic risk spillovers effects: Evidence from agricultural futures markets
Authors:
Han-Yu Zhu,
Peng-Fei Dai,
Wei-Xing Zhou
Abstract:
Agricultural products play a critical role in human development. As economic globalization and the financialization of agricultural products continue to advance, the interconnections between different agricultural futures have become closer. We use a TVP-VAR-DY model combined with the quantile method to measure the risk spillover between 11 agricultural futures on the futures exchanges of the US and China from July 9, 2014, to December 31, 2022. This study yielded several significant findings. First, CBOT corn, soybean, and wheat were identified as the primary risk transmitters, with DCE corn and soybean as the main risk receivers. Second, sudden events or increased economic uncertainty can increase the overall risk spillovers. Third, the dynamic directional spillover results show that risk spillovers cluster among agricultural futures. Lastly, the spillover network and minimum spanning tree results indicate that the central agricultural futures under the conditional mean are CBOT corn and soybean, while CZCE hard wheat and long-grained rice are the two risk spillover centers in extreme cases. Based on these results, decision-makers are advised to safeguard against the price risk of agricultural futures under sudden economic events, and investors can use the results to construct a superior investment portfolio by taking different agricultural product futures as risk-leading indicators according to various situations.
Submitted 4 March, 2024;
originally announced March 2024.
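The spillover measures in the abstract above come from the Diebold–Yilmaz (DY) framework. In the TVP-VAR-DY approach, a time-varying VAR yields a generalized forecast-error variance decomposition (FEVD) for each period. The sketch below assumes such a matrix is already available for one period and computes the usual connectedness table from it. Each row is normalized to sum to 1. FROM and TO are the off-diagonal row and column sums, NET is TO minus FROM, and the total index is the average FROM, all in percent. This is one common convention, not necessarily the paper's exact one. The TVP-VAR estimation and the quantile extension are omitted, and the example matrix is a toy input.

```python
from typing import Dict, List


def spillover_table(fevd: List[List[float]]) -> Dict[str, object]:
    """Diebold-Yilmaz style connectedness measures from a generalized FEVD matrix.

    fevd[i][j] is the share of variable i's H-step forecast-error variance
    attributed to shocks in variable j. Rows are normalized to sum to 1.
    """
    n = len(fevd)
    norm = [[v / sum(row) for v in row] for row in fevd]
    # FROM: what each variable receives from all others (off-diagonal row sum).
    from_others = [sum(norm[i][j] for j in range(n) if j != i) for i in range(n)]
    # TO: what each variable transmits to all others (off-diagonal column sum).
    to_others = [sum(norm[i][j] for i in range(n) if i != j) for j in range(n)]
    # NET > 0 marks a net transmitter, NET < 0 a net receiver.
    net = [to_others[k] - from_others[k] for k in range(n)]
    total = sum(from_others) / n  # total spillover index (average FROM)
    return {
        "total": 100 * total,
        "from": [100 * x for x in from_others],
        "to": [100 * x for x in to_others],
        "net": [100 * x for x in net],
    }
```

For the toy matrix `[[2, 2], [1, 3]]`, the normalized rows are `[0.5, 0.5]` and `[0.25, 0.75]`. Variable 0 receives 50% and transmits 25%, so it is a net receiver (NET = -25). The total spillover index is 37.5%.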
-
Minimax-Regret Sample Selection in Randomized Experiments
Authors:
Yuchen Hu,
Henry Zhu,
Emma Brunskill,
Stefan Wager
Abstract:
Randomized controlled trials are often run in settings with many subpopulations that may benefit differently from the treatment being evaluated. We consider the problem of sample selection, i.e., whom to enroll in a randomized trial, so as to optimize welfare in a heterogeneous population. We formalize this problem within the minimax-regret framework, and derive optimal sample-selection schemes under a variety of conditions. Using data from a COVID-19 vaccine trial, we also highlight how different objectives and decision rules can lead to meaningfully different guidance regarding optimal sample allocation.
Submitted 25 June, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
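The abstract above frames sample selection as a minimax-regret problem. The minimal sketch below covers only the finite, deterministic case of that decision rule. Welfare is tabulated for every candidate action under every possible state of the world. Regret is the gap to the best action in each state, and the rule picks the action whose worst-case regret is smallest. The action names and welfare numbers in the example are hypothetical. The paper derives its optimal schemes analytically and may involve richer (e.g., continuous or randomized) allocations.

```python
from typing import Dict, List, Tuple


def minimax_regret(welfare: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Choose the action whose worst-case regret is smallest.

    welfare[a][s] is the welfare of action a if the state of the world is s.
    Regret of a in state s = best achievable welfare in s minus welfare[a][s].
    """
    actions = list(welfare)
    n_states = len(welfare[actions[0]])
    best = [max(welfare[a][s] for a in actions) for s in range(n_states)]
    max_regret = {
        a: max(best[s] - welfare[a][s] for s in range(n_states)) for a in actions
    }
    choice = min(actions, key=lambda a: max_regret[a])
    return choice, max_regret
```

The hypothetical table is `{"enroll_A": [10, 2], "enroll_B": [6, 6], "enroll_both": [8, 5]}`. Enrolling only A or only B each risks a regret of 4. The mixed option's worst case is 2, so the rule selects it even though it is never the best choice in any single state.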
-
Synthesizing study-specific controls using generative models on open access datasets for harmonized multi-study analyses
Authors:
Shruti P. Gadewar,
Alyssa H. Zhu,
Iyad Ba Gari,
Sunanda Somu,
Sophia I. Thomopoulos,
Paul M. Thompson,
Talia M. Nir,
Neda Jahanshad
Abstract:
Neuroimaging consortia can enhance reliability and generalizability of findings by pooling data across studies to achieve larger sample sizes. To adjust for site and MRI protocol effects, imaging datasets are often harmonized based on healthy controls. When data from a control group were not collected, statistical harmonization options are limited as patient characteristics and acquisition-related variables may be confounded. Here, in a multi-study neuroimaging analysis of Alzheimer's patients and controls, we tested whether it is possible to generate synthetic control MRIs. For one case-control study, we used a generative adversarial model for style-based harmonization to generate site-specific controls. Downstream feature extraction, statistical harmonization and group-level multi-study case-control and case-only analyses were performed twice, using either true or synthetic controls. All effect sizes using synthetic controls overlapped with those based on true study controls. This line of work may facilitate wider inclusion of case-only studies in multi-study consortia.
Submitted 29 February, 2024;
originally announced March 2024.
-
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Authors:
Bingchen Li,
Xin Li,
Hanxin Zhu,
Yeying **,
Ruoyu Feng,
Zhizheng Zhang,
Zhibo Chen
Abstract:
Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, a discriminator is used to enable the SR network to learn the distribution of real-world high-quality images through adversarial training. However, this distribution learning is overly coarse-grained, which makes it susceptible to virtual textures and causes counter-intuitive generation results. To mitigate this, we propose the simple and effective Semantic-aware Discriminator (denoted as SeD), which encourages the SR network to learn fine-grained distributions by introducing the semantics of images as a condition. Concretely, we aim to extract the semantics of images from a well-trained semantic extractor. Under different semantics, the discriminator is able to distinguish real from fake images individually and adaptively, which guides the SR network to learn more fine-grained, semantic-aware textures. To obtain accurate and abundant semantics, we take full advantage of recently popular pretrained vision models (PVMs) trained on extensive datasets, and then incorporate their semantic features into the discriminator through a well-designed spatial cross-attention module. In this way, our proposed semantic-aware discriminator empowers the SR network to produce more photo-realistic and pleasing images. Extensive experiments on two typical tasks, i.e., SR and Real SR, have demonstrated the effectiveness of our proposed methods.
Submitted 29 February, 2024;
originally announced February 2024.
-
PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
Authors:
He Zhu,
Wenjia Zhang,
Nuoxian Huang,
Boyang Li,
Luyao Niu,
Zipei Fan,
Tianle Lun,
Yicheng Tao,
Junyou Su,
Zhaoya Gong,
Chenyu Fang,
Xing Liu
Abstract:
In the field of urban planning, general-purpose large language models often struggle to meet the specific needs of planners. Tasks like generating urban planning texts, retrieving related information, and evaluating planning documents pose unique challenges. To enhance the efficiency of urban professionals and overcome these obstacles, we introduce PlanGPT, the first specialized Large Language Model tailored for urban and spatial planning. Developed through collaborative efforts with institutions like the Chinese Academy of Urban Planning, PlanGPT leverages a customized local database retrieval framework, domain-specific fine-tuning of base models, and advanced tooling capabilities. Empirical tests demonstrate that PlanGPT has achieved advanced performance, delivering responses of superior quality precisely tailored to the intricacies of urban planning.
Submitted 29 February, 2024;
originally announced February 2024.
-
The Situate AI Guidebook: Co-Designing a Toolkit to Support Multi-Stakeholder Early-stage Deliberations Around Public Sector AI Proposals
Authors:
Anna Kawakami,
Amanda Coston,
Haiyi Zhu,
Hoda Heidari,
Kenneth Holstein
Abstract:
Public sector agencies are rapidly deploying AI systems to augment or automate critical decisions in real-world contexts like child welfare, criminal justice, and public health. A growing body of work documents how these AI systems often fail to improve services in practice. These failures can often be traced to decisions made during the early stages of AI ideation and design, such as problem formulation. However, today, we lack systematic processes to support effective, early-stage decision-making about whether and under what conditions to move forward with a proposed AI project. To understand how to scaffold such processes in real-world settings, we worked with public sector agency leaders, AI developers, frontline workers, and community advocates across four public sector agencies and three community advocacy groups in the United States. Through an iterative co-design process, we created the Situate AI Guidebook: a structured process centered around a set of deliberation questions to scaffold conversations around (1) goals and intended use of a proposed AI system, (2) societal and legal considerations, (3) data and modeling constraints, and (4) organizational governance factors. We discuss how the guidebook's design is informed by participants' challenges, needs, and desires for improved deliberation processes. We further elaborate on implications for designing responsible AI toolkits in collaboration with public sector agency stakeholders and opportunities for future work to expand upon the guidebook. This design approach can be more broadly adopted to support the co-creation of responsible AI toolkits that scaffold key decision-making processes surrounding the use of AI in the public sector and beyond.
Submitted 5 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Neural Radiance Fields in Medical Imaging: Challenges and Next Steps
Authors:
Xin Wang,
Shu Hu,
Heng Fan,
Hongtu Zhu,
Xin Li
Abstract:
Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from projected two-dimensional image data. However, they face unique challenges when applied to medical applications. This paper presents a comprehensive examination of applications of NeRFs in medical imaging, highlighting four imminent challenges: fundamental imaging principles, inner structure requirements, object boundary definition, and color density significance. We review current methods for different organs and discuss their limitations. We also review several datasets and evaluation metrics and propose several promising directions for future research.
Submitted 21 March, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Confidence-Aware Multi-Field Model Calibration
Authors:
Yuang Zhao,
Chuhan Wu,
Qinglin Jia,
Hong Zhu,
Jia Yan,
Libin Zong,
Linxuan Zhang,
Zhenhua Dong,
Muyu Zhang
Abstract:
Accurately predicting the probabilities of user feedback, such as clicks and conversions, is critical for advertisement ranking and bidding. However, unwanted mismatches often exist between predicted probabilities and true likelihoods due to rapid shifts in data distributions and intrinsic model biases. Calibration aims to address this issue by post-processing model predictions, and field-aware calibration can adjust model output on different feature field values to satisfy fine-grained advertising demands. Unfortunately, the observed samples corresponding to certain field values can be too scarce to support confident calibration, which may cause bias amplification and online disturbance. In this paper, we propose a confidence-aware multi-field calibration method, which adaptively adjusts the calibration intensity based on confidence levels derived from sample statistics. It also uses multiple fields for joint model calibration according to their importance to mitigate the impact of data sparsity on a single field. Extensive offline and online experiments show the superiority of our method in boosting advertising performance and reducing prediction deviations.
Submitted 21 May, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
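The calibration abstract above describes two ideas without giving formulas. First, calibration intensity adapts to a confidence level derived from sample statistics. Second, several fields are combined according to their importance. The sketch below is a hypothetical illustration of both ideas, not the paper's method. Each field value's observed/predicted ratio is shrunk toward a global ratio with confidence weight n / (n + k). The per-field factors are then averaged with importance weights. The shrinkage constant `k`, the importance weights and the multiplicative form are all assumptions.

```python
from typing import Sequence


def field_factor(clicks: float, pred_sum: float, n: int,
                 global_factor: float, k: float = 100.0) -> float:
    """Hypothetical calibration factor for one field value, shrunk toward the global factor.

    confidence = n / (n + k) grows with the sample count n, so sparse field
    values fall back to global_factor instead of amplifying noise.
    """
    local = clicks / pred_sum if pred_sum > 0 else global_factor
    confidence = n / (n + k)
    return confidence * local + (1.0 - confidence) * global_factor


def calibrate(p: float, factors: Sequence[float], importances: Sequence[float]) -> float:
    """Combine per-field factors by (assumed) importance weights and rescale p."""
    combined = sum(w * f for w, f in zip(importances, factors)) / sum(importances)
    return min(1.0, p * combined)  # keep the result a valid probability
```

Take a field value with 100 impressions, 30 observed clicks and 20 predicted clicks. Its raw ratio is 1.5, but with `k = 100` the confidence is only 0.5. The factor is therefore pulled halfway back toward a global factor of 1.0, giving 1.25. A field value with no samples uses the global factor unchanged.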
-
Towards Open-ended Visual Quality Comparison
Authors:
Haoning Wu,
Hanwei Zhu,
Zicheng Zhang,
Erli Zhang,
Chaofeng Chen,
Liang Liao,
Chunyi Li,
Annan Wang,
Wenxiu Sun,
Qiong Yan,
Xiaohong Liu,
Guangtao Zhai,
Shiqi Wang,
Weisi Lin
Abstract:
Comparative settings (e.g., pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the edge of emerging large multi-modality models (LMMs) to further advance visual quality comparison into open-ended settings that 1) can respond to open-range questions on quality comparison and 2) can provide detailed reasoning beyond direct answers. To this end, we propose Co-Instruct. To train this first-of-its-kind open-source open-ended visual quality comparer, we collect the Co-Instruct-562K dataset from two sources: (a) LLM-merged single-image quality descriptions and (b) GPT-4V "teacher" responses on unlabeled data. Furthermore, to better evaluate this setting, we propose MICBench, the first benchmark on multi-image comparison for LMMs. We demonstrate that Co-Instruct not only achieves on average 30% higher accuracy than state-of-the-art open-source LMMs, but also outperforms GPT-4V (its teacher) on both existing related benchmarks and the proposed MICBench. Our model is published at https://huggingface.co/q-future/co-instruct.
Submitted 4 March, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
CMC: Few-shot Novel View Synthesis via Cross-view Multiplane Consistency
Authors:
Hanxin Zhu,
Tianyu He,
Zhibo Chen
Abstract:
Neural Radiance Field (NeRF) has shown impressive results in novel view synthesis, particularly in Virtual Reality (VR) and Augmented Reality (AR), thanks to its ability to represent scenes continuously. However, when only a few input view images are available, NeRF tends to overfit the given views, so that the estimated depths of pixels share almost the same value. Unlike previous methods that conduct regularization by introducing complex priors or additional supervision, we propose a simple yet effective method that explicitly builds depth-aware consistency across input views to tackle this challenge. Our key insight is that by forcing the same spatial points to be sampled repeatedly in different input views, we are able to strengthen the interactions between views and therefore alleviate the overfitting problem. To achieve this, we build the neural networks on layered representations (i.e., multiplane images), so that each sampling point can be resampled on multiple discrete planes. Furthermore, to regularize the unseen target views, we constrain the rendered colors and depths from different input views to be the same. Although the method is simple, extensive experiments demonstrate that it achieves better synthesis quality than state-of-the-art methods.
Submitted 26 February, 2024;
originally announced February 2024.
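The CMC abstract above constrains the colors and depths rendered from different input views to agree, using multiplane images as the representation. The sketch below assumes standard NeRF-style volume rendering. Samples along one ray, here placed at the plane depths, are alpha-composited front to back. Each sample gets weight T_i · α_i with α_i = 1 − exp(−σ_i δ_i), yielding a color and an expected depth. A squared-error term then compares two such renderings of the same target ray. The consistency loss form and the `far` distance for the last sample are illustrative assumptions. The abstract does not give the paper's exact loss or weighting.

```python
import math
from typing import List, Sequence, Tuple


def composite(ts: Sequence[float], sigmas: Sequence[float],
              colors: Sequence[Sequence[float]],
              far: float = 1e10) -> Tuple[List[float], float]:
    """Alpha-composite samples along one ray, front to back.

    ts: increasing sample depths (e.g. depths of the discrete planes).
    sigmas: densities at those samples; colors: RGB at those samples.
    Returns the rendered RGB color and the expected depth.
    """
    weights = []
    transmittance = 1.0  # probability that the ray has not terminated yet
    for i in range(len(ts)):
        delta = ts[i + 1] - ts[i] if i + 1 < len(ts) else far
        alpha = 1.0 - math.exp(-sigmas[i] * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    color = [sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3)]
    depth = sum(w * t for w, t in zip(weights, ts))
    return color, depth


def consistency_loss(render_a: Tuple[List[float], float],
                     render_b: Tuple[List[float], float]) -> float:
    """Hypothetical squared color and depth disagreement between two renderings of one ray."""
    (color_a, depth_a), (color_b, depth_b) = render_a, render_b
    color_term = sum((x - y) ** 2 for x, y in zip(color_a, color_b)) / 3
    return color_term + (depth_a - depth_b) ** 2
```

Consider an empty first plane (σ = 0) followed by a dense green plane at depth 2. The ray renders pure green at depth 2.0. Two renderings that disagree only in depth (1.0 versus 3.0) incur a loss of 4.0.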
-
Quadratic Message Passing for Generalized Quadratic Equations Model
Authors:
Huimin Zhu
Abstract:
For approximate inference in the generalized quadratic equations model, many state-of-the-art algorithms use no prior knowledge of the target signal structure, exhibit slow convergence, and cannot handle analytic prior knowledge of the target signal structure. This paper therefore proposes a new algorithm, Quadratic Message Passing (QMP), whose complexity is as low as $O(N^{3})$. The state evolution (SE) derived for QMP precisely captures the per-iteration behavior of the simulated algorithm. Simulation results confirm that QMP outperforms many state-of-the-art algorithms.
Submitted 25 February, 2024;
originally announced February 2024.
-
On universality for the kinetic wave equation
Authors:
Pierre Germain,
Hui Zhu
Abstract:
On compact Riemannian manifolds with chaotic geometries, specifically those exhibiting the random wave model conjectured by Berry, we derive heuristically a homogeneous kinetic wave equation that is universal for all such manifolds.
Submitted 22 February, 2024;
originally announced February 2024.
-
GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints
Authors:
Anqi Cheng,
Zhiyuan Yang,
Haiyue Zhu,
Kezhi Mao
Abstract:
Self-supervised depth estimation has evolved into an image reconstruction task that minimizes a photometric loss. While recent methods have made strides in indoor depth estimation, they often produce inconsistent depth estimates in textureless areas and unsatisfactory depth discrepancies at object boundaries. To address these issues, we propose GAM-Depth, built on two novel components: a gradient-aware mask and semantic constraints. The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions by allocating weights based on gradient magnitudes. The incorporation of semantic constraints for indoor self-supervised depth estimation improves depth discrepancies at object boundaries, leveraging a co-optimization network and proxy semantic labels derived from a pretrained segmentation model. Experimental studies on three indoor datasets, NYUv2, ScanNet, and InteriorNet, show that GAM-Depth outperforms existing methods and achieves state-of-the-art performance, signifying a meaningful step forward in indoor depth estimation. Our code will be available at https://github.com/AnqiCheng1234/GAM-Depth.
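The abstract says only that the mask allocates weights from gradient magnitudes. The sketch below assumes one simple hypothetical mapping: a forward-difference gradient magnitude, normalized by its maximum and lifted by a floor so textureless pixels still receive some supervision, used to weight an L1 photometric error. The `floor` value and the normalization are illustrative assumptions, not GAM-Depth's actual formula.

```python
import math
from typing import List

Image = List[List[float]]


def gradient_magnitude(img: Image) -> Image:
    """Forward-difference gradient magnitude (zero difference on the last row/column)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][x + 1] - img[y][x] if x + 1 < w else 0.0
            gy = img[y + 1][x] - img[y][x] if y + 1 < h else 0.0
            out[y][x] = math.hypot(gx, gy)
    return out


def gradient_aware_weights(img: Image, floor: float = 0.2) -> Image:
    """Hypothetical mapping to weights in [floor, 1] that grow with gradient magnitude.

    The floor keeps textureless pixels in the loss instead of zeroing them out.
    """
    g = gradient_magnitude(img)
    g_max = max(max(row) for row in g) or 1.0  # avoid dividing by zero on flat images
    return [[floor + (1.0 - floor) * v / g_max for v in row] for row in g]


def weighted_photometric_l1(target: Image, recon: Image, weights: Image) -> float:
    """Weighted mean absolute photometric error between target and reconstruction."""
    total = sum(wt * abs(t - r)
                for t_row, r_row, w_row in zip(target, recon, weights)
                for t, r, wt in zip(t_row, r_row, w_row))
    return total / sum(sum(row) for row in weights)
```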
Submitted 22 February, 2024;
originally announced February 2024.
-
Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia
Authors:
Tzu-Sheng Kuo,
Aaron Halfaker,
Zirui Cheng,
Jiwoo Kim,
Meng-Hsin Wu,
Tongshuang Wu,
Kenneth Holstein,
Haiyi Zhu
Abstract:
AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for AI that impacts them? We investigate this question on Wikipedia, an online community with multiple AI-based content moderation tools deployed. We introduce Wikibench, a system that enables communities to collaboratively curate AI evaluation datasets, while navigating ambiguities and differences in perspective through discussion. A field study on Wikipedia shows that datasets curated using Wikibench can effectively capture community consensus, disagreement, and uncertainty. Furthermore, study participants used Wikibench to shape the overall data curation process, including refining label definitions, determining data inclusion criteria, and authoring data statements. Based on our findings, we propose future directions for systems that support community-driven data curation.
Submitted 21 February, 2024;
originally announced February 2024.
-
Spectral Efficiency Maximization for Active RIS-aided Cell-Free Massive MIMO Systems with Imperfect CSI
Authors:
Mahdi Eskandari,
Huiling Zhu,
Jiangzhou Wang
Abstract:
A cell-free network aided by active reconfigurable reflecting surfaces (RIS) is investigated in this paper. Based on imperfect channel state information (CSI), the aggregated channel from the user to the access point (AP) is first estimated using the linear minimum mean square error (LMMSE) technique. The central processing unit (CPU) then detects the uplink data of individual users with the maximum ratio combining (MRC) approach, relying on the estimated channel. Next, a closed-form expression for the uplink spectral efficiency (SE) is derived, showing that it depends only on statistical CSI (S-CSI). The amplitude gain of each active RIS element is also derived in closed form as a function of the number of active RIS elements, the number of users, and the size of each reflecting element. A soft actor-critic (SAC) algorithm is used to design the phase shifts of the active RIS to maximize the uplink SE. Simulation results demonstrate the robustness of the proposed SAC algorithm and its effectiveness in cell-free networks with imperfect CSI.
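The estimate-then-combine pipeline can be illustrated with its two textbook building blocks: a scalar LMMSE channel estimate from a single pilot, and MRC detection with the estimated channel. This is a simplified sketch assuming a zero-mean complex Gaussian coefficient per antenna, one pilot symbol, and no RIS; it does not reproduce the paper's aggregated active-RIS channel model, closed-form SE, or SAC phase design.

```python
from typing import List


def lmmse_estimate(y_pilot: complex, beta: float, pilot_power: float,
                   noise_var: float) -> complex:
    """Scalar LMMSE estimate of h ~ CN(0, beta) from y = sqrt(p) * h + n, n ~ CN(0, noise_var)."""
    sp = pilot_power ** 0.5
    return (sp * beta / (pilot_power * beta + noise_var)) * y_pilot


def mrc_combine(h_hat: List[complex], y: List[complex]) -> complex:
    """Maximum ratio combining: inner product of the conjugated estimate with the received vector."""
    return sum(h.conjugate() * yi for h, yi in zip(h_hat, y))


def mrc_detect(h_hat: List[complex], y: List[complex]) -> complex:
    """MRC output normalized by the estimated channel energy ||h_hat||^2."""
    energy = sum((h.conjugate() * h).real for h in h_hat)
    return mrc_combine(h_hat, y) / energy
```

With a perfect estimate and no noise, `mrc_detect` returns the transmitted symbol; with noisy pilots, the LMMSE factor shrinks the pilot observation toward zero in proportion to the noise.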
Submitted 11 February, 2024;
originally announced February 2024.
-
Integrating Active Learning in Causal Inference with Interference: A Novel Approach in Online Experiments
Authors:
Hongtao Zhu,
Sizhe Zhang,
Yang Su,
Zhenyu Zhao,
Nan Chen
Abstract:
In causal inference research, the prevalent potential outcomes framework, notably the Rubin Causal Model (RCM), often overlooks interference between individuals and assumes independent treatment effects. This assumption, however, is frequently misaligned with real-world scenarios, where interference is not merely a possibility but a common occurrence. Our research addresses this discrepancy by focusing on the estimation of direct and spillover treatment effects under two assumptions: (1) network-based interference, where treatments on neighbors within connected networks affect one's outcomes, and (2) non-random treatment assignments influenced by confounders. To improve the efficiency of estimating potentially complex effect functions, we introduce a novel active learning approach: Active Learning in Causal Inference with Interference (ACI). This approach uses Gaussian processes to flexibly model the direct and spillover treatment effects as functions of a continuous measure of neighbors' treatment assignment. The ACI framework sequentially identifies the experimental settings that demand further data. It further optimizes the treatment assignments under the network interference structure using genetic algorithms so that learning is efficient. By applying our method to simulation data and a Tencent game dataset, we demonstrate that it achieves accurate effect estimates with reduced data requirements. The ACI approach marks a significant advance in data efficiency for causal inference, offering a robust and efficient alternative to traditional methodologies, particularly in scenarios with complex interference patterns.
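A minimal sketch of the idea follows, under several assumptions not stated in the abstract: the continuous exposure is taken to be the fraction of treated neighbors, a zero-mean GP with a squared-exponential kernel models one effect curve, and the next experimental setting is chosen by plain uncertainty sampling (largest posterior variance). The genetic-algorithm optimization of assignments and the confounding adjustment are not shown; `treated_fraction` and `next_exposure` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence, Tuple


def treated_fraction(unit: int, adjacency: Dict[int, List[int]],
                     treatment: Dict[int, int]) -> float:
    """Hypothetical exposure: share of a unit's neighbors that are treated (0 if isolated)."""
    nbrs = adjacency.get(unit, [])
    return sum(treatment[n] for n in nbrs) / len(nbrs) if nbrs else 0.0


def rbf(a: float, b: float, length: float = 0.3, var: float = 1.0) -> float:
    """Squared-exponential kernel; length scale and variance are arbitrary choices."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (adequate for tiny systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x_star: float,
                 noise: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and variance of the latent effect curve at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_star, x_star) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, var


def next_exposure(xs: Sequence[float], ys: Sequence[float],
                  candidates: Sequence[float]) -> float:
    """Uncertainty sampling: query the candidate exposure with the largest posterior variance."""
    return max(candidates, key=lambda c: gp_posterior(xs, ys, c)[1])
```

Having observed outcomes only at low exposures (say 0.0 to 0.2), `next_exposure` would steer the next experiment toward a high treated fraction, where the effect curve is least certain.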
Submitted 19 February, 2024;
originally announced February 2024.
-
Phonon-lithium ion interactions: A case study of LiM(SeO3)2
Authors:
Runxin Ouyang,
Yu Yang,
Chaohong Guan,
Hong Zhu
Abstract:
Li ion diffusion is fundamentally a thermally activated ion hopping process. Recently, soft lattices, anharmonic phonons, and the paddlewheel mechanism have been proposed as potentially beneficial to ion transport, while the understanding of vibrational couplings between mobile ions and anions remains limited but essential. Herein, we assess the ionic conductivity, stability, and lattice dynamics of LiM(SeO3)2 (M = Al, Ga, In, Sc, Y, and La), which contains two types of oxygen anions within the LiO4 polyhedron, namely edge-shared and corner-shared; the prototype, LiGa(SeO3)2, has been experimentally synthesized. We studied in detail the anharmonic and harmonic phonon interactions, as well as the couplings between the vibrations of edge-bonded or corner-bonded anions in Li polyanions and Li ion diffusion. As M changes from Sc to La, anharmonic phonons increase, alongside a reduced activation energy for Li diffusion. Phonon modes involving edge-bonded oxygen anions contribute more to Li migration than those involving corner-bonded oxygen anions, owing to stronger atomic interactions between Li ions and edge-bonded anions. Thus, rather than overall lattice softness, attention should be paid to reducing the frequency of the critical phonons contributing to Li ion diffusion and to increasing anharmonicity when designing Li ion superionic conductors for all-solid-state batteries.
Submitted 19 February, 2024;
originally announced February 2024.
-
Human Video Translation via Query Warping
Authors:
Haiming Zhu,
Yangyang Xu,
Shengfeng He
Abstract:
In this paper, we present QueryWarp, a novel framework for temporally coherent human motion video translation. Existing diffusion-based video editing approaches rely solely on key and value tokens to ensure temporal consistency, which sacrifices the preservation of local and structural regions. In contrast, we consider complementary query priors by constructing temporal correlations among query tokens from different frames. First, we extract appearance flows from source poses to capture continuous human foreground motion. Then, during the denoising process of the diffusion model, we use the appearance flows to warp the previous frame's query tokens, aligning them with the current frame's queries. This query warping imposes explicit constraints on the outputs of the self-attention layers, effectively guaranteeing temporally coherent translation. We perform experiments on various human motion video translation tasks, and the results demonstrate that our QueryWarp framework surpasses state-of-the-art methods both qualitatively and quantitatively.
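Warping the previous frame's query tokens with a flow field amounts to backward warping on the token grid. The sketch below assumes the appearance flow has already been resampled to token resolution and gives, for each current token, a displacement `(dy, dx)` into the previous frame; bilinear sampling with border clamping is an assumed choice, and how the warped queries enter self-attention is not reproduced.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]           # H x W x C token grid
Flow = List[List[Tuple[float, float]]]   # H x W displacements (dy, dx)


def bilinear_sample(grid: Grid, y: float, x: float) -> List[float]:
    """Bilinearly interpolate a token at fractional position (y, x), clamped to the border."""
    h, w = len(grid), len(grid[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    out = []
    for c in range(len(grid[0][0])):
        top = (1 - wx) * grid[y0][x0][c] + wx * grid[y0][x1][c]
        bot = (1 - wx) * grid[y1][x0][c] + wx * grid[y1][x1][c]
        out.append((1 - wy) * top + wy * bot)
    return out


def warp_queries(prev_queries: Grid, flow: Flow) -> Grid:
    """Backward warp: current token (i, j) reads prev_queries at (i + dy, j + dx)."""
    return [[bilinear_sample(prev_queries, i + dy, j + dx)
             for j, (dy, dx) in enumerate(row)]
            for i, row in enumerate(flow)]
```

Backward warping is used here because every current token then receives exactly one value, with no holes or collisions of the kind forward splatting can produce.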
Submitted 21 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Power Optimization for Integrated Active and Passive Sensing in DFRC Systems
Authors:
Xingliang Lou,
Wenchao Xia,
Kai-Kit Wong,
Haitao Zhao,
Tony Q. S. Quek,
Hongbo Zhu
Abstract:
Most existing works on dual-function radar-communication (DFRC) systems focus on active sensing but ignore passive sensing. To leverage multi-static sensing capability, we explore integrated active and passive sensing (IAPS) in DFRC systems to improve sensing performance. The multi-antenna base station (BS) is responsible for communication and active sensing: it transmits signals to user equipments while detecting a target from the echo signals. In contrast, passive sensing is performed at the receive access points (RAPs). We consider both the case where the capacity of the backhaul links between the RAPs and the BS is unlimited and the case where it is limited, and adopt a different fusion strategy for each. Specifically, when the backhaul capacity is unlimited, the BS and RAPs transfer the sensing signals they have received to the central controller (CC) for signal fusion. The CC processes the signals and uses the generalized likelihood ratio test detector to determine the presence of a target. However, when the backhaul capacity is limited, each RAP, as well as the BS, makes a decision independently and sends its binary inference result to the CC for result fusion via voting aggregation. Then, aiming to maximize the target detection probability under communication quality-of-service constraints, two power optimization algorithms are proposed. Finally, numerical simulations demonstrate that the sensing performance with unlimited backhaul capacity is much better than that with limited backhaul capacity. Moreover, the results imply that the proposed IAPS scheme outperforms only-passive and only-active sensing schemes, especially in the unlimited-capacity case.
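The abstract does not say which voting rule the CC applies in the limited-backhaul case. The sketch below assumes a k-out-of-N rule (majority by default) and, for the fused detection probability, independent local detectors sharing a common detection probability, which gives a binomial tail.

```python
from math import comb
from typing import Optional, Sequence


def vote(decisions: Sequence[int], k: Optional[int] = None) -> int:
    """k-out-of-N fusion of binary decisions (1 = target present); majority if k is None."""
    n = len(decisions)
    if k is None:
        k = n // 2 + 1
    return int(sum(decisions) >= k)


def fused_detection_prob(p_d: float, n: int, k: int) -> float:
    """P(at least k of n independent detectors fire), each firing with probability p_d."""
    return sum(comb(n, i) * p_d ** i * (1 - p_d) ** (n - i) for i in range(k, n + 1))
```

Setting `k=1` gives the OR rule and `k=n` the AND rule; the same binomial tail with the local false-alarm probability in place of `p_d` gives the fused false-alarm rate.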
Submitted 17 February, 2024;
originally announced February 2024.
-
Search for the production of deuterons and antideuterons in e^+e^- annihilation at center-of-mass energies between 4.13 and 4.70 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (593 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data corresponding to an integrated luminosity of 19 fb$^{-1}$ collected with the BESIII detector at the BEPCII collider, we search for the production of deuterons and antideuterons via $e^+e^-\to ppπ^-\bar{d}+c.c.$ for the first time at center-of-mass energies between 4.13 and 4.70 GeV. No significant signal is observed, and the upper limits on the $e^+e^-\to ppπ^-\bar{d}+c.c.$ cross section at the $90\%$ confidence level are determined to range from 9.0 to 145 fb, depending on the center-of-mass energy.
Submitted 17 February, 2024;
originally announced February 2024.
-
KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph
Authors:
**hao Jiang,
Kun Zhou,
Wayne Xin Zhao,
Yang Song,
Chen Zhu,
Hengshu Zhu,
Ji-Rong Wen
Abstract:
In this paper, we aim to improve the reasoning ability of large language models (LLMs) over knowledge graphs (KGs) to answer complex questions. Inspired by existing methods that design interaction strategies between LLMs and KGs, we propose an autonomous LLM-based agent framework, called KG-Agent, which enables a small LLM to actively make decisions until it finishes the reasoning process over KGs. In KG-Agent, we integrate the LLM, a multifunctional toolbox, a KG-based executor, and knowledge memory, and develop an iteration mechanism that autonomously selects a tool and then updates the memory for reasoning over the KG. To ensure effectiveness, we use a programming language to formulate the multi-hop reasoning process over the KG, and synthesize a code-based instruction dataset to fine-tune the base LLM. Extensive experiments demonstrate that, with only 10K samples for tuning LLaMA-7B, KG-Agent outperforms state-of-the-art methods that use larger LLMs or more data, on both in-domain and out-of-domain datasets. Our code and data will be publicly released.
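To make "formulating multi-hop reasoning as a program" concrete, here is a toy sketch: a tiny in-memory KG exposes two hypothetical tools, and a fixed relation path is executed hop by hop while each intermediate result is appended to a memory log. In KG-Agent the LLM would choose the tool and relation at each iteration; the class, tool names, and entities here are illustrative assumptions, not the framework's actual toolbox or executor.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class ToyKG:
    """Hypothetical in-memory KG with two tool-like query functions."""

    def __init__(self, triples: List[Triple]):
        self.triples = triples

    def get_relations(self, entity: str) -> Set[str]:
        """Tool: relations leaving an entity (what an agent would inspect to pick a hop)."""
        return {r for h, r, _ in self.triples if h == entity}

    def get_tail_entities(self, entities: Set[str], relation: str) -> Set[str]:
        """Tool: follow one relation from a set of entities."""
        return {t for h, r, t in self.triples if h in entities and r == relation}


def run_program(kg: ToyKG, start: str,
                relation_path: List[str]) -> Tuple[Set[str], List[str]]:
    """Execute a multi-hop program step by step, logging each step to a memory list."""
    frontier, memory = {start}, []
    for rel in relation_path:
        frontier = kg.get_tail_entities(frontier, rel)
        memory.append(f"{rel} -> {sorted(frontier)}")
        if not frontier:  # dead end: stop early, keeping the log for inspection
            break
    return frontier, memory
```

For instance, following `directed_by` and then `born_in` from `film_x` over a toy KG returns `{"city_z"}` with a two-entry memory log.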
Submitted 16 February, 2024;
originally announced February 2024.
-
The Borel complexity of the class of models of first-order theories
Authors:
Uri Andrews,
David Gonzalez,
Steffen Lempp,
Dino Rossegger,
Hongyu Zhu
Abstract:
We investigate the descriptive complexity of the set of models of first-order theories. Using classical results of Knight and Solovay, we give a sharp condition for complete theories to have a $\boldsymbol{\Pi}^0_\omega$-complete set of models. We also give sharp conditions for theories to have a $\boldsymbol{\Pi}^0_n$-complete set of models. Finally, we determine the Turing degrees needed to witness the completeness.
Submitted 15 February, 2024;
originally announced February 2024.
-
Prompt-based Personalized Federated Learning for Medical Visual Question Answering
Authors:
He Zhu,
Ren Togo,
Takahiro Ogawa,
Miki Haseyama
Abstract:
We present a novel prompt-based personalized federated learning (pFL) method to address data heterogeneity and privacy concerns in traditional medical visual question answering (VQA) methods. Specifically, we regard medical datasets from different organs as clients and use pFL to train personalized transformer-based VQA models for each client. To address the high computational complexity of client-to-client communication in previous pFL methods, we propose a succinct information-sharing system that introduces prompts, which are small learnable parameters. In addition, the proposed method introduces a reliability parameter to prevent negative effects from low-performing and irrelevant clients. Finally, extensive evaluations on various heterogeneous medical datasets attest to the effectiveness of our proposed method.
Submitted 14 February, 2024;
originally announced February 2024.
-
Bidirectional Generative Pre-training for Improving Time Series Representation Learning
Authors:
Ziyang Song,
Qincheng Lu,
He Zhu,
Yue Li
Abstract:
Learning time-series representations for discriminative tasks has been a long-standing challenge. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on time-series data with both next-token and previous-token predictions in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. On biosignal data, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmaps, we observe that the pre-trained BiTimelyGPT can identify discriminative segments in time-series sequences, even more so after fine-tuning on the task.
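One way to picture next-token and previous-token prediction in alternating layers is through the attention masks and training targets. The sketch assumes a forward (lower-triangular) mask on even layers and a backward (upper-triangular) mask on odd layers; the layer ordering and helper names are assumptions. Because each mask keeps the diagonal, a softmax attention matrix under either mask is triangular with a positive diagonal, hence full rank, which is consistent with the abstract's full-rank remark.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """Forward mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def anticausal_mask(n: int) -> Mask:
    """Backward mask: position i may attend to positions j >= i."""
    return [[j >= i for j in range(n)] for i in range(n)]


def layer_masks(num_layers: int, n: int) -> List[Mask]:
    """Alternate forward and backward masks layer by layer (forward first is an assumption)."""
    return [causal_mask(n) if layer % 2 == 0 else anticausal_mask(n)
            for layer in range(num_layers)]


def prediction_targets(seq: List[float]) -> Tuple[List[float], List[float]]:
    """Next-token targets for positions 0..n-2 and previous-token targets for positions 1..n-1."""
    return seq[1:], seq[:-1]
```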
Submitted 14 February, 2024;
originally announced February 2024.
-
DisGNet: A Distance Graph Neural Network for Forward Kinematics Learning of Gough-Stewart Platform
Authors:
Huizhi Zhu,
Wenxia Xu,
Jian Huang,
Jiaxin Li
Abstract:
In this paper, we propose a graph neural network, DisGNet, that learns the graph distance matrix to address the forward kinematics problem of the Gough-Stewart platform. DisGNet employs the k-FWL algorithm for message passing, providing high expressiveness with a small parameter count, which makes it suitable for practical deployment. Additionally, we introduce the GPU-friendly Newton-Raphson method, an efficient parallelized optimization method executed on the GPU, to refine DisGNet's output poses to ultra-high precision. This two-stage approach delivers ultra-high-precision output while meeting real-time requirements. Our results indicate that, on our dataset, DisGNet achieves errors below 1 mm and 1 deg with accuracies of 79.8\% and 98.2\%, respectively. Executed on a GPU, our two-stage method meets the requirement for real-time computation. Code is released at https://github.com/FLAMEZZ5201/DisGNet.
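The refinement stage can be illustrated with a planar toy analog of leg-length constraints: recover a point from its distances to two fixed anchors, starting from an approximate guess that stands in for the network's output. This is a sketch of ordinary Newton-Raphson with an analytic Jacobian and a 2x2 Cramer's-rule solve; the 6-DOF Gough-Stewart kinematics and the GPU batching described in the abstract are not reproduced.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def residuals(p: Point, anchors: List[Point], lengths: List[float]) -> List[float]:
    """Distance constraint violations: |p - a_i| - l_i."""
    return [math.dist(p, a) - l for a, l in zip(anchors, lengths)]


def jacobian(p: Point, anchors: List[Point]) -> List[List[float]]:
    """Row i is the gradient of |p - a_i|, i.e. the unit vector from a_i to p."""
    rows = []
    for a in anchors:
        d = math.dist(p, a)
        rows.append([(p[0] - a[0]) / d, (p[1] - a[1]) / d])
    return rows


def newton_refine(p0: Point, anchors: List[Point], lengths: List[float],
                  iters: int = 20, tol: float = 1e-12) -> Point:
    """Refine an initial guess p0 with Newton-Raphson steps on the distance residuals."""
    x, y = p0
    for _ in range(iters):
        r = residuals((x, y), anchors, lengths)
        if max(abs(v) for v in r) < tol:
            break
        (a, b), (c, d) = jacobian((x, y), anchors)
        det = a * d - b * c
        # Solve J @ [dx, dy] = r by Cramer's rule, then step against it.
        dx = (r[0] * d - b * r[1]) / det
        dy = (a * r[1] - c * r[0]) / det
        x, y = x - dx, y - dy
    return x, y
```

Like the real problem, this toy has multiple solutions (here a mirror pair about the anchor line), so a good initial guess decides which one Newton's method converges to; that is the role the learned first stage plays.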
Submitted 14 February, 2024;
originally announced February 2024.