-
Vox-Fusion++: Voxel-based Neural Implicit Dense Tracking and Mapping with Multi-maps
Authors:
Hongjia Zhai,
Hai Li,
Xingrui Yang,
Gan Huang,
Yuhang Ming,
Hujun Bao,
Guofeng Zhang
Abstract:
In this paper, we introduce Vox-Fusion++, a multi-maps-based robust dense tracking and mapping system that seamlessly fuses neural implicit representations with traditional volumetric fusion techniques. Building upon the concept of implicit mapping and positioning systems, our approach extends its applicability to real-world scenarios. Our system employs a voxel-based neural implicit surface representation, enabling efficient encoding and optimization of the scene within each voxel. To handle diverse environments without prior knowledge, we incorporate an octree-based structure for scene division and dynamic expansion. To achieve real-time performance, we propose a high-performance multi-process framework. This ensures the system's suitability for applications with stringent time constraints. Additionally, we adopt the idea of multi-maps to handle large-scale scenes, and leverage loop detection and hierarchical pose optimization strategies to reduce long-term pose drift and remove duplicate geometry. Through comprehensive evaluations, we demonstrate that our method outperforms previous methods in terms of reconstruction quality and accuracy across various scenarios. We also show that our Vox-Fusion++ can be used in augmented reality and collaborative mapping applications. Our source code will be publicly available at https://github.com/zju3dv/Vox-Fusion_Plus_Plus
Submitted 19 March, 2024;
originally announced March 2024.
-
End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations
Authors:
Lirui Luo,
Guoxi Zhang,
Hongming Xu,
Yaodong Yang,
Cong Fang,
Qing Li
Abstract:
Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations, but previous methods cannot refine the structured states with rewards due to a lack of efficiency. Accessibility also remains an issue, as extensive domain knowledge is required to interpret symbolic policies. In this paper, we present a neuro-symbolic framework for jointly learning structured states and symbolic policies, whose key idea is to distill the vision foundation model into an efficient perception module and refine it during policy learning. Moreover, we design a pipeline to prompt GPT-4 to generate textual explanations for the learned policies and decisions, significantly reducing users' cognitive load to understand the symbolic policies. We verify the efficacy of our approach on nine Atari tasks and present GPT-generated explanations for policies and decisions.
Submitted 13 June, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
$Herschel$ investigation of cores and filamentary structures in L1251 located in the Cepheus flare
Authors:
Divyansh Dewan,
Archana Soam,
Guo-Yin Zhang,
Akhil Lasrado,
Saikhom Pravash Singh,
Chang Won Lee
Abstract:
Context: Molecular clouds are the prime locations of star formation. These clouds contain filamentary structures and cores which are crucial in the formation of young stars. Aims: In this work, we aim to quantify the physical properties of structural characteristics within the molecular cloud L1251 to better understand the initial conditions for star formation. Methods: We applied the getsf algorithm to identify cores and filaments within the molecular cloud L1251 using the Herschel multiband dust continuum image, enabling us to measure their respective physical properties. Additionally, we utilized an enhanced differential term algorithm to produce high-resolution temperature maps and column density maps with a resolution of ${13.5}''$. Results: We identified 122 cores in the region. Out of them, 23 are protostellar cores, 13 are robust prestellar cores, 32 are candidate prestellar cores (including 13 robust prestellar cores and 19 strictly candidate prestellar cores), and 67 are unbound starless cores. getsf also found 147 filament structures in the region. Statistical analysis of the physical properties (mass (M), temperature (T), size, and core brightness (hereafter, we are using the word luminosity (L)) for the core brightness) of obtained cores shows a negative correlation between core mass and temperature and a positive correlation between (M/L) and (M/T). Analysis of the filaments gives a median width of 0.14 pc and no correlation between width and length. Out of those 122 cores, 92 are present in filaments (75.4%) and the remaining were outside them. Out of the cores present in filaments, 57 (62%) cores are present in supercritical filaments ($M_{\rm line}>16M_{\odot }/{\rm pc}$).
Submitted 16 March, 2024;
originally announced March 2024.
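The core counts quoted in the L1251 abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes the three top-level classes (protostellar, candidate prestellar, unbound starless) are disjoint and that the 13 robust prestellar cores are a subset of the 32 candidates, as the parenthetical in the abstract states. The variable names are illustrative only and do not come from the paper.

```python
# Core counts as quoted in the abstract.
protostellar = 23
candidate_prestellar = 32   # includes the 13 robust prestellar cores
robust_prestellar = 13
unbound_starless = 67

# Assumption: the three top-level classes are disjoint.
total_cores = protostellar + candidate_prestellar + unbound_starless
strict_candidates = candidate_prestellar - robust_prestellar

cores_in_filaments = 92
cores_in_supercritical = 57

# Percentages of cores inside filaments, and of those inside supercritical filaments.
frac_in_filaments = 100 * cores_in_filaments / total_cores
frac_supercritical = 100 * cores_in_supercritical / cores_in_filaments

print(total_cores)                   # 122
print(strict_candidates)             # 19
print(round(frac_in_filaments, 1))   # 75.4
print(round(frac_supercritical))     # 62
```

Under these assumptions the totals and both percentages agree with the figures stated in the abstract.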
-
Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is given to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming a single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.
Submitted 16 March, 2024;
originally announced March 2024.
-
Online Policy Learning from Offline Preferences
Authors:
Guoxi Zhang,
Han Bao,
Hisashi Kashima
Abstract:
In preference-based reinforcement learning (PbRL), a reward function is learned from a type of human feedback called preference. To expedite preference collection, recent works have leveraged \emph{offline preferences}, which are preferences collected for some offline data. In this scenario, the learned reward function is fitted on the offline data. If a learning agent exhibits behaviors that do not overlap with the offline data, the learned reward function may encounter generalizability issues. To address this problem, the present study introduces a framework that consolidates offline preferences and \emph{virtual preferences} for PbRL, which are comparisons between the agent's behaviors and the offline data. Critically, the reward function can track the agent's behaviors using the virtual preferences, thereby offering well-aligned guidance to the agent. Through experiments on continuous control tasks, this study demonstrates the effectiveness of incorporating the virtual preferences in PbRL.
Submitted 15 March, 2024;
originally announced March 2024.
-
Rethinking Low-quality Optical Flow in Unsupervised Surgical Instrument Segmentation
Authors:
Peiran Wu,
Yang Liu,
Jiayu Huo,
Gongyu Zhang,
Christos Bergeles,
Rachel Sparks,
Prokar Dasgupta,
Alejandro Granados,
Sebastien Ourselin
Abstract:
Video-based surgical instrument segmentation plays an important role in robot-assisted surgeries. Unlike supervised settings, unsupervised segmentation relies heavily on motion cues, which are challenging to discern due to the typically lower quality of optical flow in surgical footage compared to natural scenes. This presents a considerable burden for the advancement of unsupervised segmentation techniques. In our work, we address the challenge of enhancing model performance despite the inherent limitations of low-quality optical flow. Our methodology employs a three-pronged approach: extracting boundaries directly from the optical flow, selectively discarding frames with inferior flow quality, and employing a fine-tuning process with variable frame rates. We thoroughly evaluate our strategy on the EndoVis2017 VOS dataset and the EndoVis2017 Challenge dataset, where our model demonstrates promising results, achieving a mean Intersection-over-Union (mIoU) of 0.75 and 0.72, respectively. Our findings suggest that our approach can greatly decrease the need for manual annotations in clinical environments and may facilitate the annotation process for new datasets. The code is available at https://github.com/wpr1018001/Rethinking-Low-quality-Optical-Flow.git
Submitted 15 March, 2024;
originally announced March 2024.
-
DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation
Authors:
Zilin Si,
Gu Zhang,
Qingwei Ben,
Branden Romero,
Zhou Xian,
Chao Liu,
Chuang Gan
Abstract:
We introduce DIFFTACTILE, a physics-based differentiable tactile simulation system designed to enhance robotic manipulation with dense and physically accurate tactile feedback. In contrast to prior tactile simulators which primarily focus on manipulating rigid bodies and often rely on simplified approximations to model stress and deformations of materials in contact, DIFFTACTILE emphasizes physics-based contact modeling with high fidelity, supporting simulations of diverse contact modes and interactions with objects possessing a wide range of material properties. Our system incorporates several key components, including a Finite Element Method (FEM)-based soft body model for simulating the sensing elastomer, a multi-material simulator for modeling diverse object types (such as elastic, elastoplastic, cables) under manipulation, and a penalty-based contact model for handling contact dynamics. The differentiable nature of our system facilitates gradient-based optimization for both 1) refining physical properties in simulation using real-world data, hence narrowing the sim-to-real gap and 2) efficient learning of tactile-assisted grasping and contact-rich manipulation skills. Additionally, we introduce a method to infer the optical response of our tactile sensor to contact using an efficient pixel-based neural module. We anticipate that DIFFTACTILE will serve as a useful platform for studying contact-rich manipulations, leveraging the benefits of dense tactile feedback and differentiable physics. Code and supplementary materials are available at the project website https://difftactile.github.io/.
Submitted 13 March, 2024;
originally announced March 2024.
-
Online Digital Twin-Empowered Content Resale Mechanism in Age of Information-Aware Edge Caching Networks
Authors:
Yuhan Yi,
Guanglin Zhang,
Hai Jiang
Abstract:
For users requesting popular contents from content providers, edge caching can alleviate backhaul pressure and enhance the quality of experience of users. Recently there has also been growing concern about content freshness, which is quantified by age of information (AoI). Therefore, AoI-aware online caching algorithms are required, which is challenging because the content popularity is usually unknown in advance and may vary over time. In this paper, we propose an online digital twin (DT) empowered content resale mechanism in AoI-aware edge caching networks. We aim to design an optimal two-timescale caching strategy to maximize the utility of an edge network service provider (ENSP). The formulated optimization problem is non-convex and NP-hard. To tackle this intractable problem, we propose a DT-assisted Online Caching Algorithm (DT-OCA). Specifically, we first decompose our formulated problem into a series of subproblems, each handling a cache period. For each cache period, we use a DT-based prediction method to effectively capture future content popularity, and develop an online caching strategy. Competitive ratio analysis and extensive experimental results demonstrate that our algorithm has promising performance, and outperforms other benchmark algorithms. Insightful observations are also found and discussed.
Submitted 12 March, 2024;
originally announced March 2024.
-
Propensity-score matching analysis in COVID-19-related studies: a method and quality systematic review
Authors:
Chunhui Gu,
Ruosha Li,
Guoqiang Zhang
Abstract:
Objectives: To provide an overall quality assessment of the methods used for COVID-19-related studies using propensity score matching (PSM).
Study Design and Setting: A systematic search was conducted in June 2021 on PubMed to identify COVID-19-related studies that use the PSM analysis between 2020 and 2021. Key information about study design and PSM analysis were extracted, such as covariates, matching algorithm, and reporting of estimated treatment effect type.
Results: One-hundred-and-fifty (87.72%) cohort studies and thirteen (7.60%) case-control studies were found among 171 identified articles. Forty-five studies (26.32%) provided a reasonable justification for covariates selection. One-hundred-and-three (60.23%) and sixty-nine (40.35%) studies did not provide the model that was used for calculating the propensity score or did not report the matching algorithm, respectively. Seventy-three (42.69%) studies reported the method(s) for checking covariates balance. Forty studies (23.39%) had a statistician co-author. None of the case-control studies (n=13) had a statistician co-author (p=0.006), and all studies that clarified the treatment effect estimation (n=6) had a statistician co-author (p<0.001).
Conclusions: The reporting quality of the PSM analysis is suboptimal in some COVID-19 epidemiological studies. Some pitfalls may undermine study findings that involve PSM analysis, such as a mismatch between PSM analysis and study design.
Submitted 10 March, 2024;
originally announced March 2024.
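The percentages reported in the Results of the propensity-score matching review all share the denominator of 171 identified articles, so they can be recomputed directly from the counts. The sketch below does only that; the dictionary labels are shorthand introduced here for illustration, not categories defined by the authors.

```python
TOTAL_ARTICLES = 171  # identified articles, as stated in the abstract

# Counts as quoted in the Results; labels are illustrative shorthand.
counts = {
    "cohort studies": 150,
    "case-control studies": 13,
    "justified covariate selection": 45,
    "propensity model not provided": 103,
    "matching algorithm not reported": 69,
    "balance check reported": 73,
    "statistician co-author": 40,
}

# Percentage of all identified articles, rounded to two decimals as in the abstract.
percentages = {
    label: round(100 * n / TOTAL_ARTICLES, 2) for label, n in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}%")
```

Each recomputed value matches the figure given in the abstract (87.72%, 7.60%, 26.32%, 60.23%, 40.35%, 42.69%, 23.39%).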
-
Determination of the number of $ψ(3686)$ events taken at BESIII
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The number of $ψ(3686)$ events collected by the BESIII detector during the 2021 run period is determined to be $(2259.3\pm 11.1)\times 10^6$ by counting inclusive $ψ(3686)$ hadronic events. The uncertainty is systematic and the statistical uncertainty is negligible. Meanwhile, the numbers of $ψ(3686)$ events collected during the 2009 and 2012 run periods are updated to be $(107.7\pm0.6)\times 10^6$ and $(345.4\pm 2.6)\times 10^6$, respectively. Both numbers are consistent with the previous measurements within one standard deviation. The total number of $ψ(3686)$ events in the three data samples is $(2712.4\pm14.3)\times10^6$.
Submitted 28 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
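The total $ψ(3686)$ event count quoted in the BESIII abstract can be reproduced from the three per-run values. The sketch below sums the central values and compares two ways of combining the uncertainties. The quoted total uncertainty coincides with the linear sum rather than the quadrature sum, which would be consistent with treating the systematic uncertainties as fully correlated across runs; that reading is an inference from the numbers, not something the abstract states.

```python
import math

# (central value, uncertainty) in units of 10^6 events, as quoted in the abstract.
samples = {
    "2009": (107.7, 0.6),
    "2012": (345.4, 2.6),
    "2021": (2259.3, 11.1),
}

total = sum(n for n, _ in samples.values())

# Two candidate combinations of the per-run uncertainties.
linear_unc = sum(u for _, u in samples.values())
quadrature_unc = math.sqrt(sum(u ** 2 for _, u in samples.values()))

print(f"total      = {total:.1f}")           # 2712.4
print(f"linear     = {linear_unc:.1f}")      # 14.3
print(f"quadrature = {quadrature_unc:.1f}")  # 11.4
```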
-
Higher-order exceptional surface in a pseudo-Hermitian superconducting circuit
Authors:
Guo-Qiang Zhang,
Wei Feng,
Yu Wang,
Chui-Ping Yang
Abstract:
In the last few years, much attention has been paid to exceptional surfaces (ESs) owing to various important physical phenomena and potential applications. However, high-order ESs in pseudo-Hermitian systems have not been reported until now. Here, we study the high-order ES in a pseudo-Hermitian superconducting (SC) circuit system. In our proposal, the SC circuit system is composed of three circularly coupled SC cavities, where the gain and loss are balanced. According to the eigenvalue properties of the pseudo-Hermitian Hamiltonian, we derive the general pseudo-Hermitian conditions for the ternary SC system. In the special pseudo-Hermitian case with parity-time symmetry, all third-order exceptional points (EP3s) of the SC system form a third-order exceptional line in the parameter space. Under the general pseudo-Hermitian conditions, more EP3s are found, and all EP3s are located on a surface, i.e., a third-order exceptional surface is constructed. Moreover, we also investigate the eigenvalues of the pseudo-Hermitian SC circuit around EP3s. Our work opens up a door for exploring high-order ESs and related applications in pseudo-Hermitian systems.
Submitted 9 March, 2024;
originally announced March 2024.
-
Towards Efficient Replay in Federated Incremental Learning
Authors:
Yichen Li,
Qunwei Li,
Haozhao Wang,
Ruixuan Li,
Wenliang Zhong,
Guannan Zhang
Abstract:
In Federated Learning (FL), the data in each client is typically assumed fixed or static. However, data often comes in an incremental manner in real-world applications, where the data domain may increase dynamically. In this work, we study catastrophic forgetting with data heterogeneity in Federated Incremental Learning (FIL) scenarios where edge clients may lack enough storage space to retain full data. We propose to employ a simple, generic framework for FIL named Re-Fed, which can coordinate each client to cache important samples for replay. More specifically, when a new task arrives, each client first caches selected previous samples based on their global and local importance. Then, the client trains the local model with both the cached samples and the samples from the new task. Theoretically, we analyze the ability of Re-Fed to discover important samples for replay thus alleviating the catastrophic forgetting problem. Moreover, we empirically show that Re-Fed achieves competitive performance compared to state-of-the-art methods.
Submitted 3 June, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Authors:
Gang Zhang,
Junnan Chen,
Guohuan Gao,
Jianmin Li,
Si Liu,
Xiaolin Hu
Abstract:
LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent works have attempted to construct fully sparse detectors to solve this issue; nevertheless, the resulting models either rely on a complex multi-stage pipeline or exhibit inferior performance. In this work, we propose SAFDNet, a straightforward yet highly effective architecture, tailored for fully sparse 3D object detection. In SAFDNet, an adaptive feature diffusion strategy is designed to address the center feature missing problem. We conducted extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets. SAFDNet performed slightly better than the previous SOTA on the first two datasets but much better on the last dataset, which features long-range detection, verifying the efficacy of SAFDNet in scenarios where long-range detection is required. Notably, on Argoverse2, SAFDNet surpassed the previous best hybrid detector HEDNet by 2.6% mAP while being 2.1x faster, and yielded 2.1% mAP gains over the previous best sparse detector FSDv2 while being 1.3x faster. The code will be available at https://github.com/zhanggang001/HEDNet.
Submitted 22 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Advances of Deep Learning in Protein Science: A Comprehensive Survey
Authors:
Bozhen Hu,
Cheng Tan,
Lirong Wu,
Jiangbin Zheng,
Jun Xia,
Zhangyang Gao,
Zicheng Liu,
Fandi Wu,
Guijun Zhang,
Stan Z. Li
Abstract:
Protein representation learning plays a crucial role in understanding the structure and function of proteins, which are essential biomolecules involved in various biological processes. In recent years, deep learning has emerged as a powerful tool for protein modeling due to its ability to learn complex patterns and representations from large-scale protein data. This comprehensive survey aims to provide an overview of the recent advances in deep learning techniques applied to protein science. The survey begins by introducing the developments of deep learning based protein models and emphasizes the importance of protein representation learning in drug discovery, protein engineering, and function annotation. It then delves into the fundamentals of deep learning, including convolutional neural networks, recurrent neural networks, attention models, and graph neural networks in modeling protein sequences, structures, and functions, and explores how these techniques can be used to extract meaningful features and capture intricate relationships within protein data. Next, the survey presents various applications of deep learning in the field of proteins, including protein structure prediction, protein-protein interaction prediction, protein function prediction, etc. Furthermore, it highlights the challenges and limitations of these deep learning techniques and also discusses potential solutions and future directions for overcoming these challenges. This comprehensive survey provides a valuable resource for researchers and practitioners in the field of proteins who are interested in harnessing the power of deep learning techniques. By consolidating the latest advancements and discussing potential avenues for improvement, this review contributes to the ongoing progress in protein research and paves the way for future breakthroughs in the field.
Submitted 8 March, 2024;
originally announced March 2024.
-
REPS: Reconstruction-based Point Cloud Sampling
Authors:
Guoqing Zhang,
Wenbo Zhao,
Jian Liu,
Xianming Liu
Abstract:
Sampling is widely used in various point cloud tasks as it can effectively reduce resource consumption. Recently, some methods have proposed utilizing neural networks to optimize the sampling process for various task requirements. Currently, deep downsampling methods can be categorized into two main types: generative-based and score-based. Generative-based methods directly generate sampled point clouds using networks, whereas score-based methods assess the importance of points according to specific rules and then select sampled point clouds based on their scores. However, these methods often result in noticeable clustering effects in high-intensity feature areas, compromising their ability to preserve small-scale features and leading to the loss of some structures, thereby affecting the performance of subsequent tasks. In this paper, we propose REPS, a reconstruction-based scoring strategy that evaluates the importance of each vertex by removing and reconstructing them using surrounding vertices. Our reconstruction process comprises point reconstruction and shape reconstruction. The two aforementioned reconstruction methods effectively evaluate the importance of vertices by removing them at different scales for reconstruction. These reconstructions ensure that our method maintains the overall geometric features of the point cloud and avoids disturbing small-scale structures during sampling. Additionally, we propose the Global-Local Fusion Attention (GLFA) module, which aggregates local and global attention features of point clouds, ensuring high-quality reconstruction and sampling effects. Our method outperforms previous approaches in preserving the structural features of the sampled point clouds. Furthermore, abundant experimental results demonstrate the superior performance of our method across various common tasks.
Submitted 7 March, 2024;
originally announced March 2024.
-
Yi: Open Foundation Models by 01.AI
Authors:
01.AI,
Alex Young,
Bei Chen,
Chao Li,
Chengen Huang,
Ge Zhang,
Guanwei Zhang,
Heng Li,
Jiangcheng Zhu,
Jianqun Chen,
Jing Chang,
Kaidong Yu,
Peng Liu,
Qiang Liu,
Shawn Yue,
Senbin Yang,
Shiming Yang,
Tao Yu,
Wen Xie,
Wenhao Huang,
Xiaohui Hu,
Xiaoyi Ren,
Xinyao Niu,
Pengcheng Nie
, et al. (7 additional authors not shown)
Abstract:
We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models.
Submitted 7 March, 2024;
originally announced March 2024.
-
Cluster radioactivity preformation probability of trans-lead nuclei in the scheme of NpNn
Authors:
Lin-Jing Qi,
Dong-Meng Zhang,
Song Luo,
Gui-Qing Zhang,
Peng-Cheng Chu,
Xi-Jun Wu,
Xiao-Hua Li
Abstract:
In the present work, the cluster radioactivity preformation probability Pc in the scheme of NpNn for the effective number of the valence particles (holes) in trans-lead nuclei has been systematically investigated. This quantity has been explored in the simplified parametrization of NpNn as well as the product NpNnI of NpNn with the isospin asymmetry I. The calculations of Pc are performed in both a microscopic and a model-dependent way. Within the microscopic approach, based on our previous work [Chin. Phys. C 47, 014101 (2023)], Pc is calculated in the cluster formation model (CFM) combined with the exponential relationship of Pc to the alpha decay preformation probability Pα when the mass number of the emitted cluster Ac is less than 28. When Ac is greater than 28, Pc is obtained through the charge-number dependence of Pc on the decay products proposed by Ren et al. [Phys. Rev. C 70, 034304 (2004)]. In the model-dependent approach, Pc is extracted from the ratios of cluster radioactivity half-lives calculated in the framework of the unified fission model (UFM) proposed by Dong et al. [Eur. Phys. J. A 41, 197 (2009)] to the experimental ones. Both sets of results show that Pc in logarithmic form is linear in NpNn as well as in NpNnI. For comparison, the analytical formula for the parent-mass-number dependence as well as the model proposed by K. Wei and H. F. Zhang [Phys. Rev. C 96, 021601(R) (2017)] are also used. Furthermore, the preformation mechanism for cluster radioactivity is also discussed.
Submitted 7 March, 2024;
originally announced March 2024.
-
StableDrag: Stable Dragging for Point-based Image Editing
Authors:
Yutao Cui,
Xiaotong Zhao,
Guozhen Zhang,
Shengming Cao,
Kai Ma,
Limin Wang
Abstract:
Point-based image editing has attracted remarkable attention since the emergence of DragGAN. Recently, DragDiffusion further pushed the generative quality forward by adapting this dragging technique to diffusion models. Despite this great success, the dragging scheme exhibits two major drawbacks, namely inaccurate point tracking and incomplete motion supervision, which may result in unsatisfactory dragging outcomes. To tackle these issues, we build a stable and precise drag-based editing framework, coined StableDrag, by designing a discriminative point tracking method and a confidence-based latent enhancement strategy for motion supervision. The former allows us to precisely locate the updated handle points, thereby boosting the stability of long-range manipulation, while the latter is responsible for keeping the optimized latent as high-quality as possible across all the manipulation steps. Thanks to these unique designs, we instantiate two types of image editing models, StableDrag-GAN and StableDrag-Diff, which attain more stable dragging performance, as shown through extensive qualitative experiments and quantitative assessment on DragBench.
Submitted 7 March, 2024;
originally announced March 2024.
-
DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning
Authors:
Xingwei Qu,
Yiming Liang,
Yucheng Wang,
Tianyu Zheng,
Tommy Yue,
Lei Ma,
Stephen W. Huang,
Jiajun Zhang,
Yinan Shi,
Chenghua Lin,
Jie Fu,
Ge Zhang
Abstract:
It has long been assumed that the sheer number of parameters in large language models (LLMs) drives in-context learning (ICL) capabilities, enabling remarkable performance improvements by leveraging task-specific demonstrations. Challenging this hypothesis, we introduce DEEP-ICL, a novel task Definition Enriched ExPert Ensembling methodology for ICL. DEEP-ICL explicitly extracts task definitions from given demonstrations and generates responses through learning task-specific examples. We argue that improvement from ICL does not directly rely on model size, but essentially stems from understanding task definitions and task-guided learning. Inspired by this, DEEP-ICL combines two 3B models with distinct roles (one for concluding task definitions and the other for learning task demonstrations) and achieves comparable performance to LLaMA2-13B. Furthermore, our framework outperforms conventional ICL by overcoming pretraining sequence length limitations, by supporting unlimited demonstrations. We contend that DEEP-ICL presents a novel alternative for achieving efficient few-shot learning, extending beyond the conventional ICL.
Submitted 16 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
Authors:
Yanjie Ze,
Gu Zhang,
Kangning Zhang,
Chenyuan Hu,
Muhan Wang,
Huazhe Xu
Abstract:
Imitation learning provides an efficient way to teach robots dexterous skills; however, learning complex skills robustly and generalizably usually requires large amounts of human demonstrations. To tackle this challenging problem, we present 3D Diffusion Policy (DP3), a novel visual imitation learning approach that incorporates the power of 3D visual representations into diffusion policies, a class of conditional action generative models. The core design of DP3 is the utilization of a compact 3D visual representation, extracted from sparse point clouds with an efficient point encoder. In our experiments involving 72 simulation tasks, DP3 successfully handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement. In 4 real robot tasks, DP3 demonstrates precise control with a high success rate of 85%, given only 40 demonstrations of each task, and shows excellent generalization abilities in diverse aspects, including space, viewpoint, appearance, and instance. Interestingly, in real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do, necessitating human intervention. Our extensive evaluation highlights the critical importance of 3D representations in real-world robot learning. Videos, code, and data are available at https://3d-diffusion-policy.github.io .
Submitted 8 June, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Observation of the decay $h_{c}\to3(π^{+}π^{-})π^{0}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
Based on $(2712.4\pm14.1)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector, we study the decays $h_{c}\to3(π^{+}π^{-})π^{0}$, $h_{c}\to2(π^{+}π^{-})ω$, $h_{c}\to2(π^{+}π^{-})π^{0}η$, $h_{c}\to2(π^{+}π^{-})η$, and $h_{c}\to p\bar{p}$ via $ψ(3686)\toπ^{0}h_{c}$. The decay channel $h_{c}\to3(π^{+}π^{-})π^{0}$ is observed for the first time, and its branching fraction is determined to be $\left( {9.28\pm 1.14 \pm 0.77} \right) \times {10^{ - 3}}$, where the first uncertainty is statistical and the second is systematic. In addition, first evidence is found for the modes $h_{c} \to 2(π^{+}π^{-})π^{0}η$ and $h_{c}\to2(π^{+}π^{-})ω$ with significances of 4.8$σ$ and 4.7$σ$, and their branching fractions are determined to be $(7.55\pm1.51\pm0.77)\times10^{-3}$ and $\left( {4.00 \pm 0.86 \pm 0.35}\right) \times {10^{ - 3}}$, respectively. No significant signals of $h_c\to 2(π^+π^-)η$ and $h_{c}\to p\bar{p}$ are observed, and the upper limits of the branching fractions of these decays are determined to be $<6.19\times10^{-4}$ and $<4.40\times10^{-5}$ at the 90% confidence level, respectively.
Submitted 6 March, 2024;
originally announced March 2024.
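As a small worked example for the $h_c$ entry above: results quoted as (value ± statistical ± systematic) are often summarised with a single total uncertainty by adding the two terms in quadrature. That step assumes the two uncertainties are independent; it is a common convention rather than something the abstract states, and the function name `total_uncertainty` is purely illustrative.

```python
import math


def total_uncertainty(stat: float, syst: float) -> float:
    """Combine two uncertainties in quadrature (assumes they are independent)."""
    return math.hypot(stat, syst)


# Branching fractions quoted in the abstract, in units of 1e-3:
# (central value, statistical uncertainty, systematic uncertainty)
branching_fractions = {
    "3(pi+pi-)pi0": (9.28, 1.14, 0.77),
    "2(pi+pi-)pi0 eta": (7.55, 1.51, 0.77),
    "2(pi+pi-)omega": (4.00, 0.86, 0.35),
}

for mode, (value, stat, syst) in branching_fractions.items():
    total = total_uncertainty(stat, syst)
    print(f"h_c -> {mode}: ({value:.2f} +/- {total:.2f}) x 1e-3")
```

The quadrature sum is dominated by the larger term, which is why the statistical uncertainty sets the overall precision for all three modes listed here.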
-
On discrete-time polynomial dynamical systems on hypergraphs
Authors:
Shaoxuan Cui,
Guofeng Zhang,
Hildeberto Jardón-Kojakhmetov,
Ming Cao
Abstract:
This paper studies the stability of discrete-time polynomial dynamical systems on hypergraphs by utilizing the Perron-Frobenius theorem for nonnegative tensors with respect to the tensors' Z-eigenvalues and Z-eigenvectors. Firstly, for a multilinear polynomial system on a uniform hypergraph, we study the stability of the origin of the corresponding systems. Next, we extend our results to non-homogeneous polynomial systems on non-uniform hypergraphs. We confirm that the local stability of any discrete-time polynomial system is in general dominated by pairwise terms. Assuming that the origin is locally stable, we construct a conservative (but explicit) region of attraction from the system parameters. Finally, we validate our results via some numerical examples.
Submitted 5 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
DynST: Dynamic Sparse Training for Resource-Constrained Spatio-Temporal Forecasting
Authors:
Hao Wu,
Haomin Wen,
Guibin Zhang,
Yutong Xia,
Kai Wang,
Yuxuan Liang,
Yu Zheng,
Kun Wang
Abstract:
The ever-increasing sensor service, though opening a precious path and providing a deluge of earth system data for deep-learning-oriented earth science, sadly introduces a daunting obstacle to industrial-level deployment. Concretely, earth science systems rely heavily on the extensive deployment of sensors; however, data collection from sensors is constrained by complex geographical and social factors, making it challenging to achieve comprehensive coverage and uniform deployment. To alleviate this obstacle, traditional approaches to sensor deployment utilize specific algorithms to design and deploy sensors. These methods dynamically adjust the activation times of sensors to optimize the detection process across each sub-region. Regrettably, the activation strategy is generally formulated from historical observations and geographic characteristics, which makes the methods and the resultant models neither simple nor practical. Worse still, the complex technical design may ultimately lead to a model with weak generalizability. In this paper, we introduce for the first time the concept of dynamic sparse training for spatio-temporal data and are committed to adaptively and dynamically filtering important sensor distributions. To our knowledge, this is the first proposal (termed DynST) of an industry-level deployment optimization concept at the data level. However, due to the existence of the temporal dimension, pruning of spatio-temporal data may lead to conflicts at different timestamps. To address this, we employ dynamic merge technology, along with ingenious dimensional mapping, to mitigate potential impacts caused by the temporal aspect. During the training process, DynST utilizes iterative pruning and sparse training, repeatedly identifying and dynamically removing sensor perception areas that contribute the least to future predictions.
Submitted 5 March, 2024;
originally announced March 2024.
-
The bright black hole X-ray binary 4U 1543-47 during 2021 outburst. A clear state transition from super-Eddington to sub-Eddington accretion revealed by Insight-HXMT
Authors:
Pei Jin,
Guobao Zhang,
Yuexin Zhang,
Mariano Méndez,
Jinlu Qu,
David M. Russell,
Jiancheng Wang,
Shuangnan Zhang,
Yi-Jung Yang,
Shumei Jia,
Zixu Yang,
Hexin Liu
Abstract:
We present a detailed analysis of the observations with the Hard X-ray Modulation Telescope of the black hole X-ray transient 4U~1543-47 during its outburst in 2021. We find a clear state transition during the outburst decay of the source. Using previous measurements of the black-hole mass and distance to the source, the source luminosity during this transition is close to the Eddington limit. The light curves before and after the transition can be fitted by two exponential functions with short ($\sim 16$ days) and long ($\sim 130$ days) decay time scales, respectively. We detect strong reflection features in all observations that can be described with either the RelxillNS or Reflionx_bb reflection models, both of which have a black-body incident spectrum. In the super-Eddington state, we observe a Comptonized component characterized by a low electron temperature of approximately 2.0 keV. We suggest that this component appears exclusively within the inner radiation-pressure dominated region of the supercritical disk as a part of the intrinsic spectrum of the accretion disk itself. This feature vanishes as the source transitions into the sub-Eddington state. The emissivity index of the accretion disk in the reflection component is significantly different before and after the transition, $\sim3.0$-$5.0$ and $\sim7.0$-$9.0$ in the super- and sub-Eddington states, respectively. Based on the reflection geometry of returning disk radiation, the geometrically thicker the accretion disk, the smaller the emissivity index. Therefore, we propose that the transition is primarily driven by the change of the accretion flow from a supercritical to a thin disk configuration.
Submitted 5 March, 2024;
originally announced March 2024.
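To give a feel for the two decay time scales quoted in the abstract above, the sketch below evaluates a plain exponential decay, flux(t) = flux0 * exp(-t / tau), for tau of about 16 and 130 days. Reading the quoted time scales as e-folding times is an assumption, and the function name and the 30-day interval are illustrative only; the authors' actual light-curve fit is not reproduced here.

```python
import math


def remaining_fraction(elapsed_days: float, tau_days: float) -> float:
    """Fraction of the initial flux left after elapsed_days, for e-folding time tau_days."""
    return math.exp(-elapsed_days / tau_days)


# Approximate decay time scales from the abstract (assumed to be e-folding times).
TAU_SHORT_DAYS = 16.0   # before the state transition
TAU_LONG_DAYS = 130.0   # after the state transition

elapsed = 30.0  # illustrative interval, not taken from the paper
for label, tau in (("short", TAU_SHORT_DAYS), ("long", TAU_LONG_DAYS)):
    frac = remaining_fraction(elapsed, tau)
    print(f"{label} time scale ({tau:.0f} d): {frac:.3f} of the flux remains after {elapsed:.0f} d")
```

Under this reading, the flux falls by a factor of e in roughly 16 days before the transition but takes roughly 130 days to do so afterwards.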
-
FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View
Authors:
Jiawei Hou,
Xiaoyan Li,
Wenhao Guan,
Gang Zhang,
Di Feng,
Yuheng Du,
Xiangyang Xue,
Jian Pu
Abstract:
In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for more comprehensive understandings of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Researchers have recently explored various aspects of this task extensively, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, the inference speed, crucial for running on an autonomous vehicle, is neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network effect and latency from four parts, including the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating the model while keeping its accuracy. Targeted at improving this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating the 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves state-of-the-art results with a fast inference speed.
Submitted 5 March, 2024;
originally announced March 2024.
-
World Models for Autonomous Driving: An Initial Survey
Authors:
Yanchen Guan,
Haicheng Liao,
Zhenning Li,
Jia Hu,
Runze Yuan,
Yunjian Li,
Guohui Zhang,
Chengzhong Xu
Abstract:
In the rapidly evolving landscape of autonomous driving, the capability to accurately predict future events and assess their implications is paramount for both safety and efficiency, critically aiding the decision-making process. World models have emerged as a transformative approach, enabling autonomous driving systems to synthesize and interpret vast amounts of sensor data, thereby predicting potential future scenarios and compensating for information gaps. This paper provides an initial review of the current state and prospective advancements of world models in autonomous driving, spanning their theoretical underpinnings, practical applications, and the ongoing research efforts aimed at overcoming existing limitations. Highlighting the significant role of world models in advancing autonomous driving technologies, this survey aspires to serve as a foundational reference for the research community, facilitating swift access to and comprehension of this burgeoning field, and inspiring continued innovation and exploration.
Submitted 7 May, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Observation of $ψ(3686)\to 3φ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (645 additional authors not shown)
Abstract:
Using $(2.712\pm0.014)\times 10^9$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of $ψ(3686)\to 3φ$ decay with a significance larger than 10$σ$. The branching fraction of this decay is determined to be $(1.46\pm0.05\pm0.17)\times10^{-5}$, where the first uncertainty is statistical and the second is systematic. No significant structure is observed in the $φφ$ invariant mass spectra.
Submitted 4 March, 2024;
originally announced March 2024.
-
Local regularity for inhomogeneous parabolic systems with a skew-symmetric part in BMO
Authors:
Guoming Zhang
Abstract:
In this note we aim to establish the local regularity for weak solutions to inhomogeneous parabolic systems having a real and anti-symmetric part in BMO, which can be seen as a generalization of the corresponding results for parabolic systems with bounded coefficients in the work of Auscher, Bortz, Egert and Saari [J. Math. Pures Appl.(9) 2019].
Submitted 17 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Cell sorting by active forces in a phase-field model of cell monolayers
Authors:
James N. Graham,
Guanming Zhang,
Julia M. Yeomans
Abstract:
Cell sorting, the segregation of cells with different properties into distinct domains, is a key phenomenon in biological processes such as embryogenesis. We use a phase-field model of a confluent cell layer to study the role of activity in cell sorting. We find that a mixture of cells with extensile or contractile dipolar activity, and which are identical apart from their activity, quickly sort into small, elongated patches which then grow slowly in time. We interpret the sorting as driven by the different diffusivity of the extensile and contractile cells, mirroring the ordering of Brownian particles connected to different hot and cold thermostats. We check that the free energy is not changed by either partial or complete sorting, thus confirming that activity can be responsible for the ordering even in the absence of thermodynamic mechanisms.
Submitted 3 March, 2024;
originally announced March 2024.
-
Optomechanical cooling with simultaneous intracavity and extracavity squeezed light
Authors:
S. S. Zheng,
F. X. Sun,
M. Asjad,
G. W. Zhang,
J. Huo,
J. Li,
J. Zhou,
Z. Ma,
Q. Y. He
Abstract:
We propose a novel and experimentally feasible approach to achieve high-efficiency ground-state cooling of a mechanical oscillator in an optomechanical system under the deeply unresolved sideband condition with the assistance of both intracavity and extracavity squeezing. In the scheme, a degenerate optical parametric amplifier is placed inside the optical cavity, generating the intracavity squeezing; besides, the optical cavity is driven by externally generated squeezing light, namely the extracavity squeezing. The quantum interference effect generated by intracavity squeezing and extracavity squeezing can completely suppress the non-resonant Stokes heating process while greatly enhancing the anti-Stokes cooling process. Therefore, the joint-squeezing scheme is capable of cooling the mechanical oscillators to their quantum ground state in a regime far away from the resolved sideband condition. Compared with other traditional optomechanical cooling schemes, the single-photon cooling rate in this joint-squeezing scheme can be tremendously enlarged by nearly three orders of magnitude. At the same time, the coupling strength required to achieve ground-state cooling can be significantly reduced. This scheme is promising for cooling large-mass and low-frequency mechanical oscillators, which provides a prerequisite for preparing and manipulating non-classical states in macroscopic quantum systems and lays a significant foundation for quantum manipulation.
Submitted 2 March, 2024;
originally announced March 2024.
-
Mixed-halide perovskite alloys $\text{CsPb}(\text{I}_{1-x}^{}\text{Br}_x^{})_3^{}$ and $\text{CsPb}(\text{Br}_{1-x}^{}\text{Cl}_x^{})_3^{}$: New insight of configuration entropy effect from first principles and phase diagrams
Authors:
Fang Pan,
Junni Zhai,
Jinyu Chen,
Lin Yang,
Hua Dong,
Fang Yuan,
Zhuangde Jiang,
Wei Ren,
Zuo-Guang Ye,
Guo-Xu Zhang,
Jingrui Li
Abstract:
Stability is one of the key issues in mixed-halide perovskite alloys which are promising in emergent optoelectronics. Previous density-functional-theory (DFT) and machine learning studies indicate that the formation-energy convex hulls of these materials are very shallow, and stable alloy compositions are rare. In this work, we revisit this problem using DFT with special focus on the effects of configuration and vibration entropies. Allowed by the $20$-atomic models for the $\text{CsPb}(\text{I}_{1-x}^{}\text{Br}_x^{})_3^{}$ and $\text{CsPb}(\text{Br}_{1-x}^{}\text{Cl}_x^{})_3^{}$ series, the partition functions and therewith thermodynamic state functions are calculated by traversing all possible mixed-halide configurations. We can thus evaluate the temperature- and system-dependent configuration entropy, which largely corrects the conventional approach based on the ideal solution model. Finally, temperature-composition phase diagrams that include $α$, $β$, $γ$ and $δ$ phases of both alloys are constructed based on the free energy data, for which the contribution of phonon vibrations is included.
Submitted 29 February, 2024;
originally announced February 2024.
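The contrast drawn in the abstract above, between the ideal-solution estimate and a configuration entropy obtained from an explicit sum over configurations, can be sketched as follows. This is a minimal illustration under stated assumptions: the four configuration energies are hypothetical placeholders rather than DFT data from the paper, the function names are illustrative, and the Gibbs form -sum(p ln p) over Boltzmann weights is a textbook choice that may differ in detail from the authors' procedure.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def ideal_mixing_entropy(x: float) -> float:
    """Ideal-solution mixing entropy per halide site, in units of k_B."""
    if x in (0.0, 1.0):
        return 0.0
    return -(x * math.log(x) + (1.0 - x) * math.log(1.0 - x))


def configuration_entropy(energies_ev, temperature_k: float) -> float:
    """Gibbs entropy -sum(p ln p), in units of k_B, over enumerated configurations."""
    e_min = min(energies_ev)  # shift energies so the largest weight is exactly 1
    weights = [
        math.exp(-(e - e_min) / (K_B_EV_PER_K * temperature_k)) for e in energies_ev
    ]
    partition_sum = sum(weights)
    probabilities = [w / partition_sum for w in weights]
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


# Hypothetical energies (eV) of four configurations at one composition.
toy_energies_ev = [0.00, 0.01, 0.01, 0.03]

for temperature_k in (100.0, 300.0, 1000.0):
    entropy = configuration_entropy(toy_energies_ev, temperature_k)
    print(f"T = {temperature_k:6.0f} K: S_conf = {entropy:.3f} k_B")

print(f"ideal mixing entropy per site at x = 0.5: {ideal_mixing_entropy(0.5):.3f} k_B")
```

When all configurations are degenerate, the enumerated entropy reduces to the logarithm of the number of configurations; energy differences between configurations lower it at finite temperature, which is the kind of correction to the ideal-solution picture the abstract refers to.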
-
EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration
Authors:
Bo Liu,
Grace Li Zhang,
Xunzhao Yin,
Ulf Schlichtmann,
Bing Li
Abstract:
Deep neural networks (DNNs) have achieved great breakthroughs in many fields such as image classification and natural language processing. However, the execution of DNNs needs to conduct massive numbers of multiply-accumulate (MAC) operations on hardware and thus incurs a large power consumption. To address this challenge, we propose a novel digital MAC design based on encoding. In this new design, the multipliers are replaced by simple logic gates to project the results onto a wide bit representation. These bits carry individual position weights, which can be trained for specific neural networks to enhance inference accuracy. The outputs of the new multipliers are added by bit-wise weighted accumulation and the accumulation results are compatible with existing computing platforms accelerating neural networks with either uniform or non-uniform quantization. Since the multiplication function is replaced by simple logic projection, the critical paths in the resulting circuits become much shorter. Correspondingly, pipelining stages in the MAC array can be reduced, leading to a significantly smaller area as well as a better power efficiency. The proposed design has been synthesized and verified by ResNet18-Cifar10, ResNet20-Cifar100 and ResNet50-ImageNet. The experimental results confirmed the reduction of circuit area by up to 79.63% and the reduction of power consumption of executing DNNs by up to 70.18%, while the accuracy of the neural networks can still be well maintained.
Submitted 25 February, 2024;
originally announced February 2024.
-
VRP-SAM: SAM with Visual Reference Prompt
Authors:
Yanpeng Sun,
Jiahui Chen,
Shan Zhang,
Xinyu Zhang,
Qiang Chen,
Gang Zhang,
Errui Ding,
Jingdong Wang,
Zechao Li
Abstract:
In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model. In essence, VRP-SAM can utilize annotated reference images to comprehend specific objects and perform segmentation of specific objects in a target image. It is worth noting that the VRP encoder can support a variety of annotation formats for reference images, including \textbf{point}, \textbf{box}, \textbf{scribble}, and \textbf{mask}. VRP-SAM achieves a breakthrough within the SAM framework by extending its versatility and applicability while preserving SAM's inherent strengths, thus enhancing user-friendliness. To enhance the generalization ability of VRP-SAM, the VRP encoder adopts a meta-learning strategy. To validate the effectiveness of VRP-SAM, we conducted extensive empirical studies on the Pascal and COCO datasets. Remarkably, VRP-SAM achieved state-of-the-art performance in visual reference segmentation with minimal learnable parameters. Furthermore, VRP-SAM demonstrates strong generalization capabilities, allowing it to perform segmentation of unseen objects and enabling cross-domain segmentation. The source code and models will be available at \url{https://github.com/syp2ysy/VRP-SAM}
Submitted 30 March, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
ArcSin: Adaptive ranged cosine Similarity injected noise for Language-Driven Visual Tasks
Authors:
Yang Liu,
Xiaomin Yu,
Gongyu Zhang,
Christos Bergeles,
Prokar Dasgupta,
Alejandro Granados,
Sebastien Ourselin
Abstract:
In this study, we address the challenging task of bridging the modality gap between learning from language and inference for visual tasks, including Visual Question Answering (VQA), Image Captioning (IC) and Visual Entailment (VE). We train models for these tasks in a zero-shot cross-modal transfer setting, a domain where the previous state-of-the-art method relied on the fixed scale noise injection, often compromising the semantic content of the original modality embedding. To combat it, we propose a novel method called Adaptive ranged cosine Similarity injected noise (ArcSin). First, we introduce an innovative adaptive noise scale that effectively generates the textual elements with more variability while preserving the original text feature's integrity. Second, a similarity pool strategy is employed, expanding the domain generalization potential by broadening the overall noise scale. This dual strategy effectively widens the scope of the original domain while safeguarding content integrity. Our empirical results demonstrate that these models closely rival those trained on images in terms of performance. Specifically, our method exhibits substantial improvements over the previous state-of-the-art, achieving gains of 1.9 and 1.1 CIDEr points in S-Cap and M-Cap, respectively. Additionally, we observe increases of 1.5 percentage points (pp), 1.4 pp, and 1.4 pp in accuracy for VQA, VQA-E, and VE, respectively, pushing the boundaries of what is achievable within the constraints of image-trained model benchmarks. The code will be released.
Submitted 27 February, 2024;
originally announced February 2024.
-
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Authors:
Alex Zhuang,
Ge Zhang,
Tianyu Zheng,
Xinrun Du,
Junjie Wang,
Weiming Ren,
Stephen W. Huang,
Jie Fu,
Xiang Yue,
Wenhu Chen
Abstract:
Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their proficiency in interpreting and utilizing structured data remains limited. Our investigation reveals a notable deficiency in LLMs' ability to process structured data, e.g., ChatGPT lags behind state-of-the-art (SoTA) model by an average of 35%. To augment the Structured Knowledge Grounding (SKG) capabilities in LLMs, we have developed a comprehensive instruction tuning dataset comprising 1.1 million examples. Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters. Our StructLM series surpasses task-specific models on 16 out of 18 evaluated datasets and establishes new SoTA performance on 8 SKG tasks. Furthermore, StructLM demonstrates strong generalization across 6 novel held-out SKG tasks, outperforming TableLlama by an average of 35% and Flan-UL2 20B by an average of 10%. Contrary to expectations, we observe that scaling model size offers marginal benefits, with StructLM-34B showing only slight improvements over StructLM-7B. This suggests that structured knowledge grounding is still a challenging task and requires more innovative design to push to a new level.
Submitted 24 April, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics
Authors:
Jiahao Wang,
Sikun Yang,
Heinz Koeppl,
Xiuzhen Cheng,
Pengfei Hu,
Guoming Zhang
Abstract:
Probabilistic approaches for handling count-valued time sequences have attracted considerable research attention because of their ability to infer explainable latent structures and to estimate uncertainties, and thus are especially suitable for dealing with noisy and incomplete count data. Among these models, Poisson-Gamma Dynamical Systems (PGDSs) have proven effective in capturing the evolving dynamics underlying observed count sequences. However, the state-of-the-art PGDS still fails to capture the time-varying transition dynamics that are commonly observed in real-world count time sequences. To close this gap, a non-stationary PGDS is proposed to allow the underlying transition matrices to evolve over time, and the evolving transition matrices are modeled by carefully designed Dirichlet Markov chains. Leveraging Dirichlet-Multinomial-Beta data augmentation techniques, a fully-conjugate and efficient Gibbs sampler is developed to perform posterior simulation. Experiments show that, in comparison with related models, the proposed non-stationary PGDS achieves improved predictive performance due to its capacity to learn non-stationary dependency structure captured by the time-evolving transition matrices.
Submitted 23 May, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
ChatMusician: Understanding and Generating Music Intrinsically with LLM
Authors:
Ruibin Yuan,
Hanfeng Lin,
Yi Wang,
Zeyue Tian,
Shangda Wu,
Tianhao Shen,
Ge Zhang,
Yuhang Wu,
Cong Liu,
Ziya Zhou,
Ziyang Ma,
Liumeng Xue,
Ziyu Wang,
Qin Liu,
Tianyu Zheng,
Yizhi Li,
Yinghao Ma,
Yiming Liang,
Xiaowei Chi,
Ruibo Liu,
Zili Wang,
Pengfei Li,
Jingcheng Wu,
Chenghua Lin,
Qifeng Liu
, et al. (10 additional authors not shown)
Abstract:
While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.
Submitted 25 February, 2024;
originally announced February 2024.
-
Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy?
Authors:
Nader Asadi,
Mahdi Beitollahi,
Yasser Khalil,
Yinchuan Li,
Guojun Zhang,
Xi Chen
Abstract:
Parameter-efficient fine-tuning stands as the standard for efficiently fine-tuning large language and vision models on downstream tasks. Specifically, the efficiency of low-rank adaptation has facilitated the creation and sharing of hundreds of custom LoRA modules, each trained on distinct data from various downstream tasks. In this paper, we explore the composability of LoRA modules, examining if combining these pre-trained modules enhances generalization to unseen downstream tasks. Our investigation involves evaluating two approaches: (a) uniform composition, involving averaging upstream LoRA modules with equal weights, and (b) learned composition, where we learn the weights for each upstream module and perform weighted averaging. Our experimental results on both vision and language models reveal that in few-shot settings, where only a limited number of samples are available for the downstream task, both uniform and learned composition methods result in better transfer accuracy, outperforming full fine-tuning and training a LoRA from scratch. Moreover, in full-shot settings, learned composition performs comparably to regular LoRA training with significantly fewer trainable parameters. Our research unveils the potential of uniform composition for enhancing transferability in low-shot settings, without introducing additional learnable parameters.
Submitted 23 February, 2024;
originally announced February 2024.
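The two composition schemes named in the abstract above can be illustrated with a minimal sketch. It assumes each upstream module is represented by its dense weight update (a plain matrix) and that composition is an element-wise weighted average; the function name `compose_lora` and this representation are hypothetical choices for illustration, not the authors' implementation, and the step that learns the weights is not shown.

```python
def compose_lora(deltas, weights=None):
    """Weighted average of upstream weight updates (hypothetical helper).

    deltas  : list of equally shaped matrices (lists of rows), one per module.
    weights : per-module coefficients; None means uniform composition,
              i.e. every module gets weight 1/k.
    """
    k = len(deltas)
    if weights is None:
        weights = [1.0 / k] * k  # uniform composition
    rows, cols = len(deltas[0]), len(deltas[0][0])
    # Element-wise weighted sum over the k modules.
    return [
        [sum(w * d[i][j] for w, d in zip(weights, deltas)) for j in range(cols)]
        for i in range(rows)
    ]


# Two toy "modules" with a 1x2 update each.
module_a = [[1.0, 1.0]]
module_b = [[3.0, 3.0]]
uniform = compose_lora([module_a, module_b])
weighted = compose_lora([module_a, module_b], weights=[0.25, 0.75])
```

In the learned variant described in the abstract, the entries of `weights` would be trained on the downstream task; here they are simply supplied by the caller.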
-
Ultra-short lifetime isomer studies from photonuclear reactions using laser-driven ultra-intense γ-ray
Authors:
Di Wu,
Haoyang Lan,
Jiaxing Liu,
Huangang Lu,
Jianyao Zhang,
Jianfeng Lv,
Xuezhi Wu,
Hui Zhang,
Yadong Xia,
Qiangyou He,
Jie Cai,
Qianyi Ma,
Yuhui Xia,
Zhenan Wang,
Meizhi Wang,
Zhiyan Yang,
Xinlu Xu,
Yixing Geng,
Chen Lin,
Wenjun Ma,
Yanying Zhao,
Haoran Wang,
Fulong Liu,
Chuangye He,
Jinqing Yu
, et al. (7 additional authors not shown)
Abstract:
Isomers, ubiquitous populations of relatively long-lived nuclear excited states, play a crucial role in nuclear physics. However, isomers with half-lives of several seconds or less have scarcely any experimental cross-section data due to the lack of a suitable measuring method. We report a method of online γ spectroscopy for ultra-short-lived isomers from photonuclear reactions using laser-driven ultra-intense γ-rays. The fastest time resolution can reach sub-ps level with γ-ray intensities >10^{19}/s (≥ 8 MeV). The ^{115}In(γ, n)^{114m2}In reaction (T_{1/2} = 43.1 ms) was measured for the first time in the high-energy region, which sheds light on nuclear structure studies of the element In. Simulations showed it would be an efficient way to study ^{229m}Th (T_{1/2} = 7 μs), which is believed to be the next generation of nuclear clock. This work offers a unique way of gaining insight into ultra-short lifetimes and promises an effective way to fill the gap in relevant experimental data.
Submitted 23 February, 2024;
originally announced February 2024.
-
CaT-GNN: Enhancing Credit Card Fraud Detection via Causal Temporal Graph Neural Networks
Authors:
Yifan Duan,
Guibin Zhang,
Shilong Wang,
Xiaojiang Peng,
Wang Ziqi,
Junyuan Mao,
Hao Wu,
Xinke Jiang,
Kun Wang
Abstract:
Credit card fraud poses a significant threat to the economy. While Graph Neural Network (GNN)-based fraud detection methods perform well, they often overlook the causal effect of a node's local structure on predictions. This paper introduces a novel method for credit card fraud detection, the Causal Temporal Graph Neural Network (CaT-GNN), which leverages causal invariant learning to reveal inherent correlations within transaction data. By decomposing the problem into discovery and intervention phases, CaT-GNN identifies causal nodes within the transaction graph and applies a causal mixup strategy to enhance the model's robustness and interpretability. CaT-GNN consists of two key components: Causal-Inspector and Causal-Intervener. The Causal-Inspector utilizes attention weights in the temporal attention mechanism to identify causal and environment nodes without introducing additional parameters. Subsequently, the Causal-Intervener performs a causal mixup enhancement on environment nodes based on the set of nodes. Evaluated on three datasets, including a private financial dataset and two public datasets, CaT-GNN demonstrates superior performance over existing state-of-the-art methods. Our findings highlight the potential of integrating causal reasoning with graph neural networks to improve fraud detection capabilities in financial transactions.
Submitted 22 February, 2024;
originally announced February 2024.
-
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Authors:
Tianyu Zheng,
Ge Zhang,
Tianhao Shen,
Xueling Liu,
Bill Yuchen Lin,
Jie Fu,
Wenhu Chen,
Xiang Yue
Abstract:
The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2), and further rises to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter bridges the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.
Submitted 27 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
An Improved Pseudopolynomial Time Algorithm for Subset Sum
Authors:
Lin Chen,
Jiayi Lian,
Yuchen Mao,
Guochuan Zhang
Abstract:
We investigate pseudo-polynomial time algorithms for Subset Sum. Given a multi-set $X$ of $n$ positive integers and a target $t$, Subset Sum asks whether some subset of $X$ sums to $t$. Bringmann proposes an $\tilde{O}(n + t)$-time algorithm [Bringmann SODA'17], and an open question has naturally arisen: can Subset Sum be solved in $O(n + w)$ time? Here $w$ is the maximum integer in $X$. We make progress towards resolving the open question by proposing an $\tilde{O}(n + \sqrt{wt})$-time algorithm.
Submitted 4 April, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
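To make the problem statement in the entry above concrete, here is a minimal sketch of the textbook pseudo-polynomial dynamic program for Subset Sum, written with a Python integer as a bitset. This is the classic baseline, not the algorithm proposed in the paper, and the function name `subset_sum` is chosen only for illustration.

```python
def subset_sum(xs, t):
    """Return True if some sub-multiset of xs sums to exactly t.

    Classic bitset dynamic program (not the paper's algorithm):
    bit s of `reachable` is set iff some subset of the items seen so far
    sums to s. Assumes xs holds positive integers and t >= 0.
    """
    mask = (1 << (t + 1)) - 1  # keep only sums in the range 0..t
    reachable = 1              # initially only the empty sum 0 is reachable
    for x in xs:
        # Either skip x (keep old bits) or add x to every reachable sum.
        reachable |= (reachable << x) & mask
    return bool((reachable >> t) & 1)


example_hit = subset_sum([3, 34, 4, 12, 5, 2], 9)    # 4 + 5 = 9
example_miss = subset_sum([3, 34, 4, 12, 5, 2], 30)  # no subset reaches 30
```

The loop performs one shift-and-or over a (t + 1)-bit integer per item, which corresponds to the familiar O(nt) bit-operation bound that the results cited in the abstract improve upon.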
-
REPOFUSE: Repository-Level Code Completion with Fused Dual Context
Authors:
Ming Liang,
Xiaoheng Xie,
Gehao Zhang,
Xunjin Zheng,
Peng Di,
wei jiang,
Hongwei Chen,
Chengpeng Wang,
Gang Fan
Abstract:
The success of language models in code assistance has spurred the proposal of repository-level code completion as a means to enhance prediction accuracy, utilizing the context from the entire codebase. However, this amplified context can inadvertently increase inference latency, potentially undermining the developer experience and deterring tool adoption - a challenge we termed the Context-Latency Conundrum. This paper introduces REPOFUSE, a pioneering solution designed to enhance repository-level code completion without the latency trade-off. REPOFUSE uniquely fuses two types of context: the analogy context, rooted in code analogies, and the rationale context, which encompasses in-depth semantic relationships. We propose a novel rank truncated generation (RTG) technique that efficiently condenses these contexts into prompts with restricted size. This enables REPOFUSE to deliver precise code completions while maintaining inference efficiency. Through testing with the CrossCodeEval suite, REPOFUSE has demonstrated a significant leap over existing models, achieving a 40.90% to 59.75% increase in exact match (EM) accuracy for code completions and a 26.8% enhancement in inference speed. Beyond experimental validation, REPOFUSE has been integrated into the workflow of a large enterprise, where it actively supports various coding tasks.
Submitted 22 February, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph
Authors:
Qian Zhao,
Hao Qian,
Ziqi Liu,
Gong-Duo Zhang,
Lihong Gu
Abstract:
Recommendation systems are widely used in e-commerce websites and online platforms to address information overload. However, existing systems primarily rely on historical data and user feedback, making it difficult to capture user intent transitions. Recently, Knowledge Base (KB)-based models have been proposed to incorporate expert knowledge, but they struggle to adapt to new items and the evolving e-commerce environment. To address these challenges, we propose a novel Large Language Model based Complementary Knowledge Enhanced Recommendation System (LLM-KERec). It introduces an entity extractor that extracts unified concept terms from item and user information. To provide cost-effective and reliable prior knowledge, entity pairs are generated based on entity popularity and specific strategies. The large language model determines complementary relationships in each entity pair, constructing a complementary knowledge graph. Furthermore, a new complementary recall module and an Entity-Entity-Item (E-E-I) weight decision model refine the scoring of the ranking model using real complementary exposure-click samples. Extensive experiments conducted on three industry datasets demonstrate the significant performance improvement of our model compared to existing approaches. Additionally, detailed analysis shows that LLM-KERec enhances users' enthusiasm for consumption by recommending complementary items. In summary, LLM-KERec addresses the limitations of traditional recommendation systems by incorporating complementary knowledge and utilizing a large language model to capture user intent transitions, adapt to new items, and enhance recommendation efficiency in the evolving e-commerce landscape.
Submitted 21 February, 2024;
originally announced February 2024.
-
Algebraic Riccati Tensor Equations with Applications in Multilinear Control Systems
Authors:
Yuchao Wang,
Yimin Wei,
Guofeng Zhang,
Shih Yu Chang
Abstract:
In a recent interesting paper [8], Chen et al. initiated the control-theoretic study of a class of discrete-time multilinear time-invariant (MLTI) control systems, where system states, inputs and outputs are all tensors endowed with the Einstein product. Criteria for fundamental system-theoretic notions such as stability, reachability and observability are established by means of tensor decomposition. The purpose of this paper is to continue this novel research direction. Specifically, we focus on continuous-time MLTI control systems. We define Hamiltonian tensors and symplectic tensors and establish the Schur-Hamiltonian tensor decomposition and symplectic tensor singular value decomposition (SVD). Based on these we propose the algebraic Riccati tensor equation (ARTE) and show that it has a unique positive semidefinite solution if the system is stabilizable and detectable. A tensor-based Newton method is proposed to find numerical solutions of the ARTE. The tensor version of the bounded real lemma is also established. A first-order robustness analysis of the ARTE is conducted. Finally, a numerical example is used to demonstrate the proposed theory and algorithms.
Submitted 20 February, 2024;
originally announced February 2024.
-
CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation
Authors:
Yujie Shao,
Xinrong Yao,
Xingwei Qu,
Chenghua Lin,
Shi Wang,
Stephen W. Huang,
Ge Zhang,
Jie Fu
Abstract:
Metaphor is a prominent linguistic device in human language and literature, as it adds color, imagery, and emphasis to enhance effective communication. This paper introduces a large-scale, high-quality annotated Chinese Metaphor Corpus, which comprises around 28K sentences drawn from a diverse range of Chinese literary sources, such as poems, prose, song lyrics, etc. To ensure the accuracy and consistency of our annotations, we introduce a comprehensive set of guidelines. These guidelines address the facets of metaphor annotation, from identifying tenors, vehicles, and grounds to handling the complexities of similes, personifications, juxtapositions, and hyperboles. Breaking tradition, our approach to metaphor generation emphasizes grounds and their distinct features rather than the conventional combination of tenors and vehicles. By integrating "ground" as a CoT (Chain of Thoughts) input, we are able to generate metaphors that resonate more with real-world intuition. We test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using our annotated corpus. These models are able to generate creative and fluent metaphor sentences more frequently induced by selected samples from our dataset, demonstrating the value of our corpus for Chinese metaphor research. The code is available at https://github.com/JasonShao55/Chinese_Metaphor_Explanation.
Submitted 20 February, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models
Authors:
Yizhi LI,
Ge Zhang,
Xingwei Qu,
Jiali Li,
Zhaoqun Li,
Zekun Wang,
Hao Li,
Ruibin Yuan,
Yinghao Ma,
Kai Zhang,
Wangchunshu Zhou,
Yiming Liang,
Lei Zhang,
Lei Ma,
Jiajun Zhang,
Zuowen Li,
Stephen W. Huang,
Chenghua Lin,
Jie Fu
Abstract:
The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following. Yet, their effectiveness often diminishes in low-resource languages like Chinese, exacerbated by biased evaluations from data leakage, casting doubt on their true generalizability to new linguistic territories. In response, we introduce the Chinese Instruction-Following Benchmark (CIF-Bench), designed to evaluate the zero-shot generalizability of LLMs to the Chinese language. CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances across 20 categories. To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance, totaling 45,000 data instances. Our evaluation of 28 selected LLMs reveals a noticeable performance gap, with the best model scoring only 52.9%, highlighting the limitations of LLMs in less familiar language and task contexts. This work not only uncovers the current limitations of LLMs in handling Chinese language tasks but also sets a new standard for future LLM generalizability research, pushing towards the development of more adaptable, culturally informed, and linguistically diverse models.
Submitted 4 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces
Authors:
Tianyu Zheng,
Ge Zhang,
Xingwei Qu,
Ming Kuang,
Stephen W. Huang,
Zhaofeng He
Abstract:
Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective to the offline reinforcement learning (RL) challenge. More concretely, we transform it into a supervised learning task by integrating multimodal and pre-trained language models. Our approach incorporates state information derived from images and action-related data obtained from text, thereby bolstering RL training performance and promoting long-term strategic thinking. We emphasize the contextual understanding of language and demonstrate how decision-making in RL can benefit from aligning states' and actions' representation with languages' representation. Our method significantly outperforms current baselines as evidenced by evaluations conducted on Atari and OpenAI Gym environments. This contributes to advancing offline RL performance and efficiency while providing a novel perspective on offline RL. Our code and data are available at https://github.com/Zheng0428/MORE_.
Submitted 20 February, 2024;
originally announced February 2024.
-
A Geometric Algorithm for Tubular Shape Reconstruction from Skeletal Representation
Authors:
Guoqing Zhang,
Yang Li
Abstract:
We introduce a novel approach for the reconstruction of tubular shapes from skeletal representations. Our method processes all skeletal points as a whole, eliminating the need for splitting the input structure into multiple segments. We represent the tubular shape as a truncated signed distance function (TSDF) in a voxel hashing manner, in which the signed distance between a voxel center and the object is computed through a simple geometric algorithm. Our method does not involve any surface sampling scheme or solving large matrix equations, and therefore is a faster and more elegant solution for tubular shape reconstruction compared to other approaches. Experiments demonstrate the efficiency and effectiveness of the proposed method. Code is available at https://github.com/wlsdzyzl/Dragon.
Submitted 1 July, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Authors:
Jun Zhan,
Junqi Dai,
Jiasheng Ye,
Yunhua Zhou,
Dong Zhang,
Zhigeng Liu,
Xin Zhang,
Ruibin Yuan,
Ge Zhang,
Linyang Li,
Hang Yan,
Jie Fu,
Tao Gui,
Tianxiang Sun,
Yugang Jiang,
Xipeng Qiu
Abstract:
We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music. AnyGPT can be trained stably without any alterations to the current large language model (LLM) architecture or training paradigms. Instead, it relies exclusively on data-level preprocessing, facilitating the seamless integration of new modalities into LLMs, akin to the incorporation of new languages. We build a multimodal text-centric dataset for multimodal alignment pre-training. Utilizing generative models, we synthesize the first large-scale any-to-any multimodal instruction dataset. It consists of 108k samples of multi-turn conversations that intricately interweave various modalities, thus equipping the model to handle arbitrary combinations of multimodal inputs and outputs. Experimental results demonstrate that AnyGPT is capable of facilitating any-to-any multimodal conversation while achieving performance comparable to specialized models across all modalities, proving that discrete representations can effectively and conveniently unify multiple modalities within a language model. Demos are shown at https://junzhan2000.github.io/AnyGPT.github.io/
Submitted 7 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.