-
SceneTracker: Long-term Scene Flow Estimation Network
Authors:
Bo Wang,
Jian Li,
Yang Yu,
Li Liu,
Zhen** Sun,
Dewen Hu
Abstract:
Considering the complementarity of scene flow estimation in the spatial domain's focusing capability and 3D object tracking in the temporal domain's coherence, this study aims to address a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE). We introduce SceneTracker, a novel learning-based LSFE network that adopts an iterative approach to approximate the optimal trajectory. Besides, it dynamically indexes and constructs appearance and depth correlation features simultaneously and employs the Transformer to explore and utilize long-range connections within and between trajectories. With detailed experiments, SceneTracker shows superior capabilities in handling 3D spatial occlusion and depth noise interference, highly tailored to the LSFE task's needs. Finally, we build the first real-world evaluation dataset, LSFDriving, further substantiating SceneTracker's commendable generalization capacity. The code and data for SceneTracker are available at https://github.com/wwsource/SceneTracker.
Submitted 6 May, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Emerging Jordan blocks in the two-dimensional Potts and loop models at generic $Q$
Authors:
Lawrence Liu,
Jesper Lykke Jacobsen,
Hubert Saleur
Abstract:
It was recently suggested -- based on general self-consistency arguments as well as results from the bootstrap (arXiv:2005.07708, arXiv:2007.11539, arXiv:2007.04190) -- that the CFT describing the $Q$-state Potts model is logarithmic for generic values of $Q$, with rank-two Jordan blocks for $L_0$ and ${\mkern 1.5mu\overline{\mkern-1.5mu L\mkern-1.5mu}\mkern 1.5mu}_0$ in many sectors of the theory. This is despite the well-known fact that the lattice transfer matrix (or Hamiltonian) is diagonalizable in (arbitrary) finite size. While the emergence of Jordan blocks only in the limit $L\to\infty$ is perfectly possible conceptually, diagonalizability in finite size makes the measurement of logarithmic couplings (whose values are analytically predicted in arXiv:2007.11539, arXiv:2007.04190) very challenging. This problem is solved in the present paper (which can be considered a companion to arXiv:2007.11539), and the conjectured logarithmic structure of the CFT confirmed in detail by the study of the lattice model and associated "emerging Jordan blocks."
Submitted 28 March, 2024;
originally announced March 2024.
-
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Authors:
Ang Lv,
Yuhan Chen,
Kaiyi Zhang,
Yulong Wang,
Lifeng Liu,
Ji-Rong Wen,
Jian Xie,
Rui Yan
Abstract:
In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt ``The capital of France is,'' task-specific attention heads extract the topic token, such as ``France,'' from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregated with equal weight and added to the residual stream, the subsequent MLP acts as an ``activation,'' which either erases or amplifies the information originating from individual heads. As a result, the topic token ``France'' stands out in the residual stream. (3) A deep MLP takes ``France'' and generates a component that redirects the residual stream towards the direction of the correct answer, i.e., ``Paris.'' This procedure is akin to applying an implicit function such as ``get\_capital($X$),'' and the argument $X$ is the topic token information passed by attention heads. To achieve the above quantitative and qualitative analysis for MLPs, we proposed a novel analytic method aimed at decomposing the outputs of the MLP into components understandable by humans. Additionally, we observed a universal anti-overconfidence mechanism in the final layer of models, which suppresses correct predictions. We mitigate this suppression by leveraging our interpretation to improve factual recall confidence. The above interpretations are evaluated across diverse tasks spanning various domains of factual knowledge, using various language models from the GPT-2 families, 1.3B OPT, up to 7B Llama-2, and in both zero- and few-shot setups.
Submitted 24 May, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Measurement of absolute branching fractions of $D_s^+$ hadronic decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (632 additional authors not shown)
Abstract:
Using $e^+ e^-$ collision data collected at the BESIII detector at center-of-mass energies between 4.128 and 4.226 GeV, corresponding to an integrated luminosity of $7.33~{\rm fb}^{-1}$, we determine the absolute branching fractions of fifteen hadronic $D_s^{+}$ decays with a double-tag technique. In particular, we make precise measurements of the branching fractions $\mathcal{B}(D_s^+ \to K^+ K^- π^+)=(5.49 \pm 0.04 \pm 0.07)\%$, $\mathcal{B}(D_s^+ \to K_S^0 K^+)=(1.50 \pm 0.01 \pm 0.01)\%$ and $\mathcal{B}(D_s^+ \to K^+ K^- π^+ π^0)=(5.50 \pm 0.05 \pm 0.11)\%$, where the first uncertainties are statistical and the second ones are systematic. The \emph{CP} asymmetries in these decays are also measured and all are found to be compatible with zero.
Submitted 30 May, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
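As a reading aid for the $D_s^+$ entry above: each branching fraction is quoted as value ± statistical ± systematic. The sketch below combines the two uncertainties in quadrature to get a single total. This convention is assumed here for illustration (it treats the two sources as uncorrelated) and is not stated in the abstract; the function name `total_uncertainty` is hypothetical.

```python
import math


def total_uncertainty(stat, syst):
    """Combine two uncertainties in quadrature.

    Assumes the statistical and systematic parts are uncorrelated,
    which is an illustrative assumption, not a claim from the abstract.
    """
    return math.hypot(stat, syst)


# Values in percent, copied from the abstract: (central, stat, syst).
results = {
    "Ds+ -> K+ K- pi+": (5.49, 0.04, 0.07),
    "Ds+ -> KS0 K+": (1.50, 0.01, 0.01),
    "Ds+ -> K+ K- pi+ pi0": (5.50, 0.05, 0.11),
}

for mode, (value, stat, syst) in results.items():
    total = total_uncertainty(stat, syst)
    print(f"{mode}: ({value:.2f} +/- {total:.3f})%")
```

The quadrature sum is dominated by the larger term, which is why the systematic part drives the total in the first and third modes.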
-
Observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (600 additional authors not shown)
Abstract:
By analyzing $e^+e^-$ annihilation data corresponding to an integrated luminosity of 2.93 $\rm fb^{-1}$ collected at a center-of-mass energy of 3.773 GeV with the \text{BESIII} detector, the first observation of the semileptonic decays $D^0\rightarrow K_S^0π^-π^0 e^+ ν_e$ and $D^+\rightarrow K_S^0π^+π^- e^+ ν_e$ is reported. With a dominant hadronic contribution from $K_1(1270)$, the branching fractions are measured to be $\mathcal{B}(D^0\rightarrow {K}_1(1270)^-(\to K^0_Sπ^-π^0)e^+ν_e)=(1.69^{+0.53}_{-0.46}\pm0.15)\times10^{-4}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0(\to K^0_Sπ^+π^-)e^+ν_e)=(1.47^{+0.45}_{-0.40}\pm0.20)\times10^{-4}$ with statistical significance of 5.4$σ$ and 5.6$σ$, respectively. When combined with measurements of the $K_1(1270)\to K^+π^-π$ decays, the absolute branching fractions are determined to be $\mathcal{B}(D^0\to K_1(1270)^-e^+ν_e)=(1.05^{+0.33}_{-0.28}\pm0.12\pm0.12)\times10^{-3}$ and $\mathcal{B}(D^+\to \bar{K}_1(1270)^0e^+ν_e)=(1.29^{+0.40}_{-0.35}\pm0.18\pm0.15)\times10^{-3}$. The first and second uncertainties are statistical and systematic, respectively, and the third uncertainties originate from the assumed branching fractions of the $K_1(1270)\to Kππ$ decays.
Submitted 27 March, 2024;
originally announced March 2024.
-
Gaussian Process-based Traversability Analysis for Terrain Mapless Navigation
Authors:
Abe Leininger,
Mahmoud Ali,
Hassan Jardali,
Lantao Liu
Abstract:
Efficient navigation through uneven terrain remains a challenging endeavor for autonomous robots. We propose a new geometric-based uneven terrain mapless navigation framework combining a Sparse Gaussian Process (SGP) local map with a Rapidly-Exploring Random Tree* (RRT*) planner. Our approach begins with the generation of a high-resolution SGP local map, providing an interpolated representation of the robot's immediate environment. This map captures crucial environmental variations, including height, uncertainties, and slope characteristics. Subsequently, we construct a traversability map based on the SGP representation to guide our planning process. The RRT* planner efficiently generates real-time navigation paths, avoiding untraversable terrain in pursuit of the goal. This combination of SGP-based terrain interpretation and RRT* planning enables ground robots to safely navigate environments with varying elevations and steep obstacles. We evaluate the performance of our proposed approach through robust simulation testing, highlighting its effectiveness in achieving safe and efficient navigation compared to existing methods.
Submitted 27 March, 2024;
originally announced March 2024.
-
Arc-transitive maps with coprime Euler characteristic and edge number
Authors:
C. H. Li,
Lu Yi Liu
Abstract:
This is one of a series of papers which aim towards a classification of edge-transitive maps of which the Euler characteristic and the edge number are coprime. This one carries out the classification work for arc-transitive maps with nonsolvable automorphism groups, which illustrates how the edge number impacts on the Euler characteristic for maps. The classification is involved with the construction of some new and interesting arc-regular maps.
Submitted 27 March, 2024;
originally announced March 2024.
-
Track Everything Everywhere Fast and Robustly
Authors:
Yunzhou Song,
Jiahui Lei,
Ziyun Wang,
Lingjie Liu,
Kostas Daniilidis
Abstract:
We propose a novel test-time optimization approach for efficiently and robustly tracking any pixel at any time in a video. The latest state-of-the-art optimization-based tracking technique, OmniMotion, requires a prohibitively long optimization time, rendering it impractical for downstream applications. OmniMotion is sensitive to the choice of random seeds, leading to unstable convergence. To improve efficiency and robustness, we introduce a novel invertible deformation network, CaDeX++, which factorizes the function representation into a local spatial-temporal feature grid and enhances the expressivity of the coupling blocks with non-linear functions. While CaDeX++ incorporates a stronger geometric bias within its architectural design, it also takes advantage of the inductive bias provided by the vision foundation models. Our system utilizes monocular depth estimation to represent scene geometry and enhances the objective by incorporating DINOv2 long-term semantics to regulate the optimization process. Our experiments demonstrate a substantial improvement in training speed (more than 10 times faster), robustness, and accuracy in tracking over the SoTA optimization-based method OmniMotion.
Submitted 26 March, 2024;
originally announced March 2024.
-
Climate Downscaling: A Deep-Learning Based Super-resolution Model of Precipitation Data with Attention Block and Skip Connections
Authors:
Chia-Hao Chiang,
Zheng-Han Huang,
Liwen Liu,
Hsin-Chien Liang,
Yi-Chi Wang,
Wan-Ling Tseng,
Chao Wang,
Che-Ta Chen,
Ko-Chih Wang
Abstract:
Human activities accelerate the consumption of fossil fuels and produce greenhouse gases, resulting in urgent issues today: global warming and climate change. These indirectly cause severe natural disasters, widespread human suffering, and huge agricultural losses. To mitigate impacts on our lands, scientists are developing renewable, reusable, and clean energies, and climatologists are trying to predict the extremes. Meanwhile, governments are publicizing resource-saving policies for a more eco-friendly society and raising environmental awareness. One of the most influential factors is precipitation, which brings condensed water vapor onto land. Water resources are among the most significant yet basic needs in society, supporting not only our livelihoods but also the economy. In Taiwan, although the average annual precipitation is up to 2,500 millimeters (mm), the water allocation for each person is lower than the global average due to drastic geographical elevation changes and uneven distribution through the year. Thus, it is crucial to track and predict rainfall to make the most use of it and to prevent floods. However, climate models have limited resolution and require intensive computational power for local-scale use. Therefore, we propose a deep convolutional neural network with skip connections, attention blocks, and auxiliary data concatenation, in order to downscale low-resolution precipitation data into high-resolution data. Finally, we compare with other climate downscaling methods and show better performance in metrics of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Pearson Correlation, structural similarity index (SSIM), and forecast indicators.
Submitted 26 March, 2024;
originally announced March 2024.
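The climate downscaling entry above reports MAE, RMSE, and Pearson correlation. A minimal sketch of how these three scores are commonly computed on flattened grids follows, using the standard textbook definitions. The function names and the rainfall values are hypothetical, and nothing here reproduces the authors' evaluation code.

```python
import math


def mae(pred, target):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def rmse(pred, target):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def pearson(pred, target):
    """Pearson correlation coefficient; undefined if either input is constant."""
    n = len(target)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


# Hypothetical rainfall values in mm for four grid cells.
predicted = [0.0, 2.0, 5.0, 9.0]
observed = [0.0, 1.0, 6.0, 10.0]

print("MAE:", mae(predicted, observed))
print("RMSE:", rmse(predicted, observed))
print("Pearson:", pearson(predicted, observed))
```

MAE and RMSE measure error magnitude (RMSE penalizes large misses more), while Pearson correlation measures whether the spatial pattern is right regardless of scale, so the three are usually read together.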
-
Leveraging A Variety of Anchors in Cellular Network for Ubiquitous Sensing
Authors:
Liang Liu,
Shuowen Zhang,
Shuguang Cui
Abstract:
Integrated sensing and communication (ISAC) has recently attracted tremendous attention from both academia and industry, being envisioned as a key part of the standards for the sixth-generation (6G) cellular network. A key challenge of 6G-oriented ISAC lies in how to perform ubiquitous sensing based on the communication signals and devices. Previous works have made great progress in studying the signal waveform design that leads to the optimal communication-sensing performance tradeoff. In this article, we focus on issues arising from the exploitation of communication devices for sensing in 6G networks. In particular, we discuss how to leverage various nodes available in the cellular network as anchors to perform ubiquitous sensing. On one hand, the base stations (BSs) will be the most important anchors in the future 6G ISAC network, since they can generate/process radio signals with high range/angle resolutions, and their positions are precisely known. Correspondingly, we first study the BS-based sensing technique. On the other hand, the BSs alone may not enable ubiquitous sensing, since they cannot cover all places with strong line-of-sight (LOS) links. This motivates us to investigate the possibility of using other nodes that are deployed with higher density in the network to act as anchors. Along this line, we are interested in two types of new anchors: user equipments (UEs) and reconfigurable intelligent surfaces (RISs). This paper sheds light on the opportunities and challenges brought by UE-assisted sensing and RIS-assisted sensing. Our goal is to devise a novel 6G-oriented sensing architecture where BSs, UEs, and RISs can work together to provide ubiquitous sensing services.
Submitted 26 March, 2024;
originally announced March 2024.
-
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Authors:
Jiahao Chen,
Yipeng Qin,
Lingjie Liu,
Jiangbo Lu,
Guanbin Li
Abstract:
Neural Radiance Field (NeRF) has been widely recognized for its excellence in novel view synthesis and 3D scene reconstruction. However, their effectiveness is inherently tied to the assumption of static scenes, rendering them susceptible to undesirable artifacts when confronted with transient distractors such as moving objects or shadows. In this work, we propose a novel paradigm, namely "Heuristics-Guided Segmentation" (HuGS), which significantly enhances the separation of static scenes from transient distractors by harmoniously combining the strengths of hand-crafted heuristics and state-of-the-art segmentation models, thus significantly transcending the limitations of previous solutions. Furthermore, we delve into the meticulous design of heuristics, introducing a seamless fusion of Structure-from-Motion (SfM)-based heuristics and color residual heuristics, catering to a diverse range of texture profiles. Extensive experiments demonstrate the superiority and robustness of our method in mitigating transient distractors for NeRFs trained in non-static scenes. Project page: https://cnhaox.github.io/NeRF-HuGS/.
Submitted 26 March, 2024;
originally announced March 2024.
-
Vortex nucleations in spinor Bose condensates under localized synthetic magnetic fields
Authors:
L. -R. Liu,
S. -C. Wu,
T. -W. Liu,
H. -Y. Hsu,
T. -K. Shen,
S. -K. Yip,
Y. Kawaguchi,
Y. -J. Lin
Abstract:
Gauge fields are ubiquitous in modern quantum physics. In superfluids, quantized vortices can be induced by gauge fields. Here we demonstrate the first experimental observation of vortex nucleations in spinor Bose-Einstein Condensates under radially-localized synthetic magnetic fields. The associated gauge potentials $\vec{A}$ are azimuthal and created by light-induced spin-orbital-angular-momentum coupling, generating circulating azimuthal velocity fields $\propto \vec{p}-\vec{A}$ even when the canonical momentum $\vec{p}= 0$. A sufficiently large azimuthal velocity peaked near the condensate center results in a dynamically unstable localized excitation that initiates vortex nucleations. This excitation appears as a spontaneously-formed vortex-antivortex pair near the cloud center. Following the initially developed instability, the dynamics is governed by the asymmetry and dissipation, where the atomic orbital angular momentum evolves and can reach the value of the ground state. Our system exhibits dynamical and Landau instabilities and agrees reasonably with time-dependent Gross-Pitaevskii simulations.
Submitted 26 March, 2024;
originally announced March 2024.
-
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Authors:
Yufu Wang,
Ziyun Wang,
Lingjie Liu,
Kostas Daniilidis
Abstract:
We propose TRAM, a two-stage method to reconstruct a human's global trajectory and motion from in-the-wild videos. TRAM robustifies SLAM to recover the camera motion in the presence of dynamic humans and uses the scene background to derive the motion scale. Using the recovered camera as a metric-scale reference frame, we introduce a video transformer model (VIMO) to regress the kinematic body motion of a human. By composing the two motions, we achieve accurate recovery of 3D humans in the world space, reducing global motion errors by 60% from prior work. https://yufu-wang.github.io/tram4d/
Submitted 25 March, 2024;
originally announced March 2024.
-
Convert laser light into single photons via interference
Authors:
Yanfeng Li,
Manman Wang,
Guoqi Huang,
Li Liu,
Wenyan Wang,
Weijie Ji,
Hanqing Liu,
Xiangbin Su,
Shulun Li,
Deyan Dai,
Xiangjun Shang,
Haiqiao Ni,
Zhichuan Niu,
Chengyong Hu
Abstract:
Laser light possesses perfect coherence, but cannot be attenuated to single photons via linear optics. An elegant route to convert laser light into single photons is based on photon blockade in a cavity with a single atom in the strong coupling regime. However, the single-photon purity achieved by this method remains relatively low. Here we propose an interference-based approach where laser light can be transformed into single photons by destructively interfering with a weak but super-bunched incoherent field emitted from a cavity coupled to a single quantum emitter. We demonstrate this idea by measuring the reflected light of a laser field which drives a double-sided optical microcavity containing a single artificial atom, a quantum dot (QD), in the Purcell regime. The reflected light consists of a superposition of the driving field with the cavity output field. We achieve a second-order autocorrelation of $g^{(2)}(0)=0.030\pm0.002$ and a two-photon interference visibility of $(94.3\pm0.2)\%$. By separating the coherent and incoherent fields in the reflected light, we observe that the incoherent field from the cavity exhibits super-bunching with $g^{(2)}(0)=41\pm2$ while the coherent field retains Poissonian statistics. By controlling the relative amplitude of the coherent and incoherent fields, we verify that the photon statistics of the reflected light are tunable from perfect anti-bunching to super-bunching, in agreement with our predictions. Our results demonstrate photon statistics of light as a quantum interference phenomenon in which a single QD can scatter two photons simultaneously at low driving fields, in contrast to the common picture that a single two-level quantum emitter can only scatter (or absorb and emit) single photons. This work opens the door to tailoring photon statistics of laser light via cavity or waveguide quantum electrodynamics and interference.
Submitted 25 March, 2024;
originally announced March 2024.
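The entry above characterizes light by its second-order autocorrelation at zero delay. For a single mode with photon-number distribution $P(n)$, the standard definition is $g^{(2)}(0) = \langle n(n-1)\rangle / \langle n\rangle^2$. The sketch below evaluates it for three illustrative distributions; the function names and the example distributions are hypothetical and are not taken from the paper's data.

```python
import math


def g2_zero(probabilities):
    """Return g2(0) = <n(n-1)> / <n>^2 for a photon-number distribution.

    probabilities[n] is the probability of finding exactly n photons.
    """
    mean_n = sum(n * p for n, p in enumerate(probabilities))
    mean_pairs = sum(n * (n - 1) * p for n, p in enumerate(probabilities))
    return mean_pairs / mean_n ** 2


def poisson(mean, n_max=60):
    """Poisson distribution truncated at n_max (coherent-state statistics)."""
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]


single_photon = [0.0, 1.0]           # exactly one photon: anti-bunched
coherent = poisson(0.5)              # laser-like light: Poissonian
rare_pairs = [0.98, 0.0, 0.02]       # mostly vacuum, occasional pairs: bunched

for name, dist in [("single photon", single_photon),
                   ("coherent", coherent),
                   ("rare pairs", rare_pairs)]:
    print(f"{name}: g2(0) = {g2_zero(dist):.3f}")
```

Values below 1 indicate anti-bunching, 1 corresponds to Poissonian light, and values well above 1 indicate bunching, which is the scale on which the quoted 0.030 and 41 should be read.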
-
Cross-lingual Contextualized Phrase Retrieval
Authors:
Huayang Li,
Deng Cai,
Zhi Qu,
Qu Cui,
Hidetaka Kamigaito,
Lemao Liu,
Taro Watanabe
Abstract:
Phrase-level dense retrieval has shown many appealing characteristics in downstream NLP tasks by leveraging the fine-grained information that phrases offer. In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information. However, the lack of specific training data and models is the primary challenge to achieving our goal. To address this, we extract pairs of cross-lingual phrases using word alignment information automatically induced from parallel sentences. Subsequently, we train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning, which encourages the hidden representations of phrases with similar contexts and semantics to align closely. Comprehensive experiments on both the cross-lingual phrase retrieval task and a downstream task, i.e., machine translation, demonstrate the effectiveness of CCPR. On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher. When CCPR is used to augment a large-language-model-based translator, it achieves average gains of 0.7 and 1.5 in BERTScore for translations from X=>En and vice versa, respectively, on the WMT16 dataset. Our code and data are available at https://github.com/ghrua/ccpr_release.
Submitted 25 March, 2024;
originally announced March 2024.
-
Cross section measurement of $e^+e^-\to ηψ(2S)$ and search for $e^+e^-\toη\tilde{X}(3872)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
The energy-dependent cross section for $e^+e^-\to ηψ(2S)$ is measured at eighteen center of mass energies from 4.288 GeV to 4.951 GeV using the BESIII detector. Using the same data samples, we also perform the first search for the reaction $e^+e^-\toη\tilde{X}(3872)$, but no evidence is found for the $\tilde{X}(3872)$ in the $π^+π^- J/ψ$ mass distribution. At each of the eighteen center of mass energies, upper limits at the 90\% confidence level on the cross section for $e^+e^-\toηψ(2S)$ and on the product of the $e^+e^-\toη\tilde{X}(3872)$ cross section with the branching fraction of $\tilde{X}(3872)\toπ^+π^- J/ψ$ are reported.
Submitted 25 March, 2024;
originally announced March 2024.
-
Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm
Authors:
Lei Liu,
Xiaoyan Yang,
Fangzhou Li,
Chenfei Chi,
Yue Shen,
Shiwei Lyu Ming Zhang,
Xiaowei Ma,
Xiangguo Lyu,
Liya Ma,
Zhiqiang Zhang,
Wei Xue,
Yiran Huang,
**jie Gu
Abstract:
Large language models (LLMs) are attracting increasing interest as a way to improve clinical efficiency in medical diagnosis, owing to their unprecedented performance in modelling natural language. To ensure safe and reliable clinical applications, the evaluation of LLMs becomes critical for mitigating potential risks, e.g., hallucinations. However, current evaluation methods rely heavily on labor-intensive human participation to achieve human-preferred judgements. To overcome this challenge, we propose an automatic evaluation paradigm tailored to assess LLMs' capabilities in delivering clinical services, e.g., disease diagnosis and treatment. The evaluation paradigm contains three basic elements: metric, data, and algorithm. Specifically, inspired by professional clinical practice pathways, we formulate an LLM-specific clinical pathway (LCP) to define the clinical capabilities that a doctor agent should possess. Then, Standardized Patients (SPs) from medical education are introduced as the guideline for collecting medical data for evaluation, which helps ensure the completeness of the evaluation procedure. Leveraging these steps, we develop a multi-agent framework to simulate the interactive environment between SPs and a doctor agent, which is equipped with a Retrieval-Augmented Evaluation (RAE) to determine whether the behaviors of a doctor agent are in accordance with the LCP. The above paradigm can be extended to any similar clinical scenario to automatically evaluate LLMs' medical capabilities. Applying this paradigm, we construct an evaluation benchmark in the field of urology, including an LCP, an SP dataset, and an automated RAE. Extensive experiments are conducted to demonstrate the effectiveness of the proposed approach, providing more insights for LLMs' safe and reliable deployment in clinical practice.
Submitted 25 March, 2024;
originally announced March 2024.
-
Strongly asymmetric magnetization switching and programmable complete Boolean logic enabled by long-range intralayer Dzyaloshinskii-Moriya interaction
Authors:
Qianbiao Liu,
Long Liu,
Guozhong Xing,
Lijun Zhu
Abstract:
Electrical switching of magnetization is central to spintronics. Despite enormous efforts on spin torques and Dzyaloshinskii-Moriya interaction (DMI) effects, some fundamental physics of electrical switching of magnetization is still missing, as indicated by a number of remarkable long-standing puzzles. Here, we report the discovery of the long-range intralayer DMI effect, which exists widely in magnetic heterostructures and is distinct from the previously known DMI effects in that it describes the chiral coupling of two orthogonal magnetic domains within the same magnetic layer via the mediation of an adjacent heavy metal layer. The long-range intralayer DMI generates a strong perpendicular effective magnetic field (H_DMI^z) on the perpendicular magnetization. Characteristically, H_DMI^z varies with the sign/magnitude of the interfacial DMI constant, the applied in-plane magnetic fields, and the distribution of the perpendicular magnetic anisotropy. The long-range intralayer DMI results in striking consequences including the strongly asymmetric current/field switching of perpendicular magnetization, hysteresis loop shift of perpendicular magnetization in the absence of in-plane direct current, and sharp, complete switching of perpendicular magnetization purely by an in-plane magnetic field. Utilizing the long-range intralayer DMI effect, we demonstrate programmable, complete Boolean logic operations (i.e., AND, NAND, NOT, OR, and NOR) within a single spin-orbit torque device. These results will stimulate the investigation of the long-range intralayer DMI effect and its impacts on a variety of spintronic devices.
Submitted 24 March, 2024;
originally announced March 2024.
-
RPMArt: Towards Robust Perception and Manipulation for Articulated Objects
Authors:
Junbo Wang,
Wenhai Liu,
Qiaojun Yu,
Yang You,
Liu Liu,
Weiming Wang,
Cewu Lu
Abstract:
Articulated objects are commonly found in daily life. It is essential that robots can exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting the practical deployment in real-world scenarios. To tackle these challenges, we propose a framework towards Robust Perception and Manipulation for Articulated Objects (RPMArt), which learns to estimate the articulation parameters and manipulate the articulation part from the noisy point cloud. Our primary contribution is a Robust Articulation Network (RoArtNet) that is able to predict both joint parameters and affordable points robustly by local feature learning and point tuple voting. Moreover, we introduce an articulation-aware classification scheme to enhance its ability for sim-to-real transfer. Finally, with the estimated affordable point and articulation joint constraint, the robot can generate robust actions to manipulate articulated objects. After learning only from synthetic data, RPMArt is able to transfer zero-shot to real-world articulated objects. Experimental results confirm our approach's effectiveness, with our framework achieving state-of-the-art performance in both noise-added simulation and real-world environments. The code and data will be open-sourced for reproduction. More results are published on the project website at https://r-pmart.github.io .
Submitted 24 March, 2024;
originally announced March 2024.
-
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Authors:
Xiaojun Hou,
Jiazheng Xing,
Yijie Qian,
Yaowei Guo,
Shuo Xin,
Junhao Chen,
Kai Tang,
Mengmeng Wang,
Zhengkai Jiang,
Liang Liu,
Yong Liu
Abstract:
Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the modality gap limits pre-trained knowledge recall, and the dominance of the RGB modality persists, preventing the full utilization of information from other modalities. To address these issues, we propose a novel symmetric multimodal tracking framework called SDSTrack. We introduce lightweight adaptation for efficient fine-tuning, which directly transfers the feature extraction ability from RGB to other domains with a small number of trainable parameters and integrates multimodal features in a balanced, symmetric manner. Furthermore, we design a complementary masked patch distillation strategy to enhance the robustness of trackers in complex environments, such as extreme weather, poor imaging, and sensor failure. Extensive experiments demonstrate that SDSTrack outperforms state-of-the-art methods in various multimodal tracking scenarios, including RGB+Depth, RGB+Thermal, and RGB+Event tracking, and exhibits impressive results in extreme conditions. Our source code is available at https://github.com/hoqolo/SDSTrack.
Submitted 27 March, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
-
UAV Deployment Optimization in UAV-assisted Wireless Communications
Authors:
Xueqi Zhang,
Aimin Wang,
Geng Sun,
Lingling Liu,
Jing Zhang
Abstract:
Because the locations of base stations (BSs) cannot be changed after they are installed, it is very difficult for them to communicate directly with remote user equipment (UE), which directly affects the lifespan of the system. Unmanned aerial vehicles (UAVs) offer a promising solution as mobile relays for fifth-generation wireless communications owing to their flexible and cost-effective deployment. However, given the limited onboard energy of UAVs and the slow progress in energy storage technology, achieving energy-efficient communication is a key challenge. Therefore, in this article, we study a wireless communication network using a UAV as a high-altitude relay, and formulate a UAV relay deployment optimization problem (URDOP) to minimize the energy consumption of the system by optimizing the deployment of the UAV, including the locations and number of UAV hover points. Since the formulated URDOP is a mixed-integer programming problem, it presents a significant challenge for conventional gradient-based approaches. To this end, we propose a self-adaptive differential evolution with a variable population size (SaDEVPS) algorithm to solve the formulated URDOP. The performance of the proposed SaDEVPS is verified through simulations, and the results show that it successfully decreases the energy consumption of the system compared with other benchmark algorithms across multiple instances.
Submitted 3 March, 2024;
originally announced March 2024.
-
Secure and Energy-efficient Unmanned Aerial Vehicle-enabled Visible Light Communication via A Multi-objective Optimization Approach
Authors:
Lingling Liu,
Aimin Wang,
Jing Wu,
Jiao Lu,
Jiahui Li,
Geng Sun
Abstract:
In this research, a unique approach to providing communication service for terrestrial receivers using unmanned aerial vehicle-enabled visible light communication is investigated. Specifically, we consider an unmanned aerial vehicle-enabled visible light communication scenario with multiple transmitters, multiple receivers, and a single eavesdropper, each of which is equipped with a single photodetector. Then, an unmanned aerial vehicle deployment multi-objective optimization problem is formulated to simultaneously make the optical power received by the receiving surface more uniform, minimize the amount of information collected by the eavesdropper, and minimize the energy consumption of the unmanned aerial vehicles, while the locations and transmission power of the unmanned aerial vehicles are simultaneously optimized under certain constraints. Since the formulated unmanned aerial vehicle deployment multi-objective optimization problem is complex and nonlinear, it is challenging to tackle with conventional methods. To solve the problem, a multi-objective evolutionary algorithm based on decomposition with chaos initiation and crossover mutation is proposed. Simulation results show that the proposed approach is superior to other approaches and is effective at improving the security and energy efficiency of the visible light communication system.
Submitted 3 March, 2024;
originally announced March 2024.
-
Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review
Authors:
Jinge Wang,
Zien Cheng,
Qiuming Yao,
Li Liu,
Dong Xu,
Gangqing Hu
Abstract:
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
Submitted 12 June, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Precise measurement of the $e^+e^-\to D_s^+D_s^-$ cross sections at center-of-mass energies from threshold to 4.95 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using the $e^+e^-$ collision data collected with the BESIII detector operating at the BEPCII collider, at center-of-mass energies from the threshold to $4.95$~GeV, we present precise measurements of the cross sections for the process $e^+e^-\to D_s^+D_s^-$ using a single tag method. The resulting cross section lineshape exhibits several new structures, thereby offering an input for coupled channel analysis and model tests, which are critical to understand vector charmonium-like states with masses between 4 and 5~GeV.
Submitted 22 March, 2024;
originally announced March 2024.
-
Boundary-Aware Value Function Generation for Safe Stochastic Motion Planning
Authors:
Junhong Xu,
Kai Yin,
Jason M. Gregory,
Kris Hauser,
Lantao Liu
Abstract:
Navigation safety is critical for many autonomous systems such as self-driving vehicles in an urban environment. It requires an explicit consideration of boundary constraints that describe the borders of any infeasible, non-navigable, or unsafe regions. We propose a principled boundary-aware safe stochastic planning framework with promising results. Our method generates a value function that can strictly distinguish the state values between free (safe) and non-navigable (boundary) spaces in the continuous state, naturally leading to a safe boundary-aware policy. At the core of our solution lies a seamless integration of finite elements and kernel-based functions, where the finite elements allow us to characterize safety-critical states' borders accurately, and the kernel-based function speeds up computation for the non-safety-critical states. The proposed method was evaluated through extensive simulations and demonstrated safe navigation behaviors in mobile navigation tasks. Additionally, we demonstrate that our approach can maneuver safely and efficiently in cluttered real-world environments using a ground vehicle with strong external disturbances, such as navigating on a slippery floor and against external human intervention.
Submitted 22 March, 2024;
originally announced March 2024.
-
Orbital-doublet-driven even spin Chern insulators
Authors:
Lu Liu,
Yuntian Liu,
Jiayu Li,
Hua Wu,
Qihang Liu
Abstract:
Quantum spin Hall insulators hosting edge spin currents hold great potential for low-power spintronic devices. In this work, we present a universal approach to achieve a high and near-quantized spin Hall conductance plateau within a sizable bulk gap. Using a nonmagnetic four-band model Hamiltonian, we demonstrate that an even spin Chern (ESC) insulator can be accessed by tuning the sign of spin-orbit coupling (SOC) within a crystal symmetry-enforced orbital doublet. With the assistance of a high spin Chern number of $C_{S}=2$ and spin $U$(1) quasi-symmetry, this orbital-doublet-driven ESC phase is endowed with the near-double-quantized spin Hall conductance. We identify 12 crystallographic point groups supporting such a sign-tunable SOC. Furthermore, we apply our theory to realistic examples, and show the phase transition from a trivial insulator governed by positive SOC in RuI$_{3}$ monolayer to an ESC insulator dominated by negative SOC in RuBr$_{3}$ monolayer. This orbital-doublet-driven ESC insulator, RuBr$_{3}$, showcases nontrivial characteristics including helical edge states, near-double-quantized spin Hall conductance, and robust corner states. Our work provides new pathways in the pursuit of the long-sought quantum spin Hall insulators.
Submitted 21 March, 2024;
originally announced March 2024.
-
Efficient Model Learning and Adaptive Tracking Control of Magnetic Micro-Robots for Non-Contact Manipulation
Authors:
Yongyi Jia,
Shu Miao,
Junjian Zhou,
Niandong Jiao,
Lianqing Liu,
Xiang Li
Abstract:
Magnetic microrobots can be navigated by an external magnetic field to autonomously move within living organisms with complex and unstructured environments. Potential applications include drug delivery, diagnostics, and therapeutic interventions. Existing techniques commonly impart magnetic properties to the target object, or drive the robot to contact and then manipulate the object, both of which may induce physical damage. This paper considers a non-contact formulation, where the robot spins to generate a repulsive field to push the object without physical contact. Under such a formulation, the main challenge is that the motion model between the input of the magnetic field and the output velocity of the target object is commonly unknown and difficult to analyze. To deal with this, the paper proposes a data-driven solution. A neural network is constructed to efficiently estimate the motion model. Then, an approximate model-based optimal control scheme is developed to push the object to track a time-varying trajectory while maintaining non-contact through distance constraints. Furthermore, a straightforward planner is introduced to assess the adaptability of non-contact manipulation in a cluttered unstructured environment. Experimental results are presented to show the tracking and navigation performance of the proposed scheme.
Submitted 21 March, 2024;
originally announced March 2024.
-
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Authors:
Lizhe Liu,
Bohua Wang,
Hongwei Xie,
Daqi Liu,
Li Liu,
Zhiqiang Tian,
Kuiyuan Yang,
Bing Wang
Abstract:
Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems. Recently, object-free methods have attracted considerable attention. Such methods perceive the world by predicting the semantics of discrete voxel grids but fail to construct continuous and accurate obstacle surfaces. To this end, in this paper, we propose SurroundSDF to implicitly predict the signed distance field (SDF) and semantic field for the continuous perception from surround images. Specifically, we introduce a query-based approach and utilize SDF constrained by the Eikonal formulation to accurately describe the surfaces of obstacles. Furthermore, considering the absence of precise SDF ground truth, we propose a novel weakly supervised paradigm for SDF, referred to as the Sandwich Eikonal formulation, which emphasizes applying correct and dense constraints on both sides of the surface, thereby enhancing the perceptual accuracy of the surface. Experiments suggest that our method achieves SOTA for both occupancy prediction and 3D scene reconstruction tasks on the nuScenes dataset.
Submitted 21 March, 2024;
originally announced March 2024.
-
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Authors:
Yiming Huang,
Weilin Wan,
Yue Yang,
Chris Callison-Burch,
Mark Yatskar,
Lingjie Liu
Abstract:
Text-to-motion models excel at efficient human motion generation, but existing approaches lack fine-grained controllability over the generation process. Consequently, modifying subtle postures within a motion or inserting new actions at specific moments remains a challenge, limiting the applicability of these methods in diverse scenarios. In light of these challenges, we introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions by leveraging the knowledge priors of large language models (LLMs). Specifically, CoMo decomposes motions into discrete and semantically meaningful pose codes, with each code encapsulating the semantics of a body part, representing elementary information such as "left knee slightly bent". Given textual inputs, CoMo autoregressively generates sequences of pose codes, which are then decoded into 3D motions. Leveraging pose codes as interpretable representations, an LLM can directly intervene in motion editing by adjusting the pose codes according to editing instructions. Experiments demonstrate that CoMo achieves competitive performance in motion generation compared to state-of-the-art models while, in human studies, CoMo substantially surpasses previous work in motion editing abilities.
Submitted 20 March, 2024;
originally announced March 2024.
-
DG singular equivalence and singular locus
Authors:
Leilei Liu,
Jieheng Zeng
Abstract:
For a commutative Gorenstein Noetherian ring $R$, we construct an affine scheme $X$ solely from the DG singularity category $S_{dg}(R)$ of $R$ such that there is a finite surjective morphism $X \rightarrow \mathrm{Spec}(R /I)$, where $\mathrm{Spec}(R /I)$ is the singular locus in $\mathrm{Spec}(R)$. As an application, for two such rings with equivalent DG singularity categories, we prove that the singular loci in their affine schemes have the same dimension.
Submitted 31 March, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Search for $ΔS=2$ nonleptonic hyperon decays $Ω^-\toΣ^{0}π^{-}$ and $Ω^-\to nK^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (638 additional authors not shown)
Abstract:
Using $(27.12 \pm 0.14) \times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the center-of-mass energy of $\sqrt{s} = 3.686$ GeV, we search for the first time for two nonleptonic hyperon decays that change strangeness by two units, $Ω^-\toΣ^{0}π^-$ and $Ω^-\to nK^{-}$. No significant signal is observed. The upper limits on their decay branching fractions are determined to be $\mathcal{B}(Ω^-\toΣ^{0}π^-) < 5.4\times 10^{-4}$ and $\mathcal{B}(Ω^-\to nK^{-}) < 2.4\times 10^{-4}$ at the $90\%$ confidence level.
Submitted 14 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Authors:
Di Wang,
Jing Zhang,
Minqiang Xu,
Lin Liu,
Dongsheng Wang,
Erzhong Gao,
Chengxi Han,
Haonan Guo,
Bo Du,
Dacheng Tao,
Liangpei Zhang
Abstract:
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate the superiority of our models over existing ones of similar size and their competitive performance compared to larger state-of-the-art models, thus validating the effectiveness of MTP.
Submitted 29 May, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics
Authors:
Qiaojun Yu,
Ce Hao,
Junbo Wang,
Wenhai Liu,
Liu Liu,
Yao Mu,
Yang You,
Hengxu Yan,
Cewu Lu
Abstract:
Robotic manipulation in everyday scenarios, especially in unstructured environments, requires skills in pose-aware object manipulation (POM), which adapts robots' grasping and handling according to an object's 6D pose. Recognizing an object's position and orientation is crucial for effective manipulation. For example, if a mug is lying on its side, it's more effective to grasp it by the rim rather than the handle. Despite its importance, research in POM skills remains limited, because learning manipulation skills requires pose-varying simulation environments and datasets. This paper introduces ManiPose, a pioneering benchmark designed to advance the study of pose-varying manipulation tasks. ManiPose encompasses: 1) Simulation environments for POM feature tasks ranging from 6D pose-specific pick-and-place of single objects to cluttered scenes, further including interactions with articulated objects. 2) A comprehensive dataset featuring geometrically consistent and manipulation-oriented 6D pose labels for 2936 real-world scanned rigid objects and 100 articulated objects across 59 categories. 3) A baseline for POM, leveraging the inferencing abilities of LLM (e.g., ChatGPT) to analyze the relationship between 6D pose and task-specific requirements, offers enhanced pose-aware grasp prediction and motion planning capabilities. Our benchmark demonstrates notable advancements in pose estimation, pose-aware manipulation, and real-robot skill transfer, setting new standards for POM research. We will open-source the ManiPose benchmark with the final version paper, inviting the community to engage with our resources, available at our website: https://sites.google.com/view/manipose.
Submitted 20 March, 2024;
originally announced March 2024.
-
Multi-objective Optimization for Data Collection in UAV-assisted Agricultural IoT
Authors:
Lingling Liu,
Aimin Wang,
Geng Sun,
Jiahui Li,
Hongyang Pan,
Tony Q. S. Quek
Abstract:
Ground-based fixed base stations (BSs) are often deployed inflexibly, incur high overheads, and are susceptible to damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve the network coverage and performance of wireless communication, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks. In this work, we therefore consider employing a UAV as an aerial BS to acquire data from agricultural Internet of Things (IoT) devices. To this end, we first formulate a UAV-assisted data collection multi-objective optimization problem (UDCMOP) to efficiently collect the data from agricultural sensing devices. Specifically, we aim to jointly optimize the hovering positions, visit sequence, and speed of the UAV, in addition to the transmit power of the devices, to simultaneously maximize the minimum transmit rate of the devices, minimize the total energy consumption of the devices, and minimize the total energy consumption of the UAV. Second, the proposed UDCMOP is a non-convex mixed-integer nonlinear optimization problem, meaning that it involves both continuous and discrete decision variables, which makes it intractable to solve. Therefore, we solve it by proposing an improved multi-objective artificial hummingbird algorithm (IMOAHA) with several specific improvement factors, namely the hybrid initialization operator, the Cauchy mutation foraging operator, and the discrete mutation operator. Finally, simulations are carried out to verify that the proposed IMOAHA can effectively improve the system performance compared with other benchmarks.
Submitted 3 March, 2024;
originally announced March 2024.
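The abstract above names a Cauchy mutation operator without giving its form. As a rough illustration only, the following is a minimal sketch of a generic Cauchy mutation as commonly used in metaheuristics, not the authors' IMOAHA operator. The function name `cauchy_mutation`, the `scale` parameter, and the clipping to box bounds are all assumptions made for this example.

```python
import math
import random


def cauchy_mutation(position, lower, upper, scale=0.1, rng=random):
    """Perturb each coordinate with a heavy-tailed Cauchy step, then clip to bounds.

    Hypothetical helper for illustration; the paper's operator may differ.
    """
    mutated = []
    for x, lo, hi in zip(position, lower, upper):
        # Inverse-CDF sampling of a standard Cauchy variate: tan(pi * (u - 0.5)).
        step = math.tan(math.pi * (rng.random() - 0.5))
        # Scale the step by the width of the search interval (an assumption).
        candidate = x + scale * (hi - lo) * step
        # Keep the candidate inside the feasible box.
        mutated.append(min(max(candidate, lo), hi))
    return mutated


if __name__ == "__main__":
    rng = random.Random(0)
    # Example: a 2-D hovering position inside a 100 m x 100 m field (made-up numbers).
    print(cauchy_mutation([50.0, 50.0], [0.0, 0.0], [100.0, 100.0], rng=rng))
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is the usual motivation for using it to escape local optima.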
-
Learning Neural Volumetric Pose Features for Camera Localization
Authors:
Jingyu Lin,
Jiaqi Gu,
Bojian Wu,
Lubin Fan,
Renjie Chen,
Ligang Liu,
Jieping Ye
Abstract:
We introduce a novel neural volumetric pose feature, termed PoseMap, designed to enhance camera localization by encapsulating the information between images and the associated camera poses. Our framework leverages an Absolute Pose Regression (APR) architecture, together with an augmented NeRF module. This integration not only facilitates the generation of novel views to enrich the training dataset but also enables the learning of effective pose features. Additionally, we extend our architecture for self-supervised online alignment, allowing our method to be used and fine-tuned for unlabelled images within a unified framework. Experiments demonstrate that our method achieves 14.28% and 20.51% performance gain on average in indoor and outdoor benchmark scenes, outperforming existing APR methods with state-of-the-art accuracy.
Submitted 19 March, 2024;
originally announced March 2024.
-
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Authors:
Ziying Song,
Lei Yang,
Shaoqing Xu,
Lin Liu,
Dongyang Xu,
Caiyan Jia,
Feiyang Jia,
Li Wang
Abstract:
Integrating LiDAR and camera information into a Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to inaccurate calibration between the LiDAR and the camera sensor. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called GraphBEV. To address errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our GraphBEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEVFusion by 1.6% on the nuScenes validation set. Importantly, GraphBEV outperforms BEVFusion by 8.3% under conditions with misalignment noise.
Submitted 10 April, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
LSKNet: A Foundation Lightweight Backbone for Remote Sensing
Authors:
Yuxuan Li,
Xiang Li,
Yimian Dai,
Qibin Hou,
Li Liu,
Yongxiang Liu,
Ming-Ming Cheng,
Jian Yang
Abstract:
Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios. Such prior knowledge can be useful because remote sensing objects may be mistakenly recognized without referencing a sufficiently long-range context, which can vary for different objects. This paper considers these priors and proposes a lightweight Large Selective Kernel Network (LSKNet) backbone. LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various objects in remote sensing scenarios. To our knowledge, large and selective kernel mechanisms have not been previously explored in remote sensing images. Without bells and whistles, our lightweight LSKNet sets new state-of-the-art scores on standard remote sensing classification, object detection and semantic segmentation benchmarks. Our comprehensive analysis further validated the significance of the identified priors and the effectiveness of LSKNet. The code is available at https://github.com/zcablii/LSKNet.
Submitted 23 June, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
Authors:
Liren He,
Zhengkai Jiang,
Jinlong Peng,
Liang Liu,
Qiangang Du,
Xiaobin Hu,
Wenbing Zhu,
Mingmin Chi,
Yabiao Wang,
Chengjie Wang
Abstract:
In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination. Consequently, the model becomes unable to reconstruct genuine anomalies as normal instances, resulting in a failure of anomaly detection. To counter this issue, we present a novel unified feature reconstruction-based anomaly detection framework termed RLR (Reconstruct features from a Learnable Reference representation). Unlike previous methods, RLR utilizes learnable reference representations to compel the model to learn normal feature patterns explicitly, thereby preventing the model from succumbing to the "learning shortcuts" issue. Additionally, RLR incorporates locality constraints into the learnable reference to facilitate more effective normal pattern capture and utilizes a masked learnable key attention mechanism to enhance robustness. Evaluation of RLR on the 15-category MVTec-AD dataset and the 12-category VisA dataset shows superior performance compared to state-of-the-art methods under the unified setting. The code of RLR will be publicly available.
Submitted 18 March, 2024;
originally announced March 2024.
-
Open-World Semi-Supervised Learning for Node Classification
Authors:
Yanling Wang,
Jing Zhang,
Lingxi Zhang,
Lixin Liu,
Yuxiao Dong,
Cuiping Li,
Hong Chen,
Hongzhi Yin
Abstract:
Open-world semi-supervised learning (Open-world SSL) for node classification, which classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community. As only seen classes have human labels, they are usually better learned than novel classes and thus exhibit smaller intra-class variances within the embedding space (we call this the imbalance of intra-class variances between seen and novel classes). Based on empirical and theoretical analysis, we find that the variance imbalance can negatively impact model performance. Pre-trained feature encoders can alleviate this issue by producing compact representations for novel classes. However, creating general pre-trained encoders for various types of graph data has proven challenging. As such, there is a demand for an effective method that does not rely on pre-trained graph encoders. In this paper, we propose an IMbalance-Aware method named OpenIMA for Open-world semi-supervised node classification, which trains the node classification model from scratch via contrastive learning with bias-reduced pseudo labels. Extensive experiments on seven popular graph benchmarks demonstrate the effectiveness of OpenIMA, and the source code is available on GitHub.
Submitted 18 March, 2024;
originally announced March 2024.
-
BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors
Authors:
Tingyang Zhang,
Qingzhe Gao,
Weiyu Li,
Libin Liu,
Baoquan Chen
Abstract:
Animatable 3D reconstruction has significant applications across various fields, yet it primarily relies on handcrafted creation by artists. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically incur significant time and computational costs for training and rendering. These limitations restrict their practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present a rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating the superior performance of our method compared to the current state-of-the-art methods.
Submitted 17 March, 2024;
originally announced March 2024.
-
High Performance Graphene Integrated Photonics Platform Enabled by Gold-assisted Transfer
Authors:
Xiaoxuan Wu,
Zhengyi Cao,
Tianxiang Zhao,
Yun Wu,
Zhonghui Li,
Spyros Doukas,
Elefterios Lidorikis,
Yu Xue,
Liu Liu,
Omid Ghaebi,
Giancarlo Soavi,
Junpeng Lv,
Zhenghua Ni,
Junjia Wang
Abstract:
Graphene is promising for nanoscale, efficient, ultra-fast photo- and opto-electronic devices because of its remarkable electrical and optical properties, such as fast electron relaxation and heat dissipation. Here, we realize a high-performance graphene integrated photonics platform enabled by gold-assisted transfer. Thanks to our optimized transfer technique, we fabricate and demonstrate (1) a microscale thermo-optic modulator with a tuning efficiency of 0.037 nm/mW and high heating performance of 67.4 K$μm^{3}mW^{-1}$ on a small active area of 7.54 $μm^{2}$, (2) a graphene electro-absorption modulator featuring a high modulation bandwidth up to 26.8 GHz and a high-speed data rate reaching 48 Gb/s, and (3) a graphene Mach-Zehnder interferometer modulator with a high normalized modulation efficiency of 0.027 dBV$^{-1}μm^{-1}$. Our graphene integrated photonics platform far outperforms the state of the art in terms of efficiency, low process complexity, and compact device footprint. Thus, our approach and results provide the background for the realization of high-performance integrated photonic circuits with CMOS compatibility.
Submitted 17 March, 2024;
originally announced March 2024.
-
Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning
Authors:
Yuan Zhou,
Richang Hong,
Yanrong Guo,
Lin Liu,
Shijie Hao,
Hanwang Zhang
Abstract:
In this paper, we propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement, which means enhancing FSCIL by disentangling spurious relations between categories. The challenge of disentangling spurious correlations lies in the poor controllability of FSCIL. On one hand, an FSCIL model is required to be trained in an incremental manner, and thus it is very hard to directly control relationships between categories of different sessions. On the other hand, only a few training samples are available per novel category, which further increases the difficulty of alleviating spurious relations. To overcome this challenge, we propose a new simple-yet-effective method, called ConTrollable Relation-disentangLed Few-Shot Class-Incremental Learning (CTRL-FSCIL). Specifically, during the base session, we propose to anchor base category embeddings in feature space and construct disentanglement proxies to bridge gaps between the learning of category representations in different sessions, thereby making category relations controllable. During incremental learning, the parameters of the backbone network are frozen in order to relieve the negative impact of data scarcity. Moreover, a disentanglement loss is designed to effectively guide a relation disentanglement controller to disentangle spurious correlations between the embeddings encoded by the backbone. In this way, the spurious correlation issue in FSCIL can be suppressed. Extensive experiments on CIFAR-100, mini-ImageNet, and CUB-200 datasets demonstrate the effectiveness of our CTRL-FSCIL method.
Submitted 16 March, 2024;
originally announced March 2024.
-
Test of lepton universality and measurement of the form factors of $D^0\to K^{*}(892)^-μ^+ν_μ$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (637 additional authors not shown)
Abstract:
We report a first study of the semileptonic decay $D^0\rightarrow K^-π^0μ^{+}ν_μ$ by analyzing an $e^+e^-$ annihilation data sample of $7.9~\mathrm{fb}^{-1}$ collected at the center-of-mass energy of 3.773 GeV with the BESIII detector. The absolute branching fraction of $D^0\to K^-π^0μ^{+}ν_μ$ is measured for the first time to be $(0.729 \pm 0.014_{\rm stat} \pm 0.011_{\rm syst})\%$. Based on an amplitude analysis, the $S\text{-}{\rm wave}$ contribution is determined to be $(5.76 \pm 0.35_{\rm stat} \pm 0.29_{\rm syst})\%$ of the total decay rate, in addition to the dominant $K^{*}(892)^-$ component. The branching fraction of $D^0\to K^{*}(892)^-μ^+ν_μ$ is determined to be $(2.062 \pm 0.039_{\rm stat} \pm 0.032_{\rm syst})\%$, which improves the precision of the world average by a factor of 5. Combining this with the world average of ${\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)$, the ratio of the branching fractions obtained is $\frac{{\mathcal B}(D^0\to K^{*}(892)^-μ^+ν_μ)}{{\mathcal B}(D^0\to K^{*}(892)^-e^+ν_e)} = 0.96\pm0.08$, in agreement with lepton flavor universality. Furthermore, assuming a single-pole dominance parameterization, the most precise hadronic form factor ratios for $D^0\to K^{*}(892)^{-} μ^+ν_μ$ are extracted to be $r_{V}=V(0)/A_1(0)=1.37 \pm 0.09_{\rm stat} \pm 0.03_{\rm syst}$ and $r_{2}=A_2(0)/A_1(0)=0.76 \pm 0.06_{\rm stat} \pm 0.02_{\rm syst}$.
Submitted 16 March, 2024;
originally announced March 2024.
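The three branching-fraction numbers quoted in the abstract above can be cross-checked with a short calculation. The sketch below uses central values only and rests on two assumptions that the abstract does not state: that the non-S-wave part of the rate is entirely $K^{*}(892)^-$ with interference neglected, and that isospin gives ${\mathcal B}(K^{*}(892)^-\to K^-π^0)=1/3$. It is a consistency check, not the collaboration's procedure.

```python
# Central values quoted in the abstract, in percent.
bf_kpi0_mu_nu = 0.729    # B(D0 -> K- pi0 mu+ nu)
s_wave_percent = 5.76    # S-wave share of the total decay rate
bf_kstar_mu_nu = 2.062   # B(D0 -> K*(892)- mu+ nu)
stat_uncertainty = 0.039 # statistical uncertainty on bf_kstar_mu_nu

# Assumption: isospin (Clebsch-Gordan) factor for K*(892)- -> K- pi0.
ISOSPIN_KSTAR_TO_K_PI0 = 1.0 / 3.0

# Assumption: everything that is not S-wave is K*(892)-, interference neglected.
kstar_part = bf_kpi0_mu_nu * (1.0 - s_wave_percent / 100.0)

# Undo the isospin factor to estimate the full K*(892)- mu+ nu branching fraction.
implied_bf_kstar = kstar_part / ISOSPIN_KSTAR_TO_K_PI0

if __name__ == "__main__":
    print(f"implied B(D0 -> K*- mu nu) = {implied_bf_kstar:.3f} %")
    print(f"quoted  B(D0 -> K*- mu nu) = {bf_kstar_mu_nu:.3f} %")
```

Under these assumptions the implied value lands well within the quoted statistical uncertainty of the reported branching fraction.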
-
Frequency-Reactive Power Optimization Strategy of Grid-forming Offshore Wind Farm Using DRU-HVDC Transmission
Authors:
Zhekai Li,
Kun Han,
Xu Cai,
Renxin Yang,
Haotian Yu,
Kepeng Xia,
Lulu Liu
Abstract:
Diode rectifier unit-based high voltage direct current (DRU-HVDC) transmission with grid-forming (GFM) wind turbines is becoming a promising scheme for offshore wind farm (OWF) integration due to its high reliability and low cost. In this scheme, the AC network of the OWF and the DRU has completely different synchronization mechanisms and power flow characteristics from the traditional power system. To optimize the power flow and reduce the network loss, this paper carries out power flow modeling and optimization analysis for the DRU-HVDC transmission system with grid-forming OWFs. The influence of the DRU and the GFM wind turbines on the power flow of the system is analyzed. On this basis, improved constraint conditions are proposed and an optimal power flow (OPF) method is established. This method can minimize the power loss by adjusting the reactive power output of each wind turbine and the internal network frequency. Finally, the proposed OPF model is implemented and solved in MATLAB using the YALMIP toolkit and the CPLEX solver. The results show that the proposed optimization strategy can effectively reduce the power loss of the entire OWF and the transmission system, with an optimization ratio of network losses exceeding 25.3%.
Submitted 16 March, 2024;
originally announced March 2024.
-
MoPE: Parameter-Efficient and Scalable Multimodal Fusion via Mixture of Prompt Experts
Authors:
Ruixiang Jiang,
Lingbo Liu,
Changwen Chen
Abstract:
Prompt-tuning has demonstrated parameter-efficiency in fusing unimodal foundation models for multimodal tasks. However, its limited adaptivity and expressiveness lead to suboptimal performance when compared with other tuning methods. In this paper, we address this issue by disentangling the vanilla prompts to adaptively capture dataset-level and instance-level features. Building upon this disentanglement, we introduce the mixture of prompt experts (MoPE) technique to enhance expressiveness. MoPE leverages multimodal pairing priors to route the most effective prompt on a per-instance basis. Compared to vanilla prompting, our MoPE-based conditional prompting exhibits greater expressiveness for multimodal fusion, scaling better with the training data and the overall number of trainable parameters. We also study a regularization term for expert routing, leading to emergent expert specialization, where different experts focus on different concepts, enabling interpretable soft prompting. Extensive experiments across three multimodal datasets demonstrate that our method achieves state-of-the-art results, matching or even surpassing the performance of fine-tuning, while requiring only 0.8% of the trainable parameters. Code will be released: https://github.com/songrise/MoPE.
Submitted 14 March, 2024;
originally announced March 2024.
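The per-instance routing described in the abstract above can be pictured as a soft mixture over a small bank of prompt vectors. The following is a minimal sketch of generic mixture-of-experts soft routing, not the MoPE implementation: the names `softmax` and `route_prompt`, the dot-product scoring, and the use of a single query embedding (for example, from the complementary modality) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompt(query, expert_keys, expert_prompts):
    """Blend expert prompts using weights from query-key similarity.

    Hypothetical helper: scores each expert by a dot product with the query,
    then returns the routing weights and the weighted sum of prompt vectors.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in expert_keys]
    weights = softmax(scores)
    dim = len(expert_prompts[0])
    mixed = [sum(w * p[i] for w, p in zip(weights, expert_prompts)) for i in range(dim)]
    return weights, mixed


if __name__ == "__main__":
    # Two experts with 2-D keys and 2-D prompts (made-up numbers).
    weights, prompt = route_prompt([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]], [[1.0, 1.0], [-1.0, -1.0]])
    print(weights, prompt)
```

Because the weights depend on the input instance, different inputs receive different prompts, which is the sense in which such conditional prompting is more expressive than a single shared prompt.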
-
Thermal-NeRF: Neural Radiance Fields from an Infrared Camera
Authors:
Tianxiang Ye,
Qi Wu,
Junyuan Deng,
Guoqing Liu,
Liu Liu,
Songpengcheng Xia,
Liang Pang,
Wenxian Yu,
Ling Pei
Abstract:
In recent years, Neural Radiance Fields (NeRFs) have demonstrated significant potential in encoding highly detailed 3D geometry and environmental appearance, positioning themselves as a promising alternative to traditional explicit representations for 3D scene reconstruction. However, the predominant reliance on RGB imaging presupposes ideal lighting conditions, a premise frequently unmet in robotic applications plagued by poor lighting or visual obstructions. This limitation overlooks the capabilities of infrared (IR) cameras, which excel in low-light detection and present a robust alternative under such adverse scenarios. To tackle these issues, we introduce Thermal-NeRF, the first method that estimates a volumetric scene representation in the form of a NeRF solely from IR imaging. By leveraging a thermal mapping and a structural thermal constraint derived from the thermal characteristics of IR imaging, our method showcases unparalleled proficiency in recovering NeRFs in visually degraded scenes where RGB-based methods fall short. We conduct extensive experiments to demonstrate that Thermal-NeRF can achieve superior quality compared to existing methods. Furthermore, we contribute a dataset for IR-based NeRF applications, paving the way for future research in IR NeRF reconstruction.
Submitted 15 March, 2024;
originally announced March 2024.
-
CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning
Authors:
Yukun Li,
Guansong Pang,
Wei Suo,
Chenchen Jing,
Yuling Xi,
Lingqiao Liu,
Hao Chen,
Guoqiang Liang,
Peng Wang
Abstract:
This paper explores the problem of continual learning (CL) of vision-language models (VLMs) in open domains, where the models need to perform continual updating and inference on a stream of datasets from diverse seen and unseen domains with novel classes. Such a capability is crucial for various applications in open environments, e.g., AI assistants, autonomous driving systems, and robotics. Current CL studies mostly focus on closed-set scenarios in a single domain with known classes. Large pre-trained VLMs like CLIP have demonstrated superior zero-shot recognition ability, and a number of recent studies leverage this ability to mitigate catastrophic forgetting in CL, but they focus on closed-set CL in a single-domain dataset. Open-domain CL of large VLMs is significantly more challenging due to 1) large class correlations and domain gaps across the datasets and 2) the forgetting of zero-shot knowledge in the pre-trained VLMs in addition to the knowledge learned from the newly adapted datasets. In this work, we introduce a novel approach, termed CoLeCLIP, that learns an open-domain CL model based on CLIP. It addresses these challenges by jointly learning a set of task prompts and a cross-domain class vocabulary. Extensive experiments on 11 domain datasets show that CoLeCLIP outperforms state-of-the-art methods for open-domain CL under both task- and class-incremental learning settings.
Submitted 15 March, 2024;
originally announced March 2024.
-
Second-Order Strong Optimality and Second-Order Duality for Nonsmooth Constrained Multiobjective Fractional Programming Problems
Authors:
Jiawei Chen,
Luyu Liu,
Yibing Lv,
Debdas Ghosh,
Jen-Chih Yao
Abstract:
This paper investigates the constrained nonsmooth multiobjective fractional programming problem (NMFP) in real Banach spaces. It derives a quotient calculus rule for computing the first- and second-order Clarke derivatives of fractional functions involving locally Lipschitz functions. A novel second-order Abadie-type regularity condition is presented, defined with the help of the Clarke directional derivative and the Páles-Zeidan second-order directional derivative. Using generalized directional derivatives, we establish both first- and second-order strong necessary optimality conditions for a Borwein-type properly efficient solution of NMFP; these conditions contain some new information on multipliers and imply the strong KKT necessary conditions. Moreover, we derive second-order sufficient optimality conditions for NMFP under a second-order generalized convexity assumption. Additionally, we derive duality results between NMFP and its second-order dual problem under some appropriate conditions.
Submitted 15 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present measurements of the all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022. The analysis is based on a nearly composition-independent energy reconstruction method and achieves unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be -$2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is -$3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of -$0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in the abundance of light components. Above the knee, the mean logarithmic mass exhibits a power-law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
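The decline and power-law index quoted for the mean logarithmic mass in the abstract above can be roughly reproduced from the two endpoints alone. The sketch below is a back-of-the-envelope check, not the collaboration's fit: it assumes a pure power law between the rounded endpoint values given in the abstract (1.7 at 0.3 PeV and 1.3 at 3 PeV), so only approximate agreement should be expected.

```python
import math

# Rounded endpoint values quoted in the abstract.
e_low, e_high = 0.3, 3.0       # energies in PeV
lna_low, lna_high = 1.7, 1.3   # mean logarithmic mass <ln A> at those energies

# Quoted index and its systematic uncertainty, for comparison.
quoted_index = -0.1200
quoted_syst = 0.0341

# Fractional decline of <ln A> between the two energies.
decline = (lna_low - lna_high) / lna_low

# Assumption: <ln A> is proportional to E**index between the endpoints,
# so the index is the slope in log-log space.
index = math.log(lna_high / lna_low) / math.log(e_high / e_low)

if __name__ == "__main__":
    print(f"decline = {100 * decline:.1f} %")
    print(f"two-point index = {index:.4f} (quoted: {quoted_index})")
```

The two-point estimate differs slightly from the quoted index, as expected when rounded endpoints stand in for a fit to the full data, but the difference is well inside the quoted systematic uncertainty.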
-
Emerging Jordan forms, with applications to critical statistical models and conformal field theory
Authors:
Lawrence Liu
Abstract:
Two novel frameworks for handling mathematical and physical problems are introduced. The first, the emerging Jordan form, generalizes the concept of the Jordan canonical form, a well-established tool of linear algebra. The second, dual Jordan quantum physics, generalizes the framework of quantum physics to one in which the hermiticity postulate is considerably relaxed. These frameworks are then used to resolve some long-outstanding problems in theoretical physics, coming from critical statistical models and conformal field theory. I describe these problems and the difficulties involved in finding satisfactory solutions, then show how the concepts of emerging Jordan forms and dual Jordan quantum physics are naturally suited to overcoming these difficulties. Although their applications in this work are limited in scope to rather specific problems, the frameworks themselves are completely general, and I describe ways in which they may be used in other areas of mathematics and physics. Several appendices close the work, which include improvements to a widely used computational algorithm and corrections to some published data.
Submitted 14 March, 2024;
originally announced March 2024.