-
A Systematic Literature Survey of Sparse Matrix-Vector Multiplication
Authors:
Jianhua Gao,
Bingjie Liu,
Weixing Ji,
Hua Huang
Abstract:
Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, the comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in recent years is currently lacking. Aiming to fill this gap, this paper compares existing techniques and analyzes their strengths and weaknesses. We begin by highlighting two representative applications of SpMV, then conduct an in-depth overview of the important techniques that optimize SpMV on modern architectures, which we specifically classify as classic, auto-tuning, machine learning, and mixed-precision-based optimization. We also elaborate on the hardware-based architectures, including CPU, GPU, FPGA, processing in Memory, heterogeneous, and distributed platforms. We present a comprehensive experimental evaluation that compares the performance of state-of-the-art SpMV implementations. Based on our findings, we identify several challenges and point out future research directions. This survey is intended to provide researchers with a comprehensive understanding of SpMV optimization on modern architectures and provide guidance for future work.
Submitted 9 April, 2024;
originally announced April 2024.
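For readers unfamiliar with the kernel this survey covers, the following is a minimal reference sketch of SpMV over the compressed sparse row (CSR) format. It is not taken from the paper; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions, and the loop is a plain sequential baseline rather than any of the optimized variants the survey classifies.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (All names are illustrative, not from the surveyed paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Accumulate only over the stored nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix A = [[10, 0, 2], [0, 3, 0], [1, 0, 4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

The indirect access `x[col_idx[k]]` is the irregular memory pattern that makes SpMV bandwidth-bound, which is the usual motivation for format-level and architecture-level optimizations.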
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.
Submitted 7 April, 2024;
originally announced April 2024.
-
With or without $ν$? Hunting for the seed of the matter-antimatter asymmetry
Authors:
CUORE Collaboration,
D. Q. Adams,
C. Alduino,
K. Alfonso,
F. T. Avignone III,
O. Azzolini,
G. Bari,
F. Bellini,
G. Benato,
M. Beretta,
M. Biassoni,
A. Branca,
C. Brofferio,
C. Bucci,
J. Camilleri,
A. Caminata,
A. Campani,
J. Cao,
S. Capelli,
C. Capelli,
L. Cappelli,
L. Cardani,
P. Carniti,
N. Casali,
E. Celi
, et al. (93 additional authors not shown)
Abstract:
The matter-antimatter asymmetry underlines the incompleteness of the current understanding of particle physics. Neutrinoless double-beta ($0νββ$) decay may help explain this asymmetry, while unveiling the Majorana nature of the neutrino. The CUORE experiment searches for $0νββ$ decay of $^{130}$Te using a tonne-scale cryogenic calorimeter operated at milli-kelvin temperatures. We report no evidence for $0νββ$ decay and place a lower limit on the half-life of T$_{1/2}$ $>$ 3.8 $\times$ 10$^{25}$ years (90% C.I.) with over 2 tonne$\cdot$year TeO$_2$ exposure. The tools and techniques developed for this result and the 5 year stable operation of nearly 1000 detectors demonstrate the infrastructure for a next-generation experiment capable of searching for $0νββ$ decay across multiple isotopes.
Submitted 5 April, 2024;
originally announced April 2024.
-
Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
Authors:
Mingyuan Zhou,
Huangjie Zheng,
Zhendong Wang,
Mingzhang Yin,
Hai Huang
Abstract:
We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD
Submitted 24 May, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images
Authors:
Longwei Li,
Huajian Huang,
Sai-Kit Yeung,
Hui Cheng
Abstract:
Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. To benefit the research community, the code will be made publicly available once the paper is published.
Submitted 7 April, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Multi-modal Learning for WebAssembly Reverse Engineering
Authors:
Hanxian Huang,
Jishen Zhao
Abstract:
The increasing adoption of WebAssembly (Wasm) for performance-critical and security-sensitive tasks drives the demand for WebAssembly program comprehension and reverse engineering. Recent studies have introduced machine learning (ML)-based WebAssembly reverse engineering tools. Yet, the generalization of task-specific ML solutions remains challenging, because their effectiveness hinges on the availability of an ample supply of high-quality task-specific labeled data. Moreover, previous works overlook the high-level semantics present in source code and its documentation. Acknowledging the abundance of available source code with documentation, which can be compiled into WebAssembly, we propose to learn representations of them concurrently and harness their mutual relationships for effective WebAssembly reverse engineering.
In this paper, we present WasmRev, the first multi-modal pre-trained language model for WebAssembly reverse engineering. WasmRev is pre-trained using self-supervised learning on a large-scale multi-modal corpus encompassing source code, code documentation and the compiled WebAssembly, without requiring labeled data. WasmRev incorporates three tailored multi-modal pre-training tasks to capture various characteristics of WebAssembly and cross-modal relationships. WasmRev is only trained once to produce general-purpose representations that can broadly support WebAssembly reverse engineering tasks through few-shot fine-tuning with much less labeled data, improving data efficiency. We fine-tune WasmRev onto three important reverse engineering tasks: type recovery, function purpose identification and WebAssembly summarization. Our results show that WasmRev pre-trained on the corpus of multi-modal samples establishes a robust foundation for these tasks, achieving high task accuracy and outperforming the state-of-the-art ML methods for WebAssembly reverse engineering.
Submitted 3 April, 2024;
originally announced April 2024.
-
GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU
Authors:
Zhongming Yu,
Genghan Zhang,
Hanxian Huang,
Xin Chen,
Jishen Zhao
Abstract:
In recent years, Graph Neural Networks (GNNs) have ignited a surge of innovation, significantly enhancing the processing of geometric data structures such as graphs, point clouds, and meshes. As the domain continues to evolve, a series of frameworks and libraries are being developed to push GNN efficiency to new heights. While graph-centric libraries have achieved success in the past, the advent of efficient tensor compilers has highlighted the urgent need for tensor-centric libraries. Yet, efficient tensor-centric frameworks for GNNs remain scarce due to unique challenges and limitations encountered when implementing segment reduction in GNN contexts.
We introduce GeoT, a cutting-edge tensor-centric library designed specifically for GNNs via efficient segment reduction. GeoT debuts innovative parallel algorithms that not only introduce new design principles but also expand the available design space. Importantly, GeoT is engineered for straightforward fusion within a computation graph, ensuring compatibility with contemporary tensor-centric machine learning frameworks and compilers. Setting a new performance benchmark, GeoT marks a considerable advancement by showcasing an average operator speedup of 1.80x and an end-to-end speedup of 1.68x.
Submitted 7 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
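As a rough illustration of the operation the GeoT abstract centers on, the sketch below shows a sequential reference version of segment reduction (here a sum). It is only an assumption-labeled baseline: the function name `segment_sum` and its parameters are hypothetical, and it does not reproduce GeoT's parallel GPU algorithms.

```python
def segment_sum(values, segment_ids, num_segments):
    """Sum `values` into buckets given by `segment_ids`.

    segment_ids[i] is the output slot (e.g. destination node of an
    edge in a GNN) that values[i] contributes to. Names are
    illustrative and not taken from the GeoT library.
    """
    out = [0.0] * num_segments
    for v, s in zip(values, segment_ids):
        out[s] += v
    return out


# Five edge messages aggregated onto three destination nodes.
messages = [1.0, 2.0, 3.0, 4.0, 5.0]
dest = [0, 0, 1, 2, 2]
node_sums = segment_sum(messages, dest, 3)
```

On a GPU the difficulty is that segments have uneven lengths, so a naive one-thread-per-segment mapping is load-imbalanced; that imbalance is the kind of challenge the abstract alludes to.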
-
FPT: Feature Prompt Tuning for Few-shot Readability Assessment
Authors:
Ziyang Wang,
Sanwoo Lee,
Hsiu-Yuan Huang,
Yunfang Wu
Abstract:
Prompt-based methods have achieved promising results in most few-shot text classification tasks. However, for readability assessment tasks, traditional prompt methods lack crucial linguistic knowledge, which has already been proven to be essential. Moreover, previous studies on utilizing linguistic features have shown non-robust performance in few-shot settings and may even impair model performance. To address these issues, we propose a novel prompt-based tuning framework that incorporates rich linguistic knowledge, called Feature Prompt Tuning (FPT). Specifically, we extract linguistic features from the text and embed them into trainable soft prompts. Further, we devise a new loss function to calibrate the similarity ranking order between categories. Experimental results demonstrate that our proposed method FPT not only exhibits a significant performance improvement over the prior best prompt-based tuning approaches, but also surpasses the previous leading methods that incorporate linguistic features. Also, our proposed model significantly outperforms the large language model gpt-3.5-turbo-16k in most cases. Our proposed method establishes a new architecture for prompt tuning that sheds light on how linguistic features can be easily adapted to linguistic-related tasks.
Submitted 10 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
The Set-Theoretic Form of Twin Prime Distribution and Its Odd-Even Imbalance
Authors:
HaoJie Huang
Abstract:
The prime number problem falls within the realm of number theory, specifically elementary number theory. Current research approaches have unnecessarily complicated this matter. In contrast to more advanced mathematical tools, the methods of elementary number theory can effectively address the twin prime problem. The primary contribution of this article lies in establishing a set that systematically includes all twin prime pairs without omissions. This set's distribution is governed by a precise general solution formula. Furthermore, an analysis of the distribution set reveals characteristics of parity balance, enabling us to determine whether twin prime pairs are finite. Finally, our method can be extended to the general case of twin primes, such as Polignac's conjecture.
Submitted 3 April, 2024;
originally announced April 2024.
-
Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning
Authors:
Yi Shen,
Hanyan Huang,
Shan Xie
Abstract:
Offline reinforcement learning learns from a static dataset without interacting with the environment, which ensures security and thus has good application prospects. However, directly applying naive reinforcement learning methods usually fails in an offline environment due to function approximation errors caused by out-of-distribution (OOD) actions. To solve this problem, existing algorithms mainly penalize the Q-value of OOD actions, and the quality of these constraints also matters. Imprecise constraints may lead to suboptimal solutions, while precise constraints require significant computational costs. In this paper, we propose a novel count-based method for continuous domains, called the Grid-Mapping Pseudo-Count method (GPC), to penalize the Q-value appropriately and reduce the computational cost. The proposed method maps the state and action space to discrete space and constrains their Q-values through the pseudo-count. It is theoretically proved that only a few conditions are needed to obtain accurate uncertainty constraints in the proposed method. Moreover, we develop a Grid-Mapping Pseudo-Count Soft Actor-Critic (GPC-SAC) algorithm using GPC under the Soft Actor-Critic (SAC) framework to demonstrate the effectiveness of GPC. The experimental results on D4RL benchmark datasets show that GPC-SAC has better performance and less computational cost compared to other algorithms.
Submitted 3 April, 2024;
originally announced April 2024.
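The idea of mapping continuous state-action pairs onto a grid and counting visits can be sketched as follows. This is a hypothetical illustration only: the class name `GridPseudoCount`, the uniform binning, and the `beta / sqrt(n + 1)` penalty form are assumptions made for the example and are not claimed to match the GPC paper's actual construction.

```python
import math
from collections import Counter


class GridPseudoCount:
    """Count visits to grid cells of a continuous state-action space.

    Assumption: every dimension is split into `bins` equal-width
    intervals between `low` and `high`; out-of-range values are clipped.
    """

    def __init__(self, low, high, bins):
        self.low, self.high, self.bins = low, high, bins
        self.counts = Counter()

    def _cell(self, state, action):
        vec = list(state) + list(action)
        cell = []
        for v, lo, hi in zip(vec, self.low, self.high):
            frac = (v - lo) / (hi - lo)
            cell.append(min(self.bins - 1, max(0, int(frac * self.bins))))
        return tuple(cell)

    def update(self, state, action):
        self.counts[self._cell(state, action)] += 1

    def count(self, state, action):
        return self.counts[self._cell(state, action)]

    def penalty(self, state, action, beta=1.0):
        # Assumed penalty shape: large for rarely seen cells,
        # shrinking as the dataset covers the cell more often.
        return beta / math.sqrt(self.count(state, action) + 1)


# One state dimension and one action dimension, both in [0, 1].
pc = GridPseudoCount(low=[0.0, 0.0], high=[1.0, 1.0], bins=4)
pc.update([0.1], [0.2])
pc.update([0.1], [0.2])
seen = pc.count([0.12], [0.22])      # falls in the same cell as the data
unseen = pc.count([0.9], [0.9])      # a cell the dataset never visited
```

In an offline actor-critic loop, a term like `penalty(s, a)` would be subtracted from the Q-target so that out-of-distribution actions (low counts) are valued more conservatively.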
-
Dynamical study of $D^{*}DK$ and $D^{*}D \bar{D}$ systems at quark level
Authors:
Yue Tan,
Xuejie Liu,
Xiaoyun Chen,
Youchang Yang,
Hongxia Huang,
Jialun Ping
Abstract:
Inspired by the Belle II Collaboration's recent report of $T_{cc}$, which can be interpreted as a molecular $DD^{*}$ state, we investigate the trihadron partners of $T_{cc}$ with $IJ^{P}$=$01^{-}$ in the framework of a chiral quark model. It is widely accepted that the main component of $X(3872)$ contains the molecular $\bar{D}D^{*}$, while the main component of $D_{s0}^{*}(2317)$ is molecular $DK$. Based on these three well-known exotic states, $T_{cc} (DD^{*})$, $X(3872) (\bar{D}D^{*})$ and $D_{s0}^{*}(2317) (DK)$, we dynamically investigate the $D^{*}DK$ and $DD^{*}\bar{D}$ systems at quark level to search for possible bound states. The results show that both of them are bound states: the binding energy of the molecular state $DD^*K$ is relatively small, only 0.8 MeV, while the binding energy of $DD^*\bar{D}$ is up to 1.9 MeV. According to the calculated root-mean-square distances, the spatial structure of the two systems shows an obvious ($DD^*$)-($\bar{D}$/$K$) structure, in which $D$ is close to $D^*$ while $DD^*$ as a whole is relatively distant from the third hadron ($\bar{D}$/$K$), similar to a nucleon-electron structure. As a result, we strongly recommend that these bound states $DD^*\bar{D}$ and $DD^*K$ be searched for experimentally.
Submitted 2 April, 2024;
originally announced April 2024.
-
Search for a sub-eV sterile neutrino using Daya Bay's full dataset
Authors:
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding,
Y. Y. Ding
, et al. (176 additional authors not shown)
Abstract:
This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{\nu}_e$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95\% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1 $ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2 $ eV$^2$.
Submitted 15 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
Authors:
Fang Liu,
Yang Liu,
Lin Shi,
Houkun Huang,
Ruifeng Wang,
Zhen Yang,
Li Zhang,
Zhongqi Li,
Yuchi Ma
Abstract:
The rise of Large Language Models (LLMs) has significantly advanced many applications on software engineering tasks, particularly in code generation. Despite the promising performance, LLMs are prone to generate hallucinations, which means LLMs might produce outputs that deviate from users' intent, exhibit internal inconsistencies, or misalign with the factual knowledge, making the deployment of LLMs potentially risky in a wide range of applications. Existing work mainly focuses on investigating hallucinations in the domain of natural language generation (NLG), leaving a gap in understanding the types and extent of hallucinations in the context of code generation. To bridge the gap, we conducted a thematic analysis of the LLM-generated code to summarize and categorize the hallucinations present in it. Our study established a comprehensive taxonomy of hallucinations in LLM-generated code, encompassing 5 primary categories of hallucinations depending on the conflicting objectives and varying degrees of deviation observed in code generation. Furthermore, we systematically analyzed the distribution of hallucinations, exploring variations among different LLMs and their correlation with code correctness. Based on the results, we proposed HalluCode, a benchmark for evaluating the performance of code LLMs in recognizing hallucinations. Hallucination recognition and mitigation experiments with HalluCode and HumanEval show existing LLMs face great challenges in recognizing hallucinations, particularly in identifying their types, and are hardly able to mitigate hallucinations. We believe our findings will shed light on future research about hallucination evaluation, detection, and mitigation, ultimately paving the way for building more effective and reliable code LLMs in the future.
Submitted 10 May, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Constraints on the Blazar-Boosted Dark Matter from the CDEX-10 Experiment
Authors:
R. Xu,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
We report new constraints on light dark matter (DM) boosted by blazars using the 205.4 kg day data from the CDEX-10 experiment located at the China Jinping Underground Laboratory. Two representative blazars, TXS 0506+56 and BL Lacertae, are studied. The results derived from TXS 0506+56 exclude DM-nucleon elastic scattering cross sections from $4.6\times 10^{-33}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for DM masses between 10 keV and 1 GeV, and the results derived from BL Lacertae exclude DM-nucleon elastic scattering cross sections from $2.4\times 10^{-34}\ \rm cm^2$ to $1\times10^{-26}\ \rm cm^2$ for the same range of DM masses. The constraints correspond to the best sensitivities among solid-state detector experiments in the sub-MeV mass range.
Submitted 29 March, 2024;
originally announced March 2024.
-
Probing Dark Matter Particles from Evaporating Primordial Black Holes via Electron Scattering in the CDEX-10 Experiment
Authors:
Z. H. Zhang,
L. T. Yang,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
S. M. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (59 additional authors not shown)
Abstract:
Dark matter (DM) is a major constituent of the Universe. However, no definite evidence of DM particles (denoted as "$χ$") has been found in DM direct detection (DD) experiments to date. A novel concept is to detect $χ$ from evaporating primordial black holes (PBHs). We search for $χ$ emitted from PBHs by investigating their interaction with target electrons. The examined PBH masses range from 1$\times$10$^{15}$ to 7$\times$10$^{16}$ g under the current limits of PBH abundance $f_{PBH}$. Using 205.4 kg$\cdot$day data obtained from the CDEX-10 experiment conducted in the China Jinping Underground Laboratory, we exclude the $χ$--electron ($χ$--$e$) elastic-scattering cross section $σ_{χe} \sim 5\times10^{-29}$ cm$^2$ for $χ$ with a mass $m_χ\lesssim$ 0.1 keV from our results. If ($m_χ$, $σ_{χe}$) can be determined in the future, DD experiments are expected to impose strong constraints on $f_{PBH}$ for large $M_{PBH}$s.
Submitted 29 March, 2024;
originally announced March 2024.
-
FairCLIP: Harnessing Fairness in Vision-Language Learning
Authors:
Yan Luo,
Min Shi,
Muhammad Osama Khan,
Muhammad Muneeb Afzal,
Hao Huang,
Shuaihang Yuan,
Yu Tian,
Luo Song,
Ava Kouhana,
Tobias Elze,
Yi Fang,
Mengyu Wang
Abstract:
Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair vision-language medical dataset Harvard-FairVLMed that provides detailed demographic attributes, ground-truth labels, and clinical notes to facilitate an in-depth examination of fairness within VL foundation models. Using Harvard-FairVLMed, we conduct a comprehensive fairness analysis of two widely-used VL models (CLIP and BLIP2), pre-trained on both natural and medical domains, across four different protected attributes. Our results highlight significant biases in all VL models, with Asian, Male, Non-Hispanic, and Spanish being the preferred subgroups across the protected attributes of race, gender, ethnicity, and language, respectively. In order to alleviate these biases, we propose FairCLIP, an optimal-transport-based approach that achieves a favorable trade-off between performance and fairness by reducing the Sinkhorn distance between the overall sample distribution and the distributions corresponding to each demographic group. As the first VL dataset of its kind, Harvard-FairVLMed holds the potential to catalyze advancements in the development of machine learning models that are both ethically aware and clinically effective. Our dataset and code are available at https://ophai.hms.harvard.edu/datasets/harvard-fairvlmed10k.
Submitted 5 April, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
AlloyBERT: Alloy Property Prediction with Large Language Models
Authors:
Akshat Chaudhari,
Chakradhar Guntuboina,
Hongshuo Huang,
Amir Barati Farimani
Abstract:
The pursuit of novel alloys tailored to specific requirements poses significant challenges for researchers in the field. This underscores the importance of developing predictive techniques for essential physical properties of alloys based on their chemical composition and processing parameters. This study introduces AlloyBERT, a transformer encoder-based model designed to predict properties such as elastic modulus and yield strength of alloys using textual inputs. Leveraging the pre-trained RoBERTa encoder model as its foundation, AlloyBERT employs self-attention mechanisms to establish meaningful relationships between words, enabling it to interpret human-readable input and predict target alloy properties. By combining a tokenizer trained on our textual data and a RoBERTa encoder pre-trained and fine-tuned for this specific task, we achieved a mean squared error (MSE) of 0.00015 on the Multi Principal Elemental Alloys (MPEA) dataset and 0.00611 on the Refractory Alloy Yield Strength (RAYS) dataset. This surpasses the performance of shallow models, which achieved a best-case MSE of 0.00025 and 0.0076 on the MPEA and RAYS datasets, respectively. Our results highlight the potential of language models in material science and establish a foundational framework for text-based prediction of alloy properties that does not rely on complex underlying representations, calculations, or simulations.
Submitted 28 March, 2024;
originally announced March 2024.
-
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Authors:
Alireza Ganjdanesh,
Shangqian Gao,
Heng Huang
Abstract:
Structural model pruning is a prominent approach used for reducing the computational cost of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained devices. Yet, the majority of proposed ideas require a pretrained model before pruning, which is costly to secure. In this paper, we propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models. The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers, and the resulting model's accuracy serves as its reward. We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy, and we regularize the model's weights to align with the selected structure by the agent. The evolving model's weights result in a dynamic reward function for the agent, which prevents using prominent episodic RL methods with stationary environment assumption for our purpose. We address this challenge by designing a mechanism to model the complex changing dynamics of the reward function and provide a representation of it to the RL agent. To do so, we take a learnable embedding for each training epoch and employ a recurrent model to calculate a representation of the changing environment. We train the recurrent model and embeddings using a decoder model to reconstruct observed rewards. Such a design empowers our agent to effectively leverage episodic observations along with the environment representations to learn a proper policy to determine performant sub-networks of the CNN model. Our extensive experiments on CIFAR-10 and ImageNet using ResNets and MobileNets demonstrate the effectiveness of our method.
Submitted 28 March, 2024;
originally announced March 2024.
-
MATTopo: Topology-preserving Medial Axis Transform with Restricted Power Diagram
Authors:
Ningna Wang,
Hui Huang,
Shibo Song,
Bin Wang,
Wenping Wang,
Xiaohu Guo
Abstract:
We present a novel topology-preserving 3D medial axis computation framework based on volumetric restricted power diagram (RPD), while preserving the medial features and geometric convergence simultaneously, for both 3D CAD and organic shapes. The volumetric RPD discretizes the input 3D volume into sub-regions given a set of medial spheres. With this intermediate structure, we convert the homotopy equivalency between the generated medial mesh and the input 3D shape into a localized contractibility checking for each restricted element (power cell, power face, power edge), by checking their connected components and Euler characteristics. We further propose a fractional Euler characteristic algorithm for efficient GPU-based computation of Euler characteristic for each restricted element on the fly while computing the volumetric RPD. Compared with existing voxel-based or point-cloud-based methods, our approach is the first to adaptively and directly revise the medial mesh without globally modifying the dependent structure, such as voxel size or sampling density, while preserving its topology and medial features. In comparison with the feature preservation method MATFP, our method provides geometrically comparable results with fewer spheres and more robustly captures the topology of the input 3D shape.
Submitted 21 May, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Improving Content Recommendation: Knowledge Graph-Based Semantic Contrastive Learning for Diversity and Cold-Start Users
Authors:
Yejin Kim,
Scott Rome,
Kevin Foley,
Mayur Nankani,
Rimon Melamed,
Javier Morales,
Abhay Yadav,
Maria Peifer,
Sardar Hamidian,
H. Howie Huang
Abstract:
Addressing the challenges related to data sparsity, cold-start problems, and diversity in recommendation systems is both crucial and demanding. Many current solutions leverage knowledge graphs to tackle these issues by combining both item-based and user-item collaborative signals. A common trend in these approaches focuses on improving ranking performance at the cost of escalating model complexity, reducing diversity, and complicating the task. It is essential to provide recommendations that are both personalized and diverse, rather than solely relying on achieving high rank-based performance, such as Click-through Rate, Recall, etc. In this paper, we propose a hybrid multi-task learning approach, training on user-item and item-item interactions. We apply item-based contrastive learning on descriptive text, sampling positive and negative pairs based on item metadata. Our approach allows the model to better understand the relationships between entities within the knowledge graph by utilizing semantic information from text. It leads to more accurate, relevant, and diverse user recommendations and a benefit that extends even to cold-start users who have few interactions with items. We perform extensive experiments on two widely used datasets to validate the effectiveness of our approach. Our findings demonstrate that jointly training user-item interactions and item-based signals using synopsis text is highly effective. Furthermore, our results provide evidence that item-based contrastive learning enhances the quality of entity embeddings, as indicated by metrics such as uniformity and alignment.
Submitted 27 March, 2024;
originally announced March 2024.
-
VersaT2I: Improving Text-to-Image Models with Versatile Reward
Authors:
Jianshu Guo,
Wenhao Chai,
Jie Deng,
Hsiang-Wei Huang,
Tian Ye,
Yichen Xu,
Jiawei Zhang,
Jenq-Neng Hwang,
Gaoang Wang
Abstract:
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance with multiple rewards of any T2I model. We decompose the quality of the image into several aspects such as aesthetics, text-image alignment, geometry, low-level quality, etc. Then, for every quality aspect, we select high-quality images in this aspect generated by the model as the training set to finetune the T2I model using the Low-Rank Adaptation (LoRA). Furthermore, we introduce a gating function to combine multiple quality aspects, which can avoid conflicts between different quality aspects. Our method is easy to extend and does not require any manual annotation, reinforcement learning, or model architecture changes. Extensive experiments demonstrate that VersaT2I outperforms the baseline methods across various quality criteria.
Submitted 27 March, 2024;
originally announced March 2024.
-
ViTAR: Vision Transformer with Any Resolution
Authors:
Qihang Fan,
Quanzeng You,
Xiaotian Han,
Yongfei Liu,
Yunzhe Tao,
Huaibo Huang,
Ran He,
Hongxia Yang
Abstract:
This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen during training. Our work introduces two key innovations to address this issue. Firstly, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration. Secondly, we introduce fuzzy positional encoding in the Vision Transformer to provide consistent positional awareness across multiple resolutions, thereby preventing overfitting to any single training resolution. Our resulting model, ViTAR (Vision Transformer with Any Resolution), demonstrates impressive adaptability, achieving 83.3\% top-1 accuracy at a 1120x1120 resolution and 80.4\% accuracy at a 4032x4032 resolution, all while reducing computational costs. ViTAR also shows strong performance in downstream tasks such as instance and semantic segmentation and can easily be combined with self-supervised learning techniques like Masked AutoEncoder. Our work provides a cost-effective solution for enhancing the resolution scalability of ViTs, paving the way for more versatile and efficient high-resolution image processing.
Submitted 28 March, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
AIR-HLoc: Adaptive Image Retrieval for Efficient Visual Localisation
Authors:
Changkun Liu,
Huajian Huang,
Zhengyang Ma,
Tristan Braud
Abstract:
State-of-the-art (SOTA) hierarchical localisation pipelines (HLoc) rely on image retrieval (IR) techniques to establish 2D-3D correspondences by selecting the $k$ most similar images from a reference image database for a given query image. Although higher values of $k$ enhance localisation robustness, the computational cost for feature matching increases linearly with $k$. In this paper, we observe that queries that are the most similar to images in the database result in a higher proportion of feature matches and, thus, more accurate positioning. Consequently, a small number of images is sufficient for queries very similar to images in the reference database. We then propose a novel approach, AIR-HLoc, which divides query images into different localisation difficulty levels based on their similarity to the reference image database. We consider an image with high similarity to the reference image as an easy query and an image with low similarity as a hard query. Easy queries show a limited improvement in accuracy when increasing $k$. Conversely, higher values of $k$ significantly improve accuracy for hard queries. We therefore adapt the value of $k$ to the query's difficulty level. In this way, AIR-HLoc optimizes processing time by adaptively assigning different values of $k$ based on the similarity between the query and reference images without losing accuracy. Our extensive experiments on the Cambridge Landmarks, 7Scenes, and Aachen Day-Night-v1.1 datasets demonstrate our algorithm's efficacy, reducing computational overhead by 30\%, 26\%, and 11\%, respectively, while maintaining SOTA accuracy compared to HLoc with fixed image retrieval.
Submitted 27 March, 2024;
originally announced March 2024.
-
Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement
Authors:
Shuyu Chang,
Rui Wang,
Peng Ren,
Haiping Huang
Abstract:
Crafting effective topic models for brief texts, like tweets and news headlines, is essential for capturing the swift shifts in social dynamics. Traditional topic models, however, often fall short in accurately representing the semantic intricacies of short texts due to their brevity and lack of contextual data. In our study, we harness the advanced capabilities of Large Language Models (LLMs) to introduce a novel approach termed "Topic Refinement". This approach does not directly involve itself in the initial modeling of topics but focuses on improving topics after they have been mined. By employing prompt engineering, we direct LLMs to eliminate off-topic words within a given topic, ensuring that only contextually relevant words are preserved or substituted with ones that fit better semantically. This method emulates human-like scrutiny and improvement of topics, thereby elevating the semantic quality of the topics generated by various models. Our comprehensive evaluation across three unique datasets has shown that our topic refinement approach significantly enhances the semantic coherence of topics.
Submitted 26 March, 2024;
originally announced March 2024.
-
Mix-Initiative Response Generation with Dynamic Prefix Tuning
Authors:
Yuxiang Nie,
Heyan Huang,
Xian-Ling Mao,
Lizi Liao
Abstract:
Mixed initiative serves as one of the key factors in controlling conversation directions. For a speaker, responding passively or leading proactively would result in rather different responses. However, most dialogue systems focus on training a holistic response generation model without any distinction among different initiatives. It leads to the cross-contamination problem, where the model confuses different initiatives and generates inappropriate responses. Moreover, obtaining plenty of human annotations for initiative labels can be expensive. To address this issue, we propose a general mix-Initiative Dynamic Prefix Tuning framework (IDPT) to decouple different initiatives from the generation model, which learns initiative-aware prefixes in both supervised and unsupervised settings. Specifically, IDPT decouples initiative factors into different prefix parameters and uses the attention mechanism to adjust the selection of initiatives in guiding generation dynamically. The prefix parameters can be tuned towards accurate initiative prediction as well as mix-initiative response generation. Extensive experiments on two public dialogue datasets show that the proposed IDPT outperforms previous baselines on both automatic metrics and human evaluations. It also manages to generate appropriate responses with manipulated initiatives.
Submitted 27 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging
Authors:
Lingdong Shen,
Fangxin Shang,
Xiaoshuang Huang,
Yehui Yang,
Haifeng Huang,
Shiming Xiang
Abstract:
In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models, which aim to generalize across the diverse modalities of medical images, are one solution, yet their effectiveness often diminishes when applied to OOD data modalities and tasks, requiring intricate fine-tuning of the model for optimal performance. Few-shot learning segmentation methods are typically designed for specific modalities of data and cannot be directly transferred for use with another modality. Therefore, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL has the capability to employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need for training the model from scratch or fine-tuning for OOD tasks (including OOD modality and dataset). Extensive experiments demonstrate a positive correlation between the number of shots and segmentation performance on OOD tasks. Segmentation performance with three shots is approximately 1.5 times better than in a zero-shot setting. This indicates that SegICL effectively addresses new segmentation tasks based on contextual information. Additionally, SegICL exhibits comparable performance to mainstream models on OOD and in-distribution tasks. Our code will be released after paper review.
Submitted 29 May, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
InterFusion: Text-Driven Generation of 3D Human-Object Interaction
Authors:
Sisi Dai,
Wenhao Li,
Haowen Sun,
Haibin Huang,
Chongyang Ma,
Hui Huang,
Kai Xu,
Ruizhen Hu
Abstract:
In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with complex spatial relationships. To effectively address these issues, we present InterFusion, a two-stage framework specifically designed for HOI generation. InterFusion involves human pose estimations derived from text as geometric priors, which simplifies the text-to-3D conversion process and introduces additional constraints for accurate object generation. At the first stage, InterFusion extracts 3D human poses from a synthesized image dataset depicting a wide range of interactions, subsequently mapping these poses to interaction descriptions. The second stage of InterFusion capitalizes on the latest developments in text-to-3D generation, enabling the production of realistic and high-quality 3D HOI scenes. This is achieved through a local-global optimization process, where the generation of the human body and the object is optimized separately and then jointly refined with a global optimization of the entire scene, ensuring a seamless and contextually coherent integration. Our experimental results affirm that InterFusion significantly outperforms existing state-of-the-art methods in 3D HOI generation.
Submitted 22 March, 2024;
originally announced March 2024.
-
Enhancing Positronium Lifetime Imaging through Two-Component Reconstruction in Time-of-Flight Positron Emission Tomography
Authors:
Zhuo Chen,
Chien-Min Kao,
Hsin-Hsiung Huang,
Lingling An
Abstract:
Positron Emission Tomography (PET) is a crucial tool in medical imaging, particularly for diagnosing diseases like cancer and Alzheimer's. The advent of Positronium Lifetime Imaging (PLI) has opened new avenues for assessing the tissue micro-environment, which is vital for early-stage disease detection. In this study, we introduce a two-component reconstruction model for PLI in Time-of-Flight (TOF) PET, incorporating both ortho-positronium and para-positronium decays. Our model enhances the accuracy of positronium imaging by providing a more detailed representation of the tissue environment. We conducted simulation studies to evaluate the performance of our model and compared it with existing single-component models. The results demonstrate the superiority of the two-component model in capturing the intricacies of the tissue micro-environment, thus paving the way for more precise and informative PET diagnostics.
Submitted 22 March, 2024;
originally announced March 2024.
-
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Authors:
Hong Huang,
Weiming Zhuang,
Chen Chen,
Lingjuan Lyu
Abstract:
Federated learning (FL) promotes decentralized training while prioritizing data confidentiality. However, its application on resource-constrained devices is challenging due to the high demand for computation and memory resources to train deep learning models. Neural network pruning techniques, such as dynamic pruning, could enhance model efficiency, but directly adopting them in FL still poses substantial challenges, including post-pruning performance degradation, high activation memory usage, etc. To address these challenges, we propose FedMef, a novel and memory-efficient federated dynamic pruning framework. FedMef comprises two key components. First, we introduce the budget-aware extrusion that maintains pruning efficiency while preserving post-pruning performance by salvaging crucial information from parameters marked for pruning within a given budget. Second, we propose scaled activation pruning to effectively reduce activation memory footprints, which is particularly beneficial for deploying FL to memory-limited devices. Extensive experiments demonstrate the effectiveness of our proposed FedMef. In particular, it achieves a significant reduction of 28.5% in memory footprint compared to state-of-the-art methods while obtaining superior accuracy.
Submitted 21 March, 2024;
originally announced March 2024.
-
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Authors:
Xidong Wu,
Shangqian Gao,
Zeyu Zhang,
Zhenzhen Li,
Runxue Bao,
Yanfu Zhang,
Xiaoqian Wang,
Heng Huang
Abstract:
Current techniques for deep neural network (DNN) pruning often involve intricate multi-step processes that require domain-specific expertise, making their widespread adoption challenging. To address this limitation, Only-Train-Once (OTO) and OTOv2 were proposed to eliminate the need for additional fine-tuning steps by directly training and compressing a general DNN from scratch. Nevertheless, the static design of optimizers (in OTO) can lead to convergence to local optima. In this paper, we propose Auto-Train-Once (ATO), an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. During the model training phase, our approach not only trains the target model but also leverages a controller network as an architecture generator to guide the learning of target model weights. Furthermore, we develop a novel stochastic gradient algorithm that enhances the coordination between model training and controller network training, thereby improving pruning performance. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures (including ResNet18, ResNet34, ResNet50, ResNet56, and MobileNetv2) on standard benchmark datasets (CIFAR-10, CIFAR-100, and ImageNet).
Submitted 20 March, 2024;
originally announced March 2024.
-
Green's matching: an efficient approach to parameter estimation in complex dynamic systems
Authors:
Jianbin Tan,
Guoyu Zhang,
Xueqin Wang,
Hui Huang,
Fang Yao
Abstract:
Parameters of differential equations are essential to characterize intrinsic behaviors of dynamic systems. Numerous methods for estimating parameters in dynamic systems are computationally and/or statistically inadequate, especially for complex systems with general-order differential operators, such as motion dynamics. This article presents Green's matching, a computationally tractable and statistically efficient two-step method, which only needs to approximate trajectories in dynamic systems but not their derivatives due to the inverse of differential operators by Green's function. This yields a statistically optimal guarantee for parameter estimation in general-order equations, a feature not shared by existing methods, and provides an efficient framework for broad statistical inferences in complex dynamic systems.
Submitted 21 March, 2024;
originally announced March 2024.
-
Primordial extreme mass-ratio inspirals
Authors:
Hai-Long Huang,
Tian-Yi Song,
Yun-Song Piao
Abstract:
The coalescence of stellar-mass primordial black holes (PBHs) might explain some of the gravitational wave (GW) events detected by LIGO-Virgo-KAGRA. On the other hand, observational hints for supermassive PBHs (SMPBHs) have accumulated. Thus it can be expected that stellar-mass PBHs might be gravitationally bound to SMPBHs ($\sim10^{6}-10^9M_\odot$) in the early Universe, and that together they constituted primordial extreme mass-ratio inspirals (EMRIs). In this work, we initiate the study of the merger rate for primordial EMRIs. The corresponding intrinsic EMRI rate at low redshift may be comparable to that of astrophysical models, $10-10^4$yr$^{-1}$, which the space-based detector LISA has the capability to detect, but rises significantly with redshift. Though equal-mass binaries also inevitably form, we find that under certain conditions the primordial EMRIs can be the most prevalent GW sources, and thus potentially a new probe of PBHs.
Submitted 19 March, 2024;
originally announced March 2024.
-
A Unified and General Framework for Continual Learning
Authors:
Zhenyi Wang,
Yan Li,
Li Shen,
Heng Huang
Abstract:
Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge. Various methods have been developed to address the challenge of catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques. However, these methods lack a unified framework and common terminology for describing their approaches. This research aims to bridge this gap by introducing a comprehensive and overarching framework that encompasses and reconciles these existing methodologies. Notably, this new framework is capable of encompassing established CL approaches as special instances within a unified and general optimization objective. An intriguing finding is that despite their diverse origins, these methods share common mathematical structures. This observation highlights the compatibility of these seemingly distinct techniques, revealing their interconnectedness through a shared underlying optimization objective. Moreover, the proposed general framework introduces an innovative concept called refresh learning, specifically designed to enhance the CL performance. This novel approach draws inspiration from neuroscience, where the human brain often sheds outdated information to improve the retention of crucial knowledge and facilitate the acquisition of new information. In essence, refresh learning operates by initially unlearning current data and subsequently relearning it. It serves as a versatile plug-in that seamlessly integrates with existing CL methods, offering an adaptable and effective enhancement to the learning process. Extensive experiments on CL benchmarks and theoretical analysis demonstrate the effectiveness of the proposed refresh learning. Code is available at https://github.com/joey-wang123/CL-refresh-learning.
Submitted 19 March, 2024;
originally announced March 2024.
-
VideoBadminton: A Video Dataset for Badminton Action Recognition
Authors:
Qi Li,
Tzu-Chen Chiu,
Hsiang-Wei Huang,
Min-Te Sun,
Wei-Shinn Ku
Abstract:
In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there's an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly in badminton sports. The introduction of VideoBadminton could not only serve for badminton action recognition but also provide a dataset for recognizing fine-grained actions. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.
Submitted 18 March, 2024;
originally announced March 2024.
-
Hairy Black Holes with Arbitrary Small Areas
Authors:
Xiao-Ping Rao,
Hyat Huang,
Jinbo Yang
Abstract:
We obtained new hairy black hole solutions in Einstein-scalar theory, including asymptotically flat, de Sitter and anti-de Sitter black holes. The theory is inspired by Ref. [1], where traversable wormhole solutions from an Einstein-phantom scalar theory are constructed. In this work, we found new black hole solutions in an Einstein-normal scalar theory. Compared with the Schwarzschild metric, the hairy black holes have two interesting properties: i) the areas of the black holes are always smaller than those of Schwarzschild black holes of the same mass; ii) a naked singularity with positive mass arises when the black hole mass decreases. The energy conditions for the black holes and naked singularities are checked. We found that, for the hairy black holes, the null energy condition (NEC) and the strong energy condition (SEC) hold, while the weak energy condition (WEC) is violated in the vicinity of the black hole horizon. The naked singularity respects all three energy conditions. We also investigate the quasinormal modes (QNMs) of the hairy black holes using a test scalar field. The results indicate that one can distinguish hairy black holes from Schwarzschild black holes of the same mass by their QNM spectra.
Submitted 18 March, 2024;
originally announced March 2024.
-
Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding
Authors:
Zichen Wu,
Hsiu-Yuan Huang,
Fanyi Qu,
Yunfang Wu
Abstract:
Deep multimodal semantic understanding that goes beyond the mere superficial content relation mining has received increasing attention in the realm of artificial intelligence. The challenges of collecting and annotating high-quality multi-modal data have underscored the significance of few-shot learning. In this paper, we focus on two critical tasks under this context: few-shot multi-modal sarcasm detection (MSD) and multi-modal sentiment analysis (MSA). To address them, we propose Mixture-of-Prompt-Experts with Block-Aware Prompt Fusion (MoPE-BAF), a novel multi-modal soft prompt framework based on the unified vision-language model (VLM). Specifically, we design three experts of soft prompts: a text prompt and an image prompt that extract modality-specific features to enrich the single-modal representation, and a unified prompt to assist multi-modal interaction. Additionally, we reorganize Transformer layers into several blocks and introduce cross-modal prompt attention between adjacent blocks, which smoothens the transition from single-modal representation to multi-modal fusion. On both MSD and MSA datasets in few-shot setting, our proposed model not only surpasses the 8.2B model InstructBLIP with merely 2% parameters (150M), but also significantly outperforms other widely-used prompt methods on VLMs or task-specific methods.
Submitted 24 March, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
Authors:
Lin Zhu,
Kangmin Jia,
Yifan Zhao,
Yunshan Qi,
Lizhi Wang,
Hua Huang
Abstract:
Spike cameras, leveraging spike-based integration sampling and high temporal resolution, offer distinct advantages over standard cameras. However, existing approaches reliant on spike cameras often assume optimal illumination, a condition frequently unmet in real-world scenarios. To address this, we introduce SpikeNeRF, the first work that derives a NeRF-based volumetric scene representation from spike camera data. Our approach leverages NeRF's multi-view consistency to establish robust self-supervision, effectively eliminating erroneous measurements and uncovering coherent structures within exceedingly noisy input amidst diverse real-world illumination scenarios. The framework comprises two core elements: a spike generation model incorporating an integrate-and-fire neuron layer and parameters accounting for non-idealities, such as threshold variation, and a spike rendering loss capable of generalizing across varying illumination conditions. We describe how to effectively optimize neural radiance fields to render photorealistic novel views from the novel continuous spike stream, demonstrating advantages over other vision sensors in certain scenes. Empirical evaluations conducted on both real and novel realistically simulated sequences affirm the efficacy of our methodology. The dataset and source code are released at https://github.com/BIT-Vision/SpikeNeRF.
Submitted 17 March, 2024;
originally announced March 2024.
-
Journey into SPH Simulation: A Comprehensive Framework and Showcase
Authors:
Haofeng Huang,
Li Yi
Abstract:
This report presents the development and results of an advanced SPH (Smoothed Particle Hydrodynamics) simulation framework, designed for high-fidelity fluid dynamics modeling. Our framework, accessible at https://github.com/jason-huang03/SPH_Project, integrates various SPH algorithms including WCSPH, PCISPH, and DFSPH, alongside techniques for rigid-fluid coupling and high-viscosity fluid simulations. Leveraging the computational power of CUDA and the versatility of Taichi, the framework excels in handling large-scale simulations with millions of particles. We demonstrate the capability of our framework through a series of simulations showcasing rigid-fluid coupling, high-viscosity fluids, and large-scale fluid dynamics. Furthermore, a detailed performance analysis reveals CUDA's superior efficiency across different hardware platforms. This work is an exploration into modern SPH simulation techniques, showcasing their practical implementation and capabilities.
Submitted 17 March, 2024;
originally announced March 2024.
-
Exploring Learning-based Motion Models in Multi-Object Tracking
Authors:
Hsiang-Wei Huang,
Cheng-Yen Yang,
Wenhao Chai,
Zhongyu Jiang,
Jenq-Neng Hwang
Abstract:
In the field of multi-object tracking (MOT), traditional methods often rely on the Kalman Filter for motion prediction, leveraging its strengths in linear motion scenarios. However, the inherent limitations of these methods become evident when confronted with complex, nonlinear motions and occlusions prevalent in dynamic environments like sports and dance. This paper explores the possibilities of replacing the Kalman Filter with various learning-based motion models that effectively enhance tracking accuracy and adaptability beyond the constraints of Kalman Filter-based systems. In this paper, we propose MambaTrack, an online motion-based tracker that outperforms all existing motion-based trackers on the challenging DanceTrack and SportsMOT datasets. Moreover, we further exploit the potential of the state-space model in trajectory feature extraction to boost the tracking performance and propose MambaTrack+, which achieves state-of-the-art performance on the DanceTrack dataset with 56.1 HOTA and 54.9 IDF1.
Submitted 16 March, 2024;
originally announced March 2024.
-
Gaussian universality for approximately polynomial functions of high-dimensional data
Authors:
Kevin Han Huang,
Morgane Austern,
Peter Orbanz
Abstract:
We establish an invariance principle for polynomial functions of $n$ independent high-dimensional random vectors, and also show that the obtained rates are nearly optimal. Both the dimension of the vectors and the degree of the polynomial are permitted to grow with $n$. Specifically, we obtain a finite sample upper bound for the error of approximation by a polynomial of Gaussians, measured in Kolmogorov distance, and extend it to functions that are approximately polynomial in a mean squared error sense. We give a corresponding lower bound that shows the invariance principle holds up to polynomial degree $o(\log n)$. The proof is constructive and adapts an asymmetrisation argument due to V. V. Senatov. As applications, we obtain a higher-order delta method with possibly non-Gaussian limits, and generalise a number of known results on high-dimensional and infinite-order U-statistics, and on fluctuations of subgraph counts.
Submitted 15 March, 2024;
originally announced March 2024.
-
Lifelong LERF: Local 3D Semantic Inventory Monitoring Using FogROS2
Authors:
Adam Rashid,
Chung Min Kim,
Justin Kerr,
Letian Fu,
Kush Hari,
Ayah Ahmad,
Kaiyuan Chen,
Huang Huang,
Marcus Gualtieri,
Michael Wang,
Christian Juette,
Nan Tian,
Liu Ren,
Ken Goldberg
Abstract:
Inventory monitoring in homes, factories, and retail stores relies on maintaining data despite objects being swapped, added, removed, or moved. We introduce Lifelong LERF, a method that allows a mobile robot with minimal compute to jointly optimize a dense language and geometric representation of its surroundings. Lifelong LERF maintains this representation over time by detecting semantic changes and selectively updating these regions of the environment, avoiding the need to exhaustively remap. Human users can query inventory by providing natural language queries and receiving a 3D heatmap of potential object locations. To manage the computational load, we use Fog-ROS2, a cloud robotics platform, to offload resource-intensive tasks. Lifelong LERF obtains poses from a monocular RGBD SLAM backend, and uses these poses to progressively optimize a Language Embedded Radiance Field (LERF) for semantic monitoring. Experiments with 3-5 objects arranged on a tabletop and a Turtlebot with a RealSense camera suggest that Lifelong LERF can persistently adapt to changes in objects with up to 91% accuracy.
Submitted 15 March, 2024;
originally announced March 2024.
-
Further study of $c\bar{c}c\bar{c}$ system within a chiral quark model
Authors:
Yuheng Wu,
Xuejie Liu,
Yue Tan,
Hongxia Huang,
Jialun Ping
Abstract:
Inspired by the recent ATLAS and CMS experiments on the invariant mass spectrum of $J/ψJ/ψ$, we systematically study the $c\bar{c}c\bar{c}$ system with $J^{P}=0^{+}$. In the framework of the chiral quark model, we have carried out bound-state and resonance-state calculations, the latter using the real-scaling method. The results of the bound-state calculation show that there are no bound states in the $c\bar{c}c\bar{c}$ system with $0^{+}$. The resonance-state calculation shows that there are four possible stable resonances: $R(6920)$, $R(7000)$, $R(7080)$ and $R(7160)$. $R(6920)$ and $R(7160)$ are candidates for the experimentally observed $X(6900)$ and $X(7200)$, whose main decay channel is $J/ψJ/ψ$. It is important to note that another major decay channel of $R(7160)$ is $χ_{c0} χ_{c0}$, and $χ_{c0} χ_{c0}$ is also the main decay channel of $R(7000)$ and $R(7080)$. Therefore, we propose to search experimentally for these two predicted resonances in the $χ_{c0} χ_{c0}$ invariant mass spectrum.
Submitted 21 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Convergence Rates For Tikhonov Regularization of Coefficient Identification Problems in Robin-Boundary Equation
Authors:
Huimin Huang,
Wensheng Zhang
Abstract:
This paper investigates the convergence rate for Tikhonov regularization of the problem of identifying the coefficient $a \in L^{\infty}(Ω)$ in the Robin-boundary equation $-\mathrm{div}(a\nabla u)-bu=f,~ x \in Ω\subset \mathbb R^M,~ M \geq 1$ and $u=0,~ x ~on~ \partialΩ$, where $f(x)\in L^{\infty}(Ω)$. Assume we only know the imprecise values of $u$ in the subset $Ω_1 \subset Ω$ given by $z^δ \in {H}^1(Ω_1)$, satisfying $\|u-z^δ\|_{H^1(Ω_1)}\leq δ$. We assume $u$ satisfies the following boundary condition on $\partialΩ_1$: \begin{align*} \nabla u \cdot \vec{n}+γu =0~on~\partialΩ_1, \end{align*} where $\vec{n}$ is the normal vector of $\partialΩ_1$ and $γ>0$ is a constant. We regularize this problem by correspondingly minimizing the strictly convex functional:
\begin{align*}
\min \limits_{a \in \mathbb A} &\frac12 \int_{Ω_1} a | {\nabla(U(a)-z^δ)}|^2 +\frac12\int_{\partialΩ_1} aγ[U(a)-z^δ]^2-\frac12 \int_{Ω_1} b [U(a)-z^δ]^2\\ &+ ρ\| a-a^* \|^2_{L^2(Ω)},
\end{align*}
where $U(a)$ is the map from $a$ to the solution of the Robin-boundary problem, $ρ> 0$ is the regularization parameter and $a^*$ is an a priori estimate of $a$. We prove that the functional attains a unique global minimizer on the admissible set. Further, we give a very simple source condition, without the smallness requirement on the source function, which provides the convergence rate $O(\sqrt{δ})$ for the regularized solution.
Submitted 15 March, 2024;
originally announced March 2024.
-
DiffMAC: Diffusion Manifold Hallucination Correction for High Generalization Blind Face Restoration
Authors:
Nan Gao,
Jia Li,
Huaibo Huang,
Zhi Zeng,
Ke Shang,
Shuwu Zhang,
Ran He
Abstract:
Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns. Current methods have low generalization across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Diffusion (DID) framework to tackle diffusion manifold hallucination correction (DiffMAC), which achieves high-generalization face restoration in diverse degraded scenes and heterogeneous domains. Specifically, the first diffusion stage aligns the restored face with spatial feature embedding of the low-quality face based on AdaIN, which synthesizes degradation-removal results but with uncontrollable artifacts for some hard cases. Based on Stage I, Stage II considers information compression using manifold information bottleneck (MIB) and finetunes the first diffusion model to improve facial fidelity. DiffMAC effectively fights against blind degradation patterns and synthesizes high-quality faces with attribute and identity consistencies. Experimental results demonstrate the superiority of DiffMAC over state-of-the-art methods, with a high degree of generalization in real-world and heterogeneous settings. The source code and models will be public.
Submitted 15 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be $-2.7413 \pm 0.0004 \pm 0.0050$, while above the knee, it is $-3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is almost heavier than helium in the whole measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, representing a 24% decline following a power law with an index of $-0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in abundance of light components. Above the knee, the mean logarithmic mass exhibits a power law trend towards heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than the medium-heavy components.
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Few-Shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
Authors:
Chenxi Liu,
Zhenyi Wang,
Tianyi Xiong,
Ruibo Chen,
Yihan Wu,
Junfeng Guo,
Heng Huang
Abstract:
Few-Shot Class-Incremental Learning (FSCIL) models aim to incrementally learn new classes with scarce samples while preserving knowledge of old ones. Existing FSCIL methods usually fine-tune the entire backbone, leading to overfitting and hindering the potential to learn new classes. On the other hand, recent prompt-based CIL approaches alleviate forgetting by training prompts with sufficient data in each task. In this work, we propose a novel framework named Attention-aware Self-adaptive Prompt (ASP). ASP encourages task-invariant prompts to capture shared knowledge by reducing specific information from the attention aspect. Additionally, self-adaptive task-specific prompts in ASP provide specific information and transfer knowledge from old classes to new classes with an Information Bottleneck learning objective. In summary, ASP prevents overfitting on base task and does not require enormous data in few-shot incremental tasks. Extensive experiments on three benchmark datasets validate that ASP consistently outperforms state-of-the-art FSCIL and prompt-based CIL methods in terms of both learning new classes and mitigating forgetting.
Submitted 25 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Renovating Names in Open-Vocabulary Segmentation Benchmarks
Authors:
Haiwen Huang,
Songyou Peng,
Dan Zhang,
Andreas Geiger
Abstract:
Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE). Our framework features a renaming model that enhances the quality of names for each visual segment. Through experiments, we demonstrate that our renovated names help train stronger open-vocabulary models with up to 15% relative improvement and significantly enhance training efficiency with improved data quality. We also show that our renovated names improve evaluation by better measuring misclassification and enabling fine-grained model analysis. We will provide our code and relabelings for several popular segmentation datasets (MS COCO, ADE20K, Cityscapes) to the research community.
Submitted 24 May, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Authors:
Haoxu Huang,
Fanqi Lin,
Yingdong Hu,
Shengjie Wang,
Yang Gao
Abstract:
Foundation models pre-trained on web-scale data are shown to encapsulate extensive world knowledge beneficial for robotic manipulation in the form of task planning. However, the actual physical implementation of these plans often relies on task-specific learning methods, which require significant data collection and struggle with generalizability. In this work, we introduce Robotic Manipulation through Spatial Constraints of Parts (CoPa), a novel framework that leverages the common sense knowledge embedded within foundation models to generate a sequence of 6-DoF end-effector poses for open-world robotic manipulation. Specifically, we decompose the manipulation process into two phases: task-oriented grasping and task-aware motion planning. In the task-oriented grasping phase, we employ foundation vision-language models (VLMs) to select the object's grasping part through a novel coarse-to-fine grounding mechanism. During the task-aware motion planning phase, VLMs are utilized again to identify the spatial geometry constraints of task-relevant object parts, which are then used to derive post-grasp poses. We also demonstrate how CoPa can be seamlessly integrated with existing robotic planning algorithms to accomplish complex, long-horizon tasks. Our comprehensive real-world experiments show that CoPa possesses a fine-grained physical understanding of scenes, capable of handling open-set instructions and objects with minimal prompt engineering and without additional training. Project page: https://copa-2024.github.io/
Submitted 13 March, 2024;
originally announced March 2024.
-
Synchronized Dual-arm Rearrangement via Cooperative mTSP
Authors:
Wenhao Li,
Shishun Zhang,
Sisi Dai,
Hui Huang,
Ruizhen Hu,
Xiaohong Chen,
Kai Xu
Abstract:
Synchronized dual-arm rearrangement is widely studied as a common scenario in industrial applications. It often faces scalability challenges due to the computational complexity of robotic arm rearrangement and the high-dimensional nature of dual-arm planning. To address these challenges, we formulated the problem as cooperative mTSP, a variant of mTSP where agents share cooperative costs, and utilized reinforcement learning for its solution. Our approach involved representing rearrangement tasks using a task state graph that captured spatial relationships and a cooperative cost matrix that provided details about action costs. Taking these representations as observations, we designed an attention-based network to effectively combine them and provide rational task scheduling. Furthermore, a cost predictor is also introduced to directly evaluate actions during both training and planning, significantly expediting the planning process. Our experimental results demonstrate that our approach outperforms existing methods in terms of both performance and planning efficiency.
Submitted 12 March, 2024;
originally announced March 2024.
-
Towards Independence Criterion in Machine Unlearning of Features and Labels
Authors:
Ling Han,
Nanqing Luo,
Hao Huang,
Jing Chen,
Mary-Anne Hartley
Abstract:
This work delves into the complexities of machine unlearning in the face of distributional shifts, particularly focusing on the challenges posed by non-uniform feature and label removal. With the advent of regulations like the GDPR emphasizing data privacy and the right to be forgotten, machine learning models face the daunting task of unlearning sensitive information without compromising their integrity or performance. Our research introduces a novel approach that leverages influence functions and principles of distributional independence to address these challenges. By proposing a comprehensive framework for machine unlearning, we aim to ensure privacy protection while maintaining model performance and adaptability across varying distributions. Our method not only facilitates efficient data removal but also dynamically adjusts the model to preserve its generalization capabilities. Through extensive experimentation, we demonstrate the efficacy of our approach in scenarios characterized by significant distributional shifts, making substantial contributions to the field of machine unlearning. This research paves the way for developing more resilient and adaptable unlearning techniques, ensuring models remain robust and accurate in the dynamic landscape of data privacy and machine learning.
Submitted 12 March, 2024;
originally announced March 2024.