-
CAT: Interpretable Concept-based Taylor Additive Models
Authors:
Viet Duong,
Qiong Wu,
Zhengyi Zhou,
Hongjue Zhao,
Chenxiang Luo,
Eric Zavesky,
Huaxiu Yao,
Huajie Shao
Abstract:
As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to train and scale. Additionally, in real-world datasets with many features, the interpretability of feature-based explanations diminishes for humans. To tackle these issues, recent research has shifted towards concept-based interpretable methods. These approaches try to integrate concept learning as an intermediate step before making predictions, explaining the predictions in terms of human-understandable concepts. However, these methods require domain experts to extensively label concepts with relevant names and their ground-truth values. In response, we propose CAT, a novel interpretable Concept-bAsed Taylor additive model to simplify this process. CAT does not require domain experts to annotate concepts and their ground-truth values. Instead, it only requires users to categorize input features into broad groups, which can be easily accomplished through a quick metadata review. Specifically, CAT first embeds each group of input features into a one-dimensional high-level concept representation, and then feeds the concept representations into a new white-box Taylor Neural Network (TaylorNet). The TaylorNet aims to learn the non-linear relationship between the inputs and outputs using polynomials. Evaluation results across multiple benchmarks demonstrate that CAT can outperform or compete with the baselines while reducing the need for extensive model parameters. Importantly, it can explain model predictions through high-level concepts that humans can understand.
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy
Authors:
Jiayan Gan,
Zhixing Du,
Qiang Li,
Huaizong Shao,
Jingran Lin,
Ye Pan,
Zhongyi Wen,
Shafei Wang
Abstract:
While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms have emerged. In particular, deep learning (DL) has shown great benefits in automatically extracting complex and subtle features from raw data with high classification accuracy. However, DL algorithms face a growing computational cost problem, as the difficulty of the RFF task and the size of the DNN have increased dramatically. To address this challenge, this paper proposes a novel cost-effective early-exit neural network consisting of a complex-valued neural network (CVNN) backbone with multiple random forest branches, called hybrid CVNN-RF. Unlike conventional studies that use a single fixed DL model to process all RF samples, our hybrid CVNN-RF considers differences in the recognition difficulty of RF samples and introduces an early-exit mechanism to dynamically process the samples. When processing "easy" samples that can be well classified with high confidence, the hybrid CVNN-RF can end early at the random forest branch to reduce computational cost. Conversely, subsequent network layers will be activated to ensure accuracy. To further improve the early-exit rate, an automated multi-dimensional early-exit strategy is proposed to achieve scheduling control from multiple dimensions within the network depth and classification category. Finally, our experiments on the public ADS-B dataset show that the proposed algorithm can reduce the computational cost by 83% while improving the accuracy by 1.6% under a classification task with 100 categories.
Submitted 21 June, 2024;
originally announced June 2024.
-
Error-Correcting Graph Codes
Authors:
Swastik Kopparty,
Aditya Potukuchi,
Harry Sha
Abstract:
In this paper, we define, study, and construct {\em Error-Correcting Graph Codes}. An error-correcting graph code of distance $δ$ is a family $C$ of graphs, on a common vertex set of size $n$, such that if we start with any graph in $C$, we would have to modify the neighborhoods of at least $δn$ vertices in order to reach some other graph in $C$.
This is a natural graph generalization of the standard Hamming distance error-correcting codes for binary strings. We show:
1. Combinatorial results determining the optimal rate vs distance tradeoff nonconstructively.
2. A connection to rank-metric codes, enabling some simple and some involved constructions achieving certain positive rates and distances.
3. Graph code analogues of Reed-Solomon codes and code concatenation, leading to positive distance codes for all rates and positive rate codes for all distances.
4. Graph code analogues of dual-BCH codes, yielding large codes with distance $δ= 1-o(1)$. This gives an explicit "graph code of Ramsey graphs".
Several recent works, starting with the paper of Alon, Gujgiczer, Körner, Milojević, and Simonyi, have studied more general graph codes; where the symmetric difference between any two graphs in the code is required to have a desired property. Error-correcting graph codes are a particularly interesting instantiation of this concept.
Submitted 19 June, 2024;
originally announced June 2024.
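The distance notion in the abstract above can be illustrated with a small worked example. The sketch below assumes one reading of the definition: the distance between two graphs on the same vertex set is the size of the smallest vertex set that touches every edge on which they differ (a minimum vertex cover of the symmetric difference). The function names are hypothetical, and the brute-force search is only meant for tiny graphs.

```python
from itertools import combinations


def symmetric_difference(edges_g, edges_h):
    """Undirected edges present in exactly one of the two graphs."""
    as_sets = lambda edges: {frozenset(e) for e in edges}
    return as_sets(edges_g) ^ as_sets(edges_h)


def graph_distance(n, edges_g, edges_h):
    """Fewest vertices whose neighborhoods must change to turn G into H.

    Assumption: this equals the minimum vertex cover of the symmetric
    difference. Brute force over vertex subsets, so small n only.
    """
    diff = symmetric_difference(edges_g, edges_h)
    for k in range(n + 1):
        for cover in combinations(range(n), k):
            chosen = set(cover)
            # Every differing edge must touch at least one chosen vertex.
            if all(edge & chosen for edge in diff):
                return k
    return n


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(graph_distance(5, star, []))      # 1: rewriting vertex 0's neighborhood suffices
print(graph_distance(5, triangle, []))  # 2: no single vertex touches all three edges
```

Dividing the result by n gives the relative distance that the abstract denotes by $δ$.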
-
Imaging, counting, and positioning single interstitial atoms in solids
Authors:
Jizhe Cui,
Haozhi Sha,
Liangze Mao,
Kang Sun,
Wenfeng Yang,
Rong Yu
Abstract:
Interstitial atoms are ubiquitous in solids and they are widely incorporated into materials to tune their lattice structure, electronic transportation, and mechanical properties. Because the distribution of interstitial atoms in matrix materials is usually disordered and most of them are light atoms with weak scattering ability, it remains a challenge to directly image single interstitial atoms and measure their geometrical positions. In this work, direct imaging and measuring of single interstitial atoms have been realized with adaptive-propagator ptychography. The measurement of their three-dimensional coordinates enables quantitative analysis of the pair distribution function of the interstitial atoms and reveals the anisotropic occupation of oxygen in the interstitial sites in titanium. The current work paves the way for the determination of interstitial atoms in materials, and for the correlation between the atomic-scale behavior of interstitial atoms and the physical properties of materials.
Submitted 28 May, 2024;
originally announced May 2024.
-
Clip Body and Tail Separately: High Probability Guarantees for DPSGD with Heavy Tails
Authors:
Haichao Sha,
Yang Cao,
Yong Liu,
Yuncheng Wu,
Ruixuan Liu,
Hong Chen
Abstract:
Differentially Private Stochastic Gradient Descent (DPSGD) is widely utilized to preserve training data privacy in deep learning, which first clips the gradients to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume the gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that the gradients in deep learning exhibit a heavy-tail phenomenon, that is, the tails of the gradient have infinite variance, which may lead to excessive clipping loss to the gradients with existing DPSGD mechanisms. To address this problem, we propose a novel approach, Discriminative Clipping (DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds for body and tail gradients to reduce the clipping loss. Under the non-convex condition, DC-DPSGD reduces the empirical gradient norm from $\mathbb{O}\left(\log^{\max(0,θ-1)}(T/δ)\log^{2θ}(\sqrt{T})\right)$ to $\mathbb{O}\left(\log(\sqrt{T})\right)$ with heavy-tailed index $θ\geq 1/2$, iterations $T$, and arbitrary probability $δ$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72% in terms of accuracy.
Submitted 27 May, 2024;
originally announced May 2024.
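The clip-then-noise step that the abstract above attributes to standard DPSGD can be sketched as follows. This is a minimal illustration of the baseline mechanism with a single clipping threshold, not the paper's discriminative clipping; the function names and the `noise_multiplier` parameter are assumptions made for the example.

```python
import math
import random


def clip(grad, c):
    """Scale a per-sample gradient so its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_gradient(per_sample_grads, c, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average.

    Assumption: noise standard deviation is noise_multiplier * c per coordinate,
    the usual calibration for the Gaussian mechanism.
    """
    clipped = [clip(g, c) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * c) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # norms 5.0 and 0.5
print(clip(grads[0], 1.0))        # rescaled to unit norm
print(dpsgd_gradient(grads, c=1.0, noise_multiplier=1.0, rng=rng))
```

A scheme with separate thresholds for body and tail gradients would replace the single `c` with a per-gradient choice; how gradients are assigned to each group is specific to the paper and is not reproduced here.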
-
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Authors:
Haoyu Wang,
Bei Liu,
Hang Shao,
Bo Xiao,
Ke Zeng,
Guanglu Wan,
Yanmin Qian
Abstract:
Parameter quantization for Large Language Models (LLMs) has recently attracted increasing attention as a way to reduce memory costs and improve computational efficiency. Early approaches have been widely adopted. However, existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of a parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, a dynamic outlier reservation scheme is developed to retain some parameters in their original floating-point precision, in exchange for boosted model performance. Experiments on various mainstream open source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code is available at https://github.com/fayuge/CLAQ.
Submitted 2 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
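The first strategy in the abstract above, per-column K-Means centroids, can be sketched in a few lines. This is an illustrative toy, not the CLAQ implementation: it assumes plain Lloyd iterations with evenly spaced initial centroids, a fixed bit-width for every column, and hypothetical function names.

```python
def kmeans_1d(values, k, iters=20):
    """Lloyd's algorithm on scalars, with centroids initialised evenly over [min, max]."""
    lo, hi = min(values), max(values)
    if k == 1 or lo == hi:
        return [sum(values) / len(values)]
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            buckets[nearest].append(v)
        # Keep the old centroid when a cluster receives no points.
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids


def quantize_columns(matrix, bits):
    """Replace each weight by the nearest of 2**bits centroids learned for its column."""
    k = 2 ** bits
    quantized = [[0.0] * len(matrix[0]) for _ in matrix]
    for col in range(len(matrix[0])):
        column = [row[col] for row in matrix]
        centroids = kmeans_1d(column, k)
        for r, v in enumerate(column):
            quantized[r][col] = min(centroids, key=lambda c: abs(v - c))
    return quantized


weights = [[0.0, 10.0], [0.1, 10.0], [0.9, 20.0], [1.0, 20.0]]
for row in quantize_columns(weights, bits=1):
    print(row)
```

Because each column gets its own codebook, columns with very different value ranges (as in the two columns above) do not compete for the same centroids.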
-
Extraction of In-Phase and Quadrature Components by Time-Encoding Sampling
Authors:
Y. H. Shao,
S. Y. Chen,
H. Z. Yang,
F. Xi,
H. Hong,
Z. Liu
Abstract:
Time encoding machine (TEM) is a biologically-inspired scheme to perform signal sampling using timing. In this paper, we study its application to the sampling of bandpass signals. We propose an integrate-and-fire TEM scheme by which the in-phase (I) and quadrature (Q) components are extracted through reconstruction. We design the TEM according to the signal bandwidth and amplitude instead of upper-edge frequency and amplitude as in the case of bandlimited/lowpass signals. We show that the I and Q components can be perfectly reconstructed from the TEM measurements if the minimum firing rate is equal to the Landau's rate of the signal. For the reconstruction of I and Q components, we develop an alternating projection onto convex sets (POCS) algorithm in which two POCS algorithms are alternately iterated. For the algorithm analysis, we define a solution space of vector-valued signals and prove that the proposed reconstruction algorithm converges to the correct unique solution in the noiseless case. The proposed TEM can operate regardless of the center frequencies of the bandpass signals. This is quite different from traditional bandpass sampling, where the center frequency should be carefully allocated for Landau's rate and its variations have the negative effect on the sampling performance. In addition, the proposed TEM achieves certain reconstructed signal-to-noise-plus-distortion ratios for small firing rates in thermal noise, which is unavoidably present and will be aliased to the Nyquist band in the traditional sampling such that high sampling rates are required. We demonstrate the reconstruction performance and substantiate our claims via simulation experiments.
Submitted 27 May, 2024;
originally announced May 2024.
-
A New Method in Facial Registration in Clinics Based on Structure Light Images
Authors:
Pengfei Li,
Ziyue Ma,
Hong Wang,
Juan Deng,
Yan Wang,
Zhenyu Xu,
Feng Yan,
Wenjun Tu,
Hong Sha
Abstract:
Background and Objective: In neurosurgery, fusing clinical images with depth images, which enriches the available information and detail, is beneficial to surgery. We found that existing methods frequently failed to register facial depth images. To enrich traditional imaging methods with depth information, we investigated a method for registering depth images with traditional clinical images. Methods: We used the dlib library, a C++ library that can be used for face recognition, to detect the key points on faces from the structure light camera and the CT image. The two key-point clouds were first registered coarsely by the ICP method. Fine registration was then performed after coarse registration, also by the ICP method. Results: The RMSE after coarse and fine registration is as low as 0.995913 mm. Compared with traditional methods, the approach also takes less time. Conclusions: The new method successfully registered the facial depth image from structure light images with CT at a low error, and it is promising and efficient for clinical application in neurosurgery.
Submitted 23 May, 2024;
originally announced May 2024.
-
SO(5) multicriticality in two-dimensional quantum magnets
Authors:
Jun Takahashi,
Hui Shao,
Bowen Zhao,
Wenan Guo,
Anders W. Sandvik
Abstract:
We resolve the nature of the quantum phase transition between a Néel antiferromagnet and a valence-bond solid in two-dimensional spin-1/2 magnets. We study a class of $J$-$Q$ models, in which Heisenberg exchange $J$ competes with interactions $Q_n$ formed by products of $n$ singlet projectors on adjacent parallel lattice links. QMC simulations provide unambiguous evidence for first-order transitions, with the discontinuities increasing with $n$. For $n=2$ and $n=3$ models, the first-order signatures are very weak. On intermediate length scales, we extract well-defined scaling dimensions (critical exponents) that are common to the models with small $n$, indicating proximity to a quantum critical point. By combining two $Q$ terms, the transition can be tuned from weak to more strongly first-order. The two coexisting orders on the first-order line scale with a large exponent $β\approx 0.85$. This exponent and others are close to bounds for an SO($5$) symmetric CFT with a relevant SO($5$) singlet. We characterize the emergent SO($5$) symmetry by the scaling dimensions of its leading irrelevant perturbations. The large $β$ value and a large correlation length exponent, $ν\approx 1.4$, partially explain why the transition remains near-critical even quite far away from the critical point and in many different models without fine-tuning. In addition, we find that few-spin lattice operators are dominated by the SO($5$) violating field (the traceless symmetric tensor), and interactions involving many spins are required to observe strong effects of the relevant SO($5$) singlet. The exponent that had previously been identified with the divergent correlation length when crossing between the two phases does not have a corresponding CFT operator. We explain this emergent pseudocritical scale by a mechanism relying on a dangerously irrelevant SO($5$) perturbation.
Submitted 10 May, 2024;
originally announced May 2024.
-
Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
Authors:
Huihong Shi,
Haikuo Shao,
Wendong Mao,
Zhongfeng Wang
Abstract:
Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their huge model sizes and intensive computations hinder ViTs' deployment on embedded devices, calling for effective model compression methods, such as quantization. Unfortunately, due to the existence of hardware-unfriendly and quantization-sensitive non-linear operations, particularly Softmax, it is non-trivial to completely quantize all operations in ViTs, yielding either significant accuracy drops or non-negligible hardware costs. In response to challenges associated with standard ViTs, we focus our attention towards the quantization and acceleration for efficient ViTs, which not only eliminate the troublesome Softmax but also integrate linear attention with low computational complexity, and propose Trio-ViT accordingly. Specifically, at the algorithm level, we develop a tailored post-training quantization engine taking the unique activation distributions of Softmax-free efficient ViTs into full consideration, aiming to boost quantization accuracy. Furthermore, at the hardware level, we build an accelerator dedicated to the specific Convolution-Transformer hybrid architecture of efficient ViTs, thereby enhancing hardware efficiency. Extensive experimental results consistently prove the effectiveness of our Trio-ViT framework. Particularly, we can gain up to $7.2\times$ and $14.6\times$ FPS under comparable accuracy over state-of-the-art ViT accelerators, as well as $5.9\times$ and $2.0\times$ DSP efficiency. Codes will be released publicly upon acceptance.
Submitted 6 May, 2024;
originally announced May 2024.
-
MoVA: Adapting Mixture of Vision Experts to Multimodal Context
Authors:
Zhuofan Zong,
Bingqi Ma,
Dazhong Shen,
Guanglu Song,
Hao Shao,
Dongzhi Jiang,
Hongsheng Li,
Yu Liu
Abstract:
As the key component in multimodal large language models (MLLMs), the ability of the visual encoder greatly affects MLLM's understanding on diverse image content. Although some large-scale pretrained vision encoders such as vision encoders in CLIP and DINOv2 have brought promising performance, we found that there is still no single vision encoder that can dominate various image content understanding, e.g., the CLIP vision encoder leads to outstanding results on general image understanding but poor performance on document or chart content. To alleviate the bias of CLIP vision encoder, we first delve into the inherent behavior of different pre-trained vision encoders and then propose the MoVA, a powerful and novel MLLM, adaptively routing and fusing task-specific vision experts with a coarse-to-fine mechanism. In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. This benefits from the powerful model function understanding ability of the large language model (LLM) equipped with expert-routing low-rank adaptation (LoRA). In the fine-grained stage, we elaborately conduct the mixture-of-vision-expert adapter (MoV-Adapter) to extract and fuse task-specific knowledge from various experts. This coarse-to-fine paradigm effectively leverages representations from experts based on multimodal context and model expertise, further enhancing the generalization ability. We conduct extensive experiments to evaluate the effectiveness of the proposed approach. Without any bells and whistles, MoVA can achieve significant performance gains over current state-of-the-art methods in a wide range of challenging multimodal benchmarks. Codes and models will be available at https://github.com/TempleX98/MoVA.
Submitted 19 April, 2024;
originally announced April 2024.
-
FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
Authors:
Xingtai Gui,
Tengteng Huang,
Haonan Shao,
Haotian Yao,
Chi Zhang
Abstract:
Future instance prediction from a Bird's Eye View (BEV) perspective is a vital component in autonomous driving, which involves future instance segmentation and instance motion prediction. Existing methods usually rely on a redundant and complex pipeline which requires multiple auxiliary outputs and post-processing procedures. Moreover, estimation errors in each of the auxiliary predictions degrade the prediction performance. In this paper, we propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR), which views the task as BEV instance segmentation and prediction for future frames. We propose to adopt instance queries representing specific traffic participants to directly estimate the corresponding future occupied masks, and thus get rid of complex post-processing procedures. Besides, we devise a flow-aware BEV predictor for future BEV feature prediction, composed of a flow-aware deformable attention that uses backward flow to guide the offset sampling. A novel future instance matching strategy is also proposed to further improve the temporal coherence. Extensive experiments demonstrate the superiority of FipTR and its effectiveness under different temporal BEV encoders.
Submitted 19 April, 2024;
originally announced April 2024.
-
Polar vortex hidden in twisted bilayers of paraelectric SrTiO3
Authors:
Haozhi Sha,
Yixuan Zhang,
Yunpeng Ma,
Wei Li,
Wenfeng Yang,
Jizhe Cui,
Qian Li,
Houbing Huang,
Rong Yu
Abstract:
Polar topologies, such as vortices and skyrmions, have attracted significant interest due to their unique physical properties and promising applications in high-density memory devices. Currently, most polar vortices are observed in heterostructures containing ferroelectric materials and constrained by substrates. In this study, we unravel arrays of polar vortices formed in twisted freestanding bilayers composed of SrTiO3, a quantum-paraelectric material. Depth-resolved structures of the bilayers are measured with deep-sub-angstrom resolution and one picometer accuracy using multislice ptychography, enabling identification of the three-dimensional variations of polarization topology. Our findings reveal the evolution of the polar vortices in the twisted overlapping layers, demonstrating a reversal of the rotation sense along the depth direction. Twisted freestanding bilayers provide a unique platform for exploration and modulation of novel polar topologies.
Submitted 11 April, 2024;
originally announced April 2024.
-
Wenzhou TE: a first-principles calculated thermoelectric materials database
Authors:
Ying Fang,
Hezhu Shao
Abstract:
Since the implementation of the Materials Genome Project by the Obama administration in the United States, the development of various computational materials databases has fundamentally expanded the choices of industries such as materials and energy. In the field of thermoelectric materials, the thermoelectric figure of merit ZT quantifies the performance of the material. For calculations across a vast number of materials, the ZT values are not easily obtained due to their computational complexity. Here, we show how to build a database of thermoelectric materials based on first-principles calculations for the electronic and heat transport of materials. Firstly, the initial structures are classified according to the values of bandgap and other basic properties using the K-means clustering algorithm from machine learning, and high-throughput first-principles calculations are carried out for narrow-bandgap semiconductors, which exhibit potential for thermoelectric applications. The present calculation framework mainly includes a deformation potential module, an electrical transport performance module, and a mechanical and thermodynamic properties module. We have also set up a search webpage for the calculated database of thermoelectric materials, which supports searching and viewing the related physical properties of materials. Our work may inspire the construction of more first-principles computational databases of thermoelectric materials and accelerate research progress in the field of thermoelectrics.
Submitted 3 April, 2024;
originally announced April 2024.
-
Prior Frequency Guided Diffusion Model for Limited Angle (LA)-CBCT Reconstruction
Authors:
Jiacheng Xie,
Hua-Chieh Shao,
Yunxiang Li,
You Zhang
Abstract:
Cone-beam computed tomography (CBCT) is widely used in image-guided radiotherapy. Reconstructing CBCTs from limited-angle acquisitions (LA-CBCT) is highly desired for improved imaging efficiency, dose reduction, and better mechanical clearance. LA-CBCT reconstruction, however, suffers from severe under-sampling artifacts, making it a highly ill-posed inverse problem. Diffusion models can generate data/images by reversing a data-noising process through learned data distributions; and can be incorporated as a denoiser/regularizer in LA-CBCT reconstruction. In this study, we developed a diffusion model-based framework, prior frequency-guided diffusion model (PFGDM), for robust and structure-preserving LA-CBCT reconstruction. PFGDM uses a conditioned diffusion model as a regularizer for LA-CBCT reconstruction, and the condition is based on high-frequency information extracted from patient-specific prior CT scans which provides a strong anatomical prior for LA-CBCT reconstruction. Specifically, we developed two variants of PFGDM (PFGDM-A and PFGDM-B) with different conditioning schemes. PFGDM-A applies the high-frequency CT information condition until a pre-optimized iteration step, and drops it afterwards to enable both similar and differing CT/CBCT anatomies to be reconstructed. PFGDM-B, on the other hand, continuously applies the prior CT information condition in every reconstruction step, while with a decaying mechanism, to gradually phase out the reconstruction guidance from the prior CT scans. The two variants of PFGDM were tested and compared with current available LA-CBCT reconstruction solutions, via metrics including PSNR and SSIM. PFGDM outperformed all traditional and diffusion model-based methods. PFGDM reconstructs high-quality LA-CBCTs under very-limited gantry angles, allowing faster and more flexible CBCT scans with dose reductions.
Submitted 8 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
Authors:
Haikuo Shao,
Huihong Shi,
Wendong Mao,
Zhongfeng Wang
Abstract:
Vision Transformers (ViTs) have achieved significant success in computer vision. However, their intensive computations and massive memory footprint challenge ViTs' deployment on embedded devices, calling for efficient ViTs. Among them, EfficientViT, the state-of-the-art one, features a Convolution-Transformer hybrid architecture, enhancing both accuracy and hardware efficiency. Unfortunately, existing accelerators cannot fully exploit the hardware benefits of EfficientViT due to its unique architecture. In this paper, we propose an FPGA-based accelerator for EfficientViT to advance the hardware efficiency frontier of ViTs. Specifically, we design a reconfigurable architecture to efficiently support various operation types, including lightweight convolutions and attention, boosting hardware utilization. Additionally, we present a time-multiplexed and pipelined dataflow to facilitate both intra- and inter-layer fusions, reducing off-chip data access costs. Experimental results show that our accelerator achieves up to 780.2 GOPS in throughput and 105.1 GOPS/W in energy efficiency at 200MHz on the Xilinx ZCU102 FPGA, which significantly outperforms prior works.
Submitted 29 March, 2024;
originally announced March 2024.
-
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Authors:
Hao Shao,
Shengju Qian,
Han Xiao,
Guanglu Song,
Zhuofan Zong,
Letian Wang,
Yu Liu,
Hongsheng Li
Abstract:
Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high or when the region of interest that could provide key information for answering the question is small. To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions. Additionally, about 98k of these pairs are annotated with detailed reasoning steps. Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. We also introduce a related benchmark to evaluate MLLMs in scenarios requiring specific local region identification. Extensive experiments demonstrate the effectiveness of our framework and shed light on better inference strategies. The Visual CoT dataset, benchmark, and pre-trained models are released to foster further research in this direction.
Submitted 7 July, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
LLMs-based Few-Shot Disease Predictions using EHR: A Novel Approach Combining Predictive Agent Reasoning and Critical Agent Instruction
Authors:
Hejie Cui,
Zhuocheng Shen,
Jieyu Zhang,
Hui Shao,
Lianhui Qin,
Joyce C. Ho,
Carl Yang
Abstract:
Electronic health records (EHRs) contain valuable patient data for health-related prediction tasks, such as disease prediction. Traditional approaches rely on supervised learning methods that require large labeled datasets, which can be expensive and challenging to obtain. In this study, we investigate the feasibility of applying Large Language Models (LLMs) to convert structured patient visit data (e.g., diagnoses, labs, prescriptions) into natural language narratives. We evaluate the zero-shot and few-shot performance of LLMs using various EHR-prediction-oriented prompting strategies. Furthermore, we propose a novel approach that utilizes LLM agents with different roles: a predictor agent that makes predictions and generates reasoning processes and a critic agent that analyzes incorrect predictions and provides guidance for improving the reasoning of the predictor agent. Our results demonstrate that with the proposed approach, LLMs can achieve decent few-shot performance compared to traditional supervised learning methods in EHR-based disease predictions, suggesting its potential for health-oriented applications.
Submitted 19 March, 2024;
originally announced March 2024.
-
A2CI: A Cloud-based, Service-oriented Geospatial Cyberinfrastructure to Support Atmospheric Research
Authors:
Wenwen Li,
Hu Shao,
Sizhe Wang,
Xiran Zhou,
Sheng Wu
Abstract:
Big earth science data offers the scientific community great opportunities. Many more studies at large scales, over long terms, and at high resolution can now be conducted using the rich information collected by remote sensing satellites, ground-based sensor networks, and even social media input. However, the hundreds of terabytes of information collected and compiled on an hourly basis by NASA and other government agencies present a significant challenge for atmospheric scientists seeking to improve the understanding of the Earth's atmospheric system. These challenges include effective discovery, organization, analysis, and visualization of large amounts of data. This paper reports the outcomes of an NSF-funded project that developed a geospatial cyberinfrastructure -- the A2CI (Atmospheric Analysis Cyberinfrastructure) -- to support atmospheric research. We first introduce the service-oriented system framework, then describe in detail the implementation of the data discovery module, data management module, data integration module, and data analysis and visualization modules, following the cloud computing principles: Data-as-a-Service, Software-as-a-Service, Platform-as-a-Service and Infrastructure-as-a-Service. We demonstrate the graphical user interface by performing an analysis of the relationship between Sea Surface Temperature and the intensity of tropical storms in the North Atlantic and Pacific oceans. We expect this work to contribute to the technical advancement of cyberinfrastructure research as well as to the development of an online, collaborative scientific analysis system for atmospheric science.
Submitted 15 March, 2024;
originally announced March 2024.
-
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Authors:
Yang Zhou,
Hao Shao,
Letian Wang,
Steven L. Waslander,
Hongsheng Li,
Yu Liu
Abstract:
Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction. To this end, recent works explore two-stage prediction frameworks where coarse trajectories are first proposed, and then used to select critical context information for trajectory refinement. However, they either incur a large amount of computation or bring limited improvement, if not both. In this paper, we introduce a novel scenario-adaptive refinement strategy, named SmartRefine, to refine predictions with minimal additional computation. Specifically, SmartRefine can comprehensively adapt refinement configurations based on each scenario's properties, and smartly chooses the number of refinement iterations by introducing a quality score to measure the prediction quality and remaining refinement potential of each scenario. SmartRefine is designed as a generic and flexible approach that can be seamlessly integrated into most state-of-the-art motion prediction models. Experiments on Argoverse (1 & 2) show that our method consistently improves the prediction accuracy of multiple state-of-the-art prediction models. Specifically, by adding SmartRefine to QCNet, we outperform all published ensemble-free works on the Argoverse 2 leaderboard (single agent track) at submission. Comprehensive studies are also conducted to ablate design choices and explore the mechanism behind multi-iteration refinement. Code is available at https://github.com/opendilab/SmartRefine/
Submitted 19 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
LLM-based Conversational AI Therapist for Daily Functioning Screening and Psychotherapeutic Intervention via Everyday Smart Devices
Authors:
Jingping Nie,
Hanya Shao,
Yuang Fan,
Qijia Shao,
Haoxuan You,
Matthias Preindl,
Xiaofan Jiang
Abstract:
Despite the global mental health crisis, barriers to accessing screenings, professionals, and treatments remain high. In collaboration with licensed psychotherapists, we propose a Conversational AI Therapist with psychotherapeutic Interventions (CaiTI), a platform that leverages large language models (LLMs) and smart devices to enable better mental health self-care. CaiTI can screen day-to-day functioning using natural and psychotherapeutic conversations. CaiTI leverages reinforcement learning to provide personalized conversation flow. CaiTI can accurately understand and interpret user responses. When the user needs further attention during the conversation, CaiTI can provide conversational psychotherapeutic interventions, including cognitive behavioral therapy (CBT) and motivational interviewing (MI). Leveraging the datasets prepared by the licensed psychotherapists, we experiment and microbenchmark various LLMs' performance in tasks along CaiTI's conversation flow and discuss their strengths and weaknesses. With the psychotherapists, we implement CaiTI and conduct 14-day and 24-week studies. The study results, validated by therapists, demonstrate that CaiTI can converse with users naturally, accurately understand and interpret user responses, and provide psychotherapeutic interventions appropriately and effectively. We showcase the potential of CaiTI LLMs to assist mental therapy diagnosis and treatment and to improve day-to-day functioning screening and precautionary psychotherapeutic intervention systems.
Submitted 15 March, 2024;
originally announced March 2024.
-
NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models
Authors:
Chen Qian,
Xiaochang Li,
Qineng Wang,
Gang Zhou,
Huajie Shao
Abstract:
In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is to process diverse data packets including both ciphertext and plaintext. While many methods have been adopted to analyze network traffic, they often rely on different datasets for performance evaluation. This inconsistency results in substantial manual data processing efforts and unfair comparisons. Moreover, some data processing methods may cause data leakage due to improper separation of training and testing data. To address these issues, we introduce NetBench, a large-scale and comprehensive benchmark dataset for assessing machine learning models, especially foundation models, in both network traffic classification and generation tasks. NetBench is built upon seven publicly available datasets and encompasses a broad spectrum of 20 tasks, including 15 classification tasks and 5 generation tasks. Furthermore, we evaluate eight State-Of-The-Art (SOTA) classification models (including two foundation models) and two generative models using our benchmark. The results show that foundation models significantly outperform the traditional deep learning methods in traffic classification. We believe NetBench will facilitate fair comparisons among various approaches and advance the development of foundation models for network traffic. Our benchmark is available at https://github.com/WM-JayLab/NetBench.
Submitted 18 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation
Authors:
Yuhan Guo,
Hanning Shao,
Can Liu,
Kai Xu,
Xiaoru Yuan
Abstract:
Generative text-to-image models, which allow users to create appealing images through a text prompt, have seen a dramatic increase in popularity in recent years. However, most users have a limited understanding of how such models work, and it often requires much trial and error to achieve satisfactory results. The prompt history contains a wealth of information that could provide users with insights into what has been explored and how the prompt changes impact the output image, yet little research attention has been paid to the visual analysis of this process to support users. We propose the Image Variant Graph, a novel visual representation designed to support comparing prompt-image pairs and exploring the editing history. The Image Variant Graph models prompt differences as edges between corresponding images and presents the distances between images through projection. Based on the graph, we developed the PrompTHis system through co-design with artists. Besides the Image Variant Graph, PrompTHis also incorporates a detailed prompt-image history and a navigation mini-map. Based on the review and analysis of the prompting history, users can better understand the impact of prompt changes and have more effective control of image generation. A quantitative user study with eleven amateur participants and qualitative interviews with five professionals and one amateur user were conducted to evaluate the effectiveness of PrompTHis. The results demonstrate that PrompTHis can help users review the prompt history, make sense of the model, and plan their creative process.
Submitted 14 March, 2024;
originally announced March 2024.
-
Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution
Authors:
Haochen Sun,
Yan Yuan,
Lijuan Su,
Haotian Shao
Abstract:
Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a result of correction errors. In this paper, we introduce a novel blind SR approach that focuses on Learning Correction Errors (LCE). Our method employs a lightweight Corrector to obtain a corrected low-resolution (CLR) image. Subsequently, within an SR network, we jointly optimize SR performance by utilizing both the original LR image and the frequency learning of the CLR image. Additionally, we propose a new Frequency-Self Attention block (FSAB) that enhances the global information utilization ability of Transformer. This block integrates both self-attention and frequency spatial attention mechanisms. Extensive ablation and comparison experiments conducted across various settings demonstrate the superiority of our method in terms of visual quality and accuracy. Our approach effectively addresses the challenges associated with degradation estimation and correction errors, paving the way for more accurate blind image SR.
Submitted 12 March, 2024;
originally announced March 2024.
-
Learnability Gaps of Strategic Classification
Authors:
Lee Cohen,
Yishay Mansour,
Shay Moran,
Han Shao
Abstract:
In contrast with standard classification tasks, strategic classification involves agents strategically modifying their features in an effort to receive favorable predictions. For instance, given a classifier determining loan approval based on credit scores, applicants may open or close their credit cards to fool the classifier. The learning goal is to find a classifier robust against strategic manipulations. Various settings, based on what and when information is known, have been explored in strategic classification. In this work, we focus on addressing a fundamental question: the learnability gaps between strategic classification and standard learning.
We essentially show that any learnable class is also strategically learnable: we first consider a fully informative setting, where the manipulation structure (which is modeled by a manipulation graph $G^\star$) is known and during training time the learner has access to both the pre-manipulation data and post-manipulation data. We provide nearly tight sample complexity and regret bounds, offering significant improvements over prior results. Then, we relax the fully informative setting by introducing two natural types of uncertainty. First, following Ahmadi et al. (2023), we consider the setting in which the learner only has access to the post-manipulation data. We improve the results of Ahmadi et al. (2023) and close the gap between mistake upper bound and lower bound raised by them. Our second relaxation of the fully informative setting introduces uncertainty to the manipulation structure. That is, we assume that the manipulation graph is unknown but belongs to a known class of graphs. We provide nearly tight bounds on the learning complexity in various unknown manipulation graph settings. Notably, our algorithm in this setting is of independent interest and can be applied to other problems such as multi-label learning.
Submitted 29 February, 2024;
originally announced February 2024.
-
FKS subtraction for quarkonium production at NLO
Authors:
Ajjath A H,
Hua-Sheng Shao,
Lukas Simon
Abstract:
We extend the local infrared-divergence subtraction formalism, originally proposed by Frixione, Kunszt and Signer (FKS), to calculate short-distance (differential) cross section for any inclusive process involving a quarkonium particle in non-relativistic QCD (NRQCD) factorisation at next-to-leading order (NLO) accuracy in the strong coupling constant $α_s$. The new formulas are generally applicable to the production of an S- or P-wave quarkonium state in association with any number of elementary particles. The main new ingredients derived in this paper are the local and integrated soft counterterms for the colour-singlet and colour-octet P-wave bound states. It, therefore, paves the way to the automation of the NLO calculations for heavy quarkonium inclusive and associated production processes.
Submitted 6 July, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
$C^3$: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Authors:
Taixi Lu,
Haoyu Wang,
Huajie Shao,
Jing Gao,
Huaxiu Yao
Abstract:
Cross-lingual natural language understanding (NLU) is a critical task in natural language processing (NLP). Recent advancements have seen multilingual pre-trained language models (mPLMs) significantly enhance the performance of these tasks. However, mPLMs necessitate substantial resources and incur high computational costs during inference, posing challenges for deployment in real-world and real-time systems. Existing model cascade methods seek to enhance inference efficiency by greedily selecting the lightest model capable of processing the current input from a variety of models, based on model confidence scores. Nonetheless, deep models tend to exhibit overconfidence, and confidence distributions vary across languages. This leads to the emission of confident but incorrect predictions by smaller models, hindering their ability to generalize effectively across test languages. In this study, we introduce a confidence calibration model cascade ($C^3$) method. This approach, simple yet effective, involves calibration prior to cascade inference, thereby enhancing cascade accuracy through more reliable predictions. Extensive experiments conducted on three cross-lingual benchmarks demonstrate that $C^3$ significantly outperforms all state-of-the-art baselines.
Submitted 25 February, 2024;
originally announced February 2024.
-
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
Authors:
Ziqian Zeng,
Jiahong Yu,
Qianshi Pang,
Zihao Wang,
Huiping Zhuang,
Hongen Shao,
Xiaofeng Zou
Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks. However, their widespread application is hindered by the resource-intensive decoding process. To address this challenge, current approaches have incorporated additional decoding heads to enable parallel prediction of multiple subsequent tokens, thereby achieving inference acceleration. Nevertheless, the accuracy of these decoding heads falls short of the auto-regressive decoding approach.
In light of these limitations, we propose Chimera, a novel framework specifically designed for speculative sampling. Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words. To ensure both accuracy and efficiency, we present two strategies within the lightweight draft model. Firstly, we focus on capturing short-range dependencies at the bottom layer. Secondly, we leverage the readily available representations from the original LLM. Through empirical evaluation on the Vicuna and LLaMA-2 series, Chimera demonstrates impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach. This highlights the potential of our proposed framework in significantly improving the efficiency of large language models during the decoding process.
Submitted 18 April, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
Observation of the antiferromagnetic phase transition in the fermionic Hubbard model
Authors:
Hou-Ji Shao,
Yu-Xuan Wang,
De-Zhi Zhu,
Yan-Song Zhu,
Hao-Nan Sun,
Si-Yuan Chen,
Chi Zhang,
Zhi-Jie Fan,
Youjin Deng,
Xing-Can Yao,
Yu-Ao Chen,
Jian-Wei Pan
Abstract:
The fermionic Hubbard model (FHM)[1], despite its simple form, captures essential features of strongly correlated electron physics. Ultracold fermions in optical lattices[2, 3] provide a clean and well-controlled platform for simulating FHM. Doping its antiferromagnetic ground state at half filling, various exotic phases are expected to arise in the FHM simulator, including stripe order[4], pseudogap[5], and d-wave superconductors[6], offering valuable insights into high-temperature superconductivity[7-9]. Although notable progress, such as the observation of antiferromagnetic correlations over short[10] and extended distances[11], has been obtained, the antiferromagnetic phase has yet to be realized due to the significant challenges of achieving low temperatures in a large and uniform quantum simulator. Here, we report the observation of the antiferromagnetic phase transition in a three-dimensional fermionic Hubbard system comprising lithium-6 atoms in a uniform optical lattice with approximately 800,000 sites. When the interaction strength, temperature, and doping concentration are finely tuned to approach their respective critical values, sharp increases in the spin structure factor (SSF) are observed. These observations can be well described by a power-law divergence, with a critical exponent of 1.396 from the Heisenberg universality class[12]. At half filling and with optimal interaction strength, the measured SSF reaches 123(8), signifying the establishment of an antiferromagnetic phase. Our results set the stage for exploring the low-temperature phase diagram of FHM.
Submitted 22 February, 2024;
originally announced February 2024.
-
Data Storytelling in Data Visualisation: Does it Enhance the Efficiency and Effectiveness of Information Retrieval and Insights Comprehension?
Authors:
Honbo Shao,
Roberto Martinez-Maldonado,
Vanessa Echeverria,
Lixiang Yan,
Dragan Gasevic
Abstract:
Data storytelling (DS) is rapidly gaining attention as an approach that integrates data, visuals, and narratives to create data stories that can help a particular audience to comprehend the key messages underscored by the data with enhanced efficiency and effectiveness. It has been posited that DS can be especially advantageous for audiences with limited visualisation literacy, by presenting the data clearly and concisely. However, empirical studies confirming whether data stories indeed provide these benefits over conventional data visualisations are scarce. To bridge this gap, we conducted a study with 103 participants to determine whether DS indeed improves both efficiency and effectiveness in tasks related to information retrieval and insights comprehension. Our findings suggest that, compared with conventional visualisations, data stories do improve the efficiency of comprehension tasks, as well as the effectiveness of comprehension tasks that involve a single insight. Interestingly, these benefits were not associated with participants' visualisation literacy.
Submitted 20 May, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Asgard/NOTT: L-band nulling interferometry at the VLTI. II. Warm optical design and injection system
Authors:
Germain Garreau,
Azzurra Bigioli,
Romain Laugier,
Gert Raskin,
Johan Morren,
Jean-Philippe Berger,
Colin Dandumont,
Harry-Dean Kenchington Goldsmith,
Simon Gross,
Michael Ireland,
Lucas Labadie,
Jérôme Loicq,
Stephen Madden,
Guillermo Martin,
Marc-Antoine Martinod,
Alexandra Mazzoli,
Ahmed Sanny,
Hancheng Shao,
Kunlun Yan,
Denis Defrère
Abstract:
Asgard/NOTT (previously Hi-5) is a European Research Council (ERC)-funded project hosted at KU Leuven and a new visitor instrument for the Very Large Telescope Interferometer (VLTI). Its primary goal is to image the snow line region around young stars using nulling interferometry in the L-band (3.5 to 4.0 $\mu$m), where the contrast between exoplanets and their host stars is advantageous. The breakthrough is the use of a photonic beam combiner, which only recently allowed the required theoretical raw contrast of $10^{-3}$ in this spectral range. Nulling interferometry observations of exoplanets also require a high degree of balancing between the four pupils of the VLTI in terms of intensity, phase, and polarization. The injection into the beam combiner and the requirements of nulling interferometry are driving the design of the warm optics and the injection system. The optical design up to the beam combiner is presented. It offers a technical solution to efficiently couple the light from the VLTI into the beam combiner. During the coupling, the objective is to limit throughput losses to 5% of the best expected efficiency for the injection. To achieve this, a list of different loss sources is considered with their respective impact on the injection efficiency. Solutions are also proposed to meet the requirements on beam balancing for intensity, phase, and polarization. The different properties of the design are listed, including the optics used, their alignment and tolerances, and their impact on the instrumental performance in terms of throughput and null depth. The performance evaluation gives an expected throughput loss of less than 6.4% of the best efficiency for the injection and a null depth of $\sim 2 \times 10^{-3}$, mainly from optical path delay errors outside the scope of this work.
Submitted 14 February, 2024;
originally announced February 2024.
-
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Authors:
Dongyang Liu,
Renrui Zhang,
Longtian Qiu,
Siyuan Huang,
Weifeng Lin,
Shitian Zhao,
Shijie Geng,
Ziyi Lin,
Peng Jin,
Kaipeng Zhang,
Wenqi Shao,
Chao Xu,
Conghui He,
Junjun He,
Hao Shao,
Pan Lu,
Hongsheng Li,
Yu Qiao,
Peng Gao
Abstract:
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padded sub-images with skip tokens, and simplifying multi-stage training into a one-stage all-in-one paradigm. To fully unleash the potential of MLLMs, we assemble a comprehensive multi-domain and multimodal dataset covering publicly available resources in language, vision, and vision-language tasks. We further enrich this collection with our curated OCR-intensive and Set-of-Mark datasets, extending the diversity and generality. By training over different base LLMs including TinyLlama1.1B, InternLM2-7B, LLaMA2-13B, and Mixtral8x7B, we obtain a spectrum of MLLMs that vary in parameter size and multilingual capabilities. Comprehensive benchmarking reveals a strong correlation between multi-modal performance and the data and parameter scales. Code and models are released at https://github.com/Alpha-VLLM/LLaMA2-Accessory
Submitted 26 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Lens: A Foundation Model for Network Traffic in Cybersecurity
Authors:
Qineng Wang,
Chen Qian,
Xiaochang Li,
Ziyu Yao,
Huajie Shao
Abstract:
Network traffic refers to the amount of data being sent and received over the internet or any system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, the analysis of network traffic is challenging due to the diverse nature of data packets, which often feature heterogeneous headers and encrypted payloads lacking semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn the representations from massive traffic data. However, these methods typically excel in traffic understanding (classification) or traffic generation tasks. To address this issue, we develop Lens, a foundation model for network traffic that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data. Harnessing the strength of the encoder-decoder framework, which captures the global information while preserving the generative ability, our model can better learn the representations from raw data. To further enhance pre-training effectiveness, we design a novel loss that combines three distinct tasks: Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results across various benchmark datasets demonstrate that the proposed Lens outperforms the baselines in most downstream tasks related to both traffic understanding and generation. Notably, it also requires much less labeled data for fine-tuning compared to current methods.
Submitted 28 March, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Enhancing Compositional Generalization via Compositional Feature Alignment
Authors:
Haoxiang Wang,
Haozhe Si,
Huajie Shao,
Han Zhao
Abstract:
Real-world applications of machine learning models often confront data distribution shifts, wherein discrepancies exist between the training and test data distributions. In the common multi-domain multi-class setup, as the number of classes and domains scales up, it becomes infeasible to gather training data for every domain-class combination. This challenge naturally leads to the quest for models with Compositional Generalization (CG) ability, where models can generalize to unseen domain-class combinations. To delve into the CG challenge, we develop CG-Bench, a suite of CG benchmarks derived from existing real-world image datasets, and observe that the prevalent pretraining-finetuning paradigm on foundational models, such as CLIP and DINOv2, struggles with the challenge. To address this challenge, we propose Compositional Feature Alignment (CFA), a simple two-stage finetuning technique that i) learns two orthogonal linear heads on a pretrained encoder with respect to class and domain labels, and ii) fine-tunes the encoder with the newly learned heads frozen. We theoretically and empirically justify that CFA encourages compositional feature learning of pretrained models. We further conduct extensive experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models. Experiment results show that CFA outperforms common finetuning techniques in compositional generalization, corroborating CFA's efficacy in compositional feature learning.
Submitted 22 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Information limit of 15 pm achieved with bright-field ptychography
Authors:
Haozhi Sha,
Jizhe Cui,
Wenfeng Yang,
Rong Yu
Abstract:
It is generally assumed that a high spatial resolution of a microscope requires a large numerical aperture of the imaging lens or detector. In this study, the information limit of 15 pm is achieved in transmission electron microscopy using only the bright-field disk (small numerical aperture) via multislice ptychography. The results indicate that high-frequency information has been encoded in the electrons scattered to low angles due to the multiple scattering of electrons in the objects, making it possible to break the diffraction limit of imaging via bright-field ptychography.
Submitted 20 December, 2023;
originally announced January 2024.
-
Radon Removal Commissioning of the PandaX-4T Cryogenic Distillation System
Authors:
Xiangyi Cui,
Zhou Wang,
Jiafu Li,
Shuaijie Li,
Lin Si,
Yonglin Ju,
Wenbo Ma,
Jianglai Liu,
Li Zhao,
Xiangdong Ji,
Rui Yan,
Haidong Sha,
Peiyao Huang,
Xiuli Wang,
Huaxuan Liu
Abstract:
The PandaX-4T distillation system, designed for the removal of krypton and radon from xenon, is evaluated for its radon removal efficiency using a $^{222}$Rn source during the online distillation process. The PandaX-4T dark matter detector is employed to monitor the temporal evolution of radon activity. To determine the radon reduction factor, the experimental data for radon atoms introduced into the distillation system and for those bypassing it are compared. The results indicate that the PandaX-4T distillation system achieves a radon reduction factor exceeding 190 at a flow rate of 10 slpm and a reflux ratio of 1.44. A gas-only online distillation process at a flow rate of 20 slpm was also conducted, without observing a significant reduction of radon levels in the detector. This observation suggests that the migration flow of radon atoms from the liquid phase to the gas phase is limited, and that the flow rate of gas circulation and the duration of the process are insignificant compared to the total xenon mass of 5.6 tons in the detector. This study provides experimental data supporting the efficient removal of radon at the $\sim$Bq level using the PandaX-4T distillation system, which is a prerequisite for radon background control in the detector. Further operation at a higher flow rate will be applied for the upcoming science run in PandaX-4T.
Submitted 19 April, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning
Authors:
Wei Ai,
FuChen Zhang,
Tao Meng,
YunTao Shou,
HongEn Shao,
Keqin Li
Abstract:
In terms of human-computer interaction, it is becoming more and more important to correctly understand the user's emotional state in a conversation, so the task of multimodal emotion recognition (MER) started to receive more attention. However, existing emotion classification methods usually perform classification only once. Sentences are likely to be misclassified in a single round of classification. Previous work usually ignores the similarities and differences between different morphological features in the fusion process. To address the above issues, we propose a two-stage emotion recognition model based on graph contrastive learning (TS-GCL). First, we encode the original dataset with different preprocessing modalities. Second, a graph contrastive learning (GCL) strategy is introduced for these three modal data with other structures to learn similarities and differences within and between modalities. Finally, we use MLP twice to achieve the final emotion classification. This staged classification method can help the model to better focus on different levels of emotional information, thereby improving the performance of the model. Extensive experiments show that TS-GCL has superior performance on IEMOCAP and MELD datasets compared with previous methods.
Submitted 2 January, 2024;
originally announced January 2024.
-
Magnon, doublon and quarton excitations in 2D S=1/2 trimerized Heisenberg models
Authors:
Yue-Yue Chang,
Jun-Qing Cheng,
Hui Shao,
Dao-Xin Yao,
Han-Qing Wu
Abstract:
We investigate the magnetic excitations of the trimerized Heisenberg models with intra-trimer interaction $J_1$ and inter-trimer interaction $J_2$ on four different two-dimensional lattices using a combination of stochastic series expansion quantum Monte Carlo (SSE QMC) and stochastic analytic continuation methods (SAC), complemented by cluster perturbation theory (CPT). These models exhibit quasi-particle-like excitations when $g=J_2/J_1$ is small, characterized by low-energy magnons, intermediate-energy doublons, and high-energy quartons. The low-energy magnons are associated with the magnetic ground states. They can be described by the linear spin wave theory (LSWT) of the effective block spin model and the original spin model. Doublons and quartons emerge from the corresponding internal excitations of the trimers with distinct energy levels, which can be effectively analyzed using perturbation theory when the ratio of exchange interactions $g$ is small. In this small $g$ regime, we observe a clear separation between the magnon and higher-energy spectra. However, as $g$ increases, these three spectra gradually merge into the magnon modes or continua. Nevertheless, the LSWT fails to provide quantitative descriptions of the higher-energy excitation bands due to significant quantum fluctuations. Notably, in the Collinear II and trimerized hexagon lattice, a broad continuum emerges above the single-magnon spectrum, originating from the quasi-1D physics due to the dilute connections between chains. Our numerical analysis of these 2D trimers yields valuable theoretical predictions and explanations for the inelastic neutron scattering (INS) spectra of 2D magnetic materials featuring trimerized lattices.
Submitted 16 June, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Two-loop massive QCD and QED helicity amplitudes for light-by-light scattering
Authors:
Ajjath A H,
Ekta Chaubey,
Hua-Sheng Shao
Abstract:
We present the analytic and compact two-loop helicity amplitudes for QCD and QED corrections to the light-by-light scattering process with massive internal fermions. We express the master integrals either in terms of multiple polylogarithms or in terms of iterated integrals with dlog one-forms. We also elaborate on optimizing the analytic results for each phase-space region. This makes the numerical evaluation of the scattering amplitudes fast, stable and suitable for phenomenological applications.
Submitted 21 March, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Light-by-Light Scattering at Next-to-Leading Order in QCD and QED
Authors:
Ajjath A H,
Ekta Chaubey,
Mathijs Fraaije,
Valentin Hirschi,
Hua-Sheng Shao
Abstract:
The recent experimental observation of Light-by-Light (LbL) scattering at the Large Hadron Collider has revived interest in this fundamental process, and especially in the accurate prediction of its cross-section, which we present here for the first time at Next-to-Leading Order (NLO) in both QCD and QED. We compare two radically different computational approaches, both exact in the fermion mass dependence, thus offering a strong cross-check of our results. The first approach is a fully analytic method to calculate compact and well-organized two-loop helicity amplitudes. The second one is entirely numerical and leverages the Local Unitarity construction. Our two calculations agree with each other and show that including the exact fermion mass contribution typically increases the size of the NLO corrections. Moreover, we find that the exact result converges slowly to the massless limit of the high-energy regime, thus emphasizing the importance of including the full mass dependence at NLO. We also compare our results with the ATLAS measurement of LbL in ultra-peripheral lead-lead collisions, and find that the inclusion of exact NLO corrections reduces, but does not eliminate, the existing tension with theoretical predictions.
Submitted 10 March, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
Authors:
Hao Shao,
Quansheng Zeng,
Qibin Hou,
Jufeng Yang
Abstract:
Efficiently capturing multi-scale information and building long-range dependencies among pixels are essential for medical image segmentation because of the various sizes and shapes of the lesion regions or organs. In this paper, we present Multi-scale Cross-axis Attention (MCA) to solve the above challenging issues based on the efficient axial attention. Instead of simply connecting axial attention along the horizontal and vertical directions sequentially, we propose to calculate dual cross attentions between two parallel axial attentions to capture global information better. To process the significant variations of lesion regions or organs in individual sizes and shapes, we also use multiple convolutions of strip-shape kernels with different kernel sizes in each axial attention path to improve the efficiency of the proposed MCA in encoding spatial information. We build the proposed MCA upon the MSCAN backbone, yielding our network, termed MCANet. Our MCANet with only 4M+ parameters performs even better than most previous works with heavy backbones (e.g., Swin Transformer) on four challenging tasks, including skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation. Code is available at https://github.com/haoshao-nku/medical_seg.
Submitted 19 December, 2023; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Polyper: Boundary Sensitive Polyp Segmentation
Authors:
Hao Shao,
Yang Zhang,
Qibin Hou
Abstract:
We present a new boundary sensitive framework for polyp segmentation, called Polyper. Our method is motivated by a clinical practice in which seasoned medical practitioners often leverage the inherent features of interior polyp regions to tackle blurred boundaries. Inspired by this, we propose explicitly leveraging polyp regions to bolster the model's boundary discrimination capability while minimizing computation. Our approach first extracts boundary and polyp regions from the initial segmentation map through morphological operators. Then, we design the boundary sensitive attention that concentrates on augmenting the features near the boundary regions using the interior polyp regions' characteristics to generate good segmentation results. Our proposed method can be seamlessly integrated with classical encoder networks, like ResNet-50, MiT-B1, and Swin Transformer. To evaluate the effectiveness of Polyper, we conduct experiments on five publicly available challenging datasets, and achieve state-of-the-art performance on all of them. Code is available at https://github.com/haoshao-nku/medical_seg.git.
Submitted 14 December, 2023;
originally announced December 2023.
-
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Authors:
Hao Shao,
Yuxuan Hu,
Letian Wang,
Steven L. Waslander,
Yu Liu,
Hongsheng Li
Abstract:
Despite significant recent progress in the field of autonomous driving, modern methods still struggle and can incur serious accidents when encountering long-tail unforeseen events and challenging urban scenarios. On the one hand, large language models (LLMs) have shown impressive reasoning capabilities that approach "Artificial General Intelligence". On the other hand, previous autonomous driving methods tend to rely on limited-format inputs (e.g. sensor data and navigation waypoints), restricting the vehicle's ability to understand language information and interact with humans. To this end, this paper introduces LMDrive, a novel language-guided, end-to-end, closed-loop autonomous driving framework. LMDrive uniquely processes and integrates multi-modal sensor data with natural language instructions, enabling interaction with humans and navigation software in realistic instructional settings. To facilitate further research in language-based closed-loop autonomous driving, we also publicly release the corresponding dataset, which includes approximately 64K instruction-following data clips, and the LangAuto benchmark, which tests the system's ability to handle complex instructions and challenging driving scenarios. Extensive closed-loop experiments are conducted to demonstrate LMDrive's effectiveness. To the best of our knowledge, this is the first work to leverage LLMs for closed-loop end-to-end autonomous driving. Code, models, and datasets can be found at https://github.com/opendilab/LMDrive
Submitted 21 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Authors:
Haichao Sha,
Ruixuan Liu,
Yixuan Liu,
Hong Chen
Abstract:
The paradigm of Differentially Private SGD (DP-SGD) can provide a theoretical guarantee for training data in both centralized and federated settings. However, the utility degradation caused by DP-SGD limits its wide application in high-stakes tasks, such as medical image diagnosis. In addition to the necessary perturbation, the convergence issue is attributed to the information loss from gradient clipping. In this work, we propose a general framework, PCDP-SGD, which aims to compress redundant gradient norms and preserve more crucial top gradient components via a projection operation before gradient clipping. Additionally, we extend PCDP-SGD as a fundamental component in differential privacy federated learning (DPFL) for mitigating the data heterogeneity challenge and achieving efficient communication. We prove that pre-projection enhances the convergence of DP-SGD by reducing the dependence of clipping error and bias to a fraction of the top gradient eigenspace, and in theory, limits cross-client variance to improve the convergence under heterogeneous federation. Experimental results demonstrate that PCDP-SGD achieves higher accuracy compared with state-of-the-art DP-SGD variants in computer vision tasks. Moreover, PCDP-SGD outperforms current federated learning frameworks when DP is guaranteed on local training sets.
Submitted 6 December, 2023;
originally announced December 2023.
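The "projection before clipping" idea in the PCDP-SGD abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_then_clip`, the use of an explicit orthonormal `basis` as a stand-in for the top gradient eigenspace, and the omission of the Gaussian noise step are all assumptions made for illustration.

```python
import math


def project_then_clip(grad, basis, clip_norm):
    """Project a per-sample gradient onto a subspace, then clip its L2 norm.

    `basis` is assumed to be a list of orthonormal vectors (a hypothetical
    stand-in for the top gradient eigenvectors). The noise addition that
    DP-SGD applies after clipping is deliberately left out of this sketch.
    """
    # Projection onto span(basis): sum_k <g, u_k> u_k
    projected = [0.0] * len(grad)
    for u in basis:
        coeff = sum(g * ui for g, ui in zip(grad, u))
        for i, ui in enumerate(u):
            projected[i] += coeff * ui

    # Standard clipping: rescale so the L2 norm is at most clip_norm.
    norm = math.sqrt(sum(p * p for p in projected))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [p * scale for p in projected]


# Usage: the third coordinate lies outside the subspace, so it is removed
# before clipping and does not consume any of the clipping budget.
clipped = project_then_clip(
    [3.0, 4.0, 12.0],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    clip_norm=1.0,
)
```

In this toy case the unprojected gradient has norm 13, so clipping it directly would shrink the first two coordinates far more than clipping the projected gradient (norm 5) does, which is one way to read the abstract's claim about preserving the top gradient components.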
-
On the Effect of Defections in Federated Learning and How to Prevent Them
Authors:
Minbiao Han,
Kumar Kshitij Patel,
Han Shao,
Lingxiao Wang
Abstract:
Federated learning is a machine learning protocol that enables a large population of agents to collaborate over multiple rounds to produce a single consensus model. There are several federated learning applications where agents may choose to defect permanently, essentially withdrawing from the collaboration, if they are content with their instantaneous model in that round. This work demonstrates the detrimental impact of such defections on the final model's robustness and ability to generalize. We also show that current federated optimization algorithms fail to disincentivize these harmful defections. We introduce a novel optimization algorithm with theoretical guarantees to prevent defections while ensuring asymptotic convergence to an effective solution for all participating agents. We also provide numerical experiments to corroborate our findings and demonstrate the effectiveness of our algorithm.
Submitted 27 November, 2023;
originally announced November 2023.
-
FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model
Authors:
Yunxiang Li,
Hua-Chieh Shao,
Xiaoxue Qian,
You Zhang
Abstract:
Diffusion models have demonstrated significant potential in producing high-quality images in medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models have limited success in achieving faithful image translations that can accurately preserve the anatomical structures of medical images, especially for unpaired datasets. The preservation of structural and anatomical details is essential to reliable medical diagnosis and treatment planning, as structural mismatches can lead to disease misidentification and treatment errors. In this study, we introduce the Frequency Decoupled Diffusion Model (FDDM) for MR-to-CT conversion. FDDM first obtains the anatomical information of the CT image from the MR image through an initial conversion module. This anatomical information then guides a subsequent diffusion model to generate high-quality CT images. Our diffusion model uses a dual-path reverse diffusion process for low-frequency and high-frequency information, achieving a better balance between image quality and anatomical accuracy. We extensively evaluated FDDM using public datasets for brain MR-to-CT and pelvis MR-to-CT translations, demonstrating its superior performance to other GAN-based, VAE-based, and diffusion-based models. The evaluation metrics included Frechet Inception Distance (FID), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM). FDDM achieved the best scores on all metrics for both datasets, particularly excelling in FID, with scores of 25.9 for brain data and 29.2 for pelvis data, significantly outperforming other methods. These results demonstrate that FDDM can generate high-quality target domain images while maintaining the accuracy of translated anatomical structures.
Submitted 26 June, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Dynamic CBCT Imaging using Prior Model-Free Spatiotemporal Implicit Neural Representation (PMF-STINR)
Authors:
Hua-Chieh Shao,
Mengke Tielige,
Tinsu Pan,
You Zhang
Abstract:
Dynamic cone-beam computed tomography (CBCT) can capture high-spatial-resolution, time-varying images for motion monitoring, patient setup, and adaptive planning of radiotherapy. However, dynamic CBCT reconstruction is an extremely ill-posed spatiotemporal inverse problem, as each CBCT volume in the dynamic sequence is only captured by one or a few X-ray projections. We developed a machine learning-based technique, prior-model-free spatiotemporal implicit neural representation (PMF-STINR), to reconstruct dynamic CBCTs from sequentially acquired X-ray projections. PMF-STINR employs a joint image reconstruction and registration approach to address the under-sampling challenge. Specifically, PMF-STINR uses spatial implicit neural representation to reconstruct a reference CBCT volume, and it applies temporal INR to represent the intra-scan dynamic motion with respect to the reference CBCT to yield dynamic CBCTs. PMF-STINR couples the temporal INR with a learning-based B-spline motion model to capture time-varying deformable motion during the reconstruction. Compared with previous methods, the spatial INR, the temporal INR, and the B-spline model of PMF-STINR are all learned on the fly during reconstruction in a one-shot fashion, without using any patient-specific prior knowledge or motion sorting/binning. PMF-STINR was evaluated via digital phantom simulations, physical phantom measurements, and a multi-institutional patient dataset featuring various imaging protocols (half-fan/full-fan, full sampling/sparse sampling, different energy and mAs settings, etc.). The results showed that the one-shot learning-based PMF-STINR can accurately and robustly reconstruct dynamic CBCTs and capture highly irregular motion with high temporal (~0.1s) resolution and sub-millimeter accuracy. It can be a promising tool for motion management by offering richer motion information than traditional 4D-CBCTs.
Submitted 4 December, 2023; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Efficient Prior-Free Mechanisms for No-Regret Agents
Authors:
Natalie Collina,
Aaron Roth,
Han Shao
Abstract:
We study a repeated Principal-Agent problem between a long-lived Principal and Agent pair in a prior-free setting. In our setting, the sequence of realized states of nature may be adversarially chosen, the Agent is non-myopic, and the Principal aims for a strong form of policy regret. Following Camara, Hartline, and Johnson, we model the Agent's long-run behavior with behavioral assumptions that relax the common prior assumption (for example, that the Agent has no swap regret). Within this framework, we revisit the mechanism proposed by Camara et al., which informally uses calibrated forecasts of the unknown states of nature in place of a common prior. We give two main improvements. First, we give a mechanism that has an exponentially improved dependence (in terms of both running time and regret bounds) on the number of distinct states of nature. To do this, we show that our mechanism does not require truly calibrated forecasts, but rather forecasts that are unbiased subject to only a polynomially sized collection of events -- which can be produced with polynomial overhead. Second, in several important special cases -- including the focal linear contracting setting -- we show how to remove strong "Alignment" assumptions (which informally require that near-ties are always broken in favor of the Principal) by specifically deploying "stable" policies that do not have any near-ties that are payoff-relevant to the Principal. Taken together, our new mechanism makes the compelling framework proposed by Camara et al. much more powerful: it can now be realized over polynomially sized state spaces while requiring only mild assumptions on Agent behavior.
Submitted 13 November, 2023;
originally announced November 2023.
-
Mental Health Diagnosis in the Digital Age: Harnessing Sentiment Analysis on Social Media Platforms upon Ultra-Sparse Feature Content
Authors:
Haijian Shao,
Ming Zhu,
Shengjie Zhai
Abstract:
Amid growing global mental health concerns, particularly among vulnerable groups, natural language processing offers tremendous potential for early detection of and intervention in people's mental disorders by analyzing their postings and discussions on social media platforms. However, ultra-sparse training data, often due to vast vocabularies and low-frequency words, hinders analysis accuracy. Multi-labeling and co-occurrence of symptoms may also blur the boundaries between similar or correlated disorders. To address these issues, we propose a novel semantic feature preprocessing technique with a three-fold structure: 1) mitigating the feature sparsity with a weak classifier, 2) adaptive feature dimension with modulus loops, and 3) deep-mining and extending features among the contexts. With enhanced semantic features, we train a machine learning model to predict and classify mental disorders. We utilize the Reddit Mental Health Dataset 2022 to examine conditions such as Anxiety, Borderline Personality Disorder (BPD), and Bipolar Disorder (BD) and present solutions to the data sparsity challenge, highlighted by 99.81% non-zero elements. After applying our preprocessing technique, the feature sparsity decreases to 85.4%. Overall, our methods, when compared to seven benchmark models, demonstrate significant performance improvements: 8.0% in accuracy, 0.069 in precision, 0.093 in recall, 0.102 in F1 score, and 0.059 in AUC. This research provides foundational insights for mental health prediction and monitoring, offering innovative solutions to the challenges associated with ultra-sparse data features and intricate multi-label classification in the domain of mental health analysis.
Submitted 8 November, 2023;
originally announced November 2023.
-
Incentivized Collaboration in Active Learning
Authors:
Lee Cohen,
Han Shao
Abstract:
In collaborative active learning, where multiple agents try to learn labels from a common hypothesis, we introduce an innovative framework for incentivized collaboration. Here, rational agents aim to obtain labels for their data sets while keeping label complexity at a minimum. We focus on designing (strict) individually rational (IR) collaboration protocols, ensuring that agents cannot reduce their expected label complexity by acting individually. We first show that given any optimal active learning algorithm, the collaboration protocol that runs the algorithm as is over the entire data is already IR. However, computing the optimal algorithm is NP-hard. We therefore provide collaboration protocols that achieve (strict) IR and are comparable with the best known tractable approximation algorithm in terms of label complexity.
Submitted 31 October, 2023;
originally announced November 2023.