AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction

Authors: Dubing Chen, Wencheng Han, ** Fang, Jianbing Shen

Abstract: In this technical report, we present our solution for the Vision-Centric 3D Occupancy and Flow Prediction track in the nuScenes Open-Occ Dataset Challenge at CVPR 2024. Our innovative approach involves a dual-stage framework that enhances 3D occupancy and flow predictions by incorporating adaptive forward view transformation and flow modeling. Initially, we independently train the occupancy model, followed by flow prediction using sequential frame integration. Our method combines regression with classification to address scale variations in different scenes, and leverages predicted flow to warp current voxel features to future frames, guided by future frame ground truth. Experimental results on the nuScenes dataset demonstrate significant improvements in accuracy and robustness, showcasing the effectiveness of our approach in real-world scenarios. Our single model based on Swin-Base ranks second on the public leaderboard, validating the potential of our method in advancing autonomous car perception systems.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 2nd Place in the 3D Occupancy and Flow Prediction Challenge (CVPR24)

arXiv:2407.00319 [pdf, other]

doi 10.1038/s41550-024-02309-5

A slightly oblate dark matter halo revealed by a retrograde precessing Galactic disk warp

Authors: Yang Huang, Qikang Feng, Tigran Khachaturyants, Huawei Zhang, Jifeng Liu, Juntai Shen, Timothy C. Beers, Youjun Lu, Song Wang, Haibo Yuan

Abstract: The shape of the dark matter (DM) halo is key to understanding the hierarchical formation of the Galaxy. Despite extensive efforts in recent decades, however, its shape remains a matter of debate, with suggestions ranging from strongly oblate to prolate. Here, we present a new constraint on its present shape by directly measuring the evolution of the Galactic disk warp with time, as traced by accurate distance estimates and precise age determinations for about 2,600 classical Cepheids. We show that the Galactic warp is mildly precessing in a retrograde direction at a rate of $ω= -2.1 \pm 0.5 ({\rm statistical}) \pm 0.6 ({\rm systematic})$ km s$^{-1}$ kpc$^{-1}$ for the outer disk over the Galactocentric radius [$7.5, 25$] kpc, decreasing with radius. This constrains the shape of the DM halo to be slightly oblate with a flattening (minor axis to major axis ratio) in the range $0.84 \le q_Φ \le 0.96$. Given the young nature of the disk warp traced by Cepheids (less than 200 Myr), our approach directly measures the shape of the present-day DM halo. This measurement, combined with other measurements from older tracers, could provide vital constraints on the evolution of the DM halo and the assembly history of the Galaxy.

Submitted 29 June, 2024; originally announced July 2024.

Comments: Published in Nature Astronomy on June 27th, 2024. Final published version here: https://www.nature.com/articles/s41550-024-02309-5

arXiv:2406.18993 [pdf, ps, other]

Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission

Authors: Han Xiao, Wenqiang Tian, Shi **, Wendong Liu, Jia Shen, Zhihua Shi, Zhi Zhang

Abstract: In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, the interference cancellation with superimposed symbol aided channel estimation is leveraged in the neural receiver, accompanied by the pre-design of pilot code-division orthogonal mechanism at transmitter. In addition, to address the complexity issue for inter-vendor collaboration and the generalization problem in practical deployments, respectively, this paper also provides a fixed SIP (F-SIP) design based on constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and transmission layers. Simulation results demonstrate the superiority of the proposed schemes on the performance of block error rate and throughput compared with existing counterparts.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18610 [pdf, other]

Vox-UDA: Voxel-wise Unsupervised Domain Adaptation for Cryo-Electron Subtomogram Segmentation with Denoised Pseudo Labeling

Authors: Haoran Li, Xingjian Li, Jiahua Shi, Huaming Chen, Bo Du, Daisuke Kihara, Johan Barthelemy, Jun Shen, Min Xu

Abstract: Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology facilitating the study of macromolecular structures at near-atomic resolution. Recent volumetric segmentation approaches on cryo-ET images have drawn widespread interest in biological sector. However, existing methods heavily rely on manually labeled data, which requires highly professional skills, thereby hindering the adoption of fully-supervised approaches for cryo-ET images. Some unsupervised domain adaptation (UDA) approaches have been designed to enhance the segmentation network performance using unlabeled data. However, applying these methods directly to cryo-ET images segmentation tasks remains challenging due to two main issues: 1) the source data, usually obtained through simulation, contain a certain level of noise, while the target data, directly collected from raw-data from real-world scenario, have unpredictable noise levels. 2) the source data used for training typically consists of known macromoleculars, while the target domain data are often unknown, causing the model's segmenter to be biased towards these known macromolecules, leading to a domain shift problem. To address these challenges, in this work, we introduce the first voxel-wise unsupervised domain adaptation approach, termed Vox-UDA, specifically for cryo-ET subtomogram segmentation. Vox-UDA incorporates a noise generation module to simulate target-like noises in the source dataset for cross-noise level adaptation. Additionally, we propose a denoised pseudo-labeling strategy based on improved Bilateral Filter to alleviate the domain shift problem. Experimental results on both simulated and real cryo-ET subtomogram datasets demonstrate the superiority of our proposed approach compared to state-of-the-art UDA methods.

Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2406.17769 [pdf]

Flat bands and distinct density wave orders in correlated Kagome superconductor CsCr$_3$Sb$_5$

Authors: Shuting Peng, Yulei Han, Yongkai Li, Jianchang Shen, Yu Miao, Yang Luo, Linwei Huai, Zhipeng Ou, Hongyu Li, Ziji Xiang, Zhengtai Liu, Dawei Shen, Makoto Hashimoto, Donghui Lu, Yugui Yao, Zhenhua Qiao, Zhiwei Wang, Junfeng He

Abstract: Kagome metal CsV$_3$Sb$_5$ has attracted much recent attention due to the coexistence of multiple exotic orders and the associated proposals to mimic unconventional high temperature superconductors. Nevertheless, magnetism and strong electronic correlations -- two essential ingredients for unconventional superconductivity, are absent in this V-based Kagome metal. CsCr$_3$Sb$_5$ is a newly discovered Cr-based parallel of CsV$_3$Sb$_5$, in which magnetism appears with charge density wave and superconductivity at different temperature and pressure regions. Enhanced electronic correlations are also suggested by theoretical proposals due to the calculated flat bands. Here, we report angle-resolved photoemission measurements and first-principles calculations on this new material system. Electron energy bands and the associated orbitals are resolved. Flat bands are observed near the Fermi level. Doping dependent measurements on Cs(Cr$_x$V$_{1-x}$)$_3$Sb$_5$ reveal a gradually enhanced band renormalization from CsV$_3$Sb$_5$ to CsCr$_3$Sb$_5$, accompanied by distinct spatial symmetry breaking states in the phase diagram.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17113 [pdf, other]

Importance of Initial Condition on Bar Secular Evolution: Role of Halo Angular Momentum Distribution Discontinuity

Authors: Sandeep Kumar Kataria, Juntai Shen

Abstract: The dark matter halo properties, for example, mass, spin and concentration play a significant role in the formation and evolution of bars in disk galaxies. This study highlights the importance of a new parameter: the dark matter halo angular momentum distribution in the central region of disk. We experiment with N-body galaxy models having a disk and dark matter similar to Milky Way-type galaxies. In these models, we vary the discontinuity of the angular momentum distribution of the halo (the total spin is the same for all models). Our N-body experiments suggest that bar forms in all models after a few Gyr of disk evolution. However, in the secular evolution of the bar, as we evolve these models until 9.78 Gyr, the bar gains its strength in the model with the most continuous halo angular momentum distribution, and the bar loses strength for the most discontinuous halo angular momentum distribution. The secular evolution of the bar suggests that box/peanut/x-shaped bulges similar to those found in the Milky Way disk should be more pronounced in halos with continuous halo angular momentum distributions. This study demonstrates the importance of the initial condition setup of galaxy systems, namely the discontinuity in the dark matter halo angular momentum distribution for a given density distribution, on the bar secular evolution in the disk galaxy simulations. Further, this study helps reconcile the conflicting results of bar secular evolution in a high-spinning halo of the recent literature.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 11 pages, 9 figures, accepted for publication in ApJ, comments are welcome

arXiv:2406.16434 [pdf, other]

Multi-threshold Deep Metric Learning for Facial Expression Recognition

Authors: Wenwu Yang, **yi Yu, Tuo Chen, Zhenguang Liu, Xun Wang, Jianbing Shen

Abstract: Effective expression feature representations generated by a triplet-based deep metric learning are highly advantageous for facial expression recognition (FER). The performance of triplet-based deep metric learning is contingent upon identifying the best threshold for triplet loss. Threshold validation, however, is tough and challenging, as the ideal threshold changes among datasets and even across classes within the same dataset. In this paper, we present the multi-threshold deep metric learning technique, which not only avoids the difficult threshold validation but also vastly increases the capacity of triplet loss learning to construct expression feature representations. We find that each threshold of the triplet loss intrinsically determines a distinctive distribution of inter-class variations and corresponds, thus, to a unique expression feature representation. Therefore, rather than selecting a single optimal threshold from a valid threshold range, we thoroughly sample thresholds across the range, allowing the representation characteristics manifested by thresholds within the range to be fully extracted and leveraged for FER. To realize this approach, we partition the embedding layer of the deep metric learning network into a collection of slices and model training these embedding slices as an end-to-end multi-threshold deep metric learning problem. Each embedding slice corresponds to a sample threshold and is learned by enforcing the corresponding triplet loss, yielding a set of distinct expression features, one for each embedding slice. It makes the embedding layer, which is composed of a set of slices, a more informative and discriminative feature, hence enhancing the FER accuracy. Extensive evaluations demonstrate the superior performance of the proposed approach on both posed and spontaneous facial expression datasets.

Submitted 24 June, 2024; originally announced June 2024.

Comments: accepted by Pattern Recognition

arXiv:2406.14870 [pdf, other]

A new flow dynamic approach for Wasserstein gradient flows

Authors: Qing Cheng, Qianqian Liu, Wenbin Chen, Jie Shen

Abstract: We develop in this paper a new regularized flow dynamic approach to construct efficient numerical schemes for Wasserstein gradient flows in Lagrangian coordinates. Instead of approximating the Wasserstein distance which needs to solve constrained minimization problems, we reformulate the problem using the Benamou-Brenier's flow dynamic approach, leading to algorithms which only need to solve unconstrained minimization problem in $L^2$ distance. Our schemes automatically inherit some essential properties of Wasserstein gradient systems such as positivity-preserving, mass conservative and energy dissipation. We present ample numerical simulations of Porous-Medium equations, Keller-Segel equations and Aggregation equations to validate the accuracy and stability of the proposed schemes. Compared to numerical schemes in Eulerian coordinates, our new schemes can capture sharp interfaces for various Wasserstein gradient flows using relatively smaller number of unknowns.

Submitted 21 June, 2024; originally announced June 2024.

MSC Class: 65M06; 65M12; 35K65; 35A15

arXiv:2406.13264 [pdf, other]

Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks

Authors: Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Re

Abstract: Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task - full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today - simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g. recalling 88% of the steps taken in a video demonstration of a workflow), they struggle to re-apply that knowledge towards finer-grained validation of workflow completion (F1 < 0.3). We hope WONDERBREAD encourages the development of more "human-centered" AI tooling for enterprise applications and furthers the exploration of multimodal FMs for the broader universe of BPM tasks. We publish our dataset and experiments here: https://github.com/HazyResearch/wonderbread

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12536 [pdf, other]

doi 10.1007/s11263-024-02051-5

ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

Authors: Junhao Lin, Lei Zhu, Jiaxing Shen, Huazhu Fu, Qing Zhang, Liansheng Wang

Abstract: With the rapid development of depth sensor, more and more RGB-D videos could be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, the existing salient object detection (SOD) works only focus on either static RGB-D images or RGB videos, ignoring the collaborating of RGB-D and video information. In this paper, we first collect a new annotated RGB-D video SOD (ViDSOD-100) dataset, which contains 100 videos within a total of 9,362 frames, acquired from diverse natural scenes. All the frames in each video are manually annotated to a high-quality saliency annotation. Moreover, we propose a new baseline model, named attentive triple-fusion network (ATF-Net), for RGB-D video salient object detection. Our method aggregates the appearance information from an input RGB image, spatio-temporal information from an estimated motion map, and the geometry information from the depth map by devising three modality-specific branches and a multi-modality integration branch. The modality-specific branches extract the representation of different inputs, while the multi-modality integration branch combines the multi-level modality-specific features by introducing the encoder feature aggregation (MEA) modules and decoder feature aggregation (MDA) modules. The experimental findings conducted on both our newly introduced ViDSOD-100 dataset and the well-established DAVSOD dataset highlight the superior performance of the proposed ATF-Net. This performance enhancement is demonstrated both quantitatively and qualitatively, surpassing the capabilities of current state-of-the-art techniques across various domains, including RGB-D saliency detection, video saliency detection, and video object segmentation. Our data and our code are available at github.com/jhl-Det/RGBD_Video_SOD.

Submitted 18 June, 2024; originally announced June 2024.

Journal ref: International Journal of Computer Vision (2024)

arXiv:2406.11409 [pdf, other]

CodeGemma: Open Code Models Based on Gemma

Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, **gyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec, et al. (2 additional authors not shown)

Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open models. CodeGemma 2B is a state-of-the-art code completion model designed for fast code infilling and open-ended generation in latency-sensitive settings.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

arXiv:2406.10004 [pdf, ps, other]

On the $P=C$ conjecture and refined BPS invariants for local $\mathbb{P}^2$

Authors: Weite Pi, Junliang Shen

Abstract: We prove that the refined BPS invariants for local $\mathbb{P}^2$ satisfy an asymptotic product formula as conjectured by Kononov--Pi--Shen. Combined with the $P\supset C$ result of Maulik--Shen--Yin obtained from a theory of Fourier transform, we prove the $P=C$ conjecture for degree $d$ curves on $\mathbb{P}^2$ in cohomological degrees $\leq d+1$.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 11 pages, comments are welcome!

arXiv:2406.06883 [pdf, ps, other]

A way to identify whether a DFT gap is from right reasons or error cancellations: The case of copper chalcogenides

Authors: Jiale Shen, Haitao Liu, Yuanchang Li

Abstract: Gap opening remains elusive in copper chalcogenides (Cu$_{2}X$, $X$ = S, Se and Te), not least because Hubbard + $U$, hybrid functional and ${GW}$ methods have also failed. In this work, we elucidate that their failure originates from a severe underestimation of the 4$s$-3$d$ orbital splitting of the Cu atom, which leads to a band-order inversion in the presence of an anionic crystal field. As a result, the Fermi energy is pinned due to symmetry, yielding an invariant zero gap. Utilizing the hybrid pseudopotentials to correct the underestimation on the atomic side opens up gaps of experimental magnitude in Cu$_{2}X$, suggesting their predominantly electronic nature. Our work not only clarifies the debate about the Cu$_{2}X$ gap, but also provides a way to identify which of the different methods really captures the physical essence and which is the result of error cancellation.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Accepted by The Journal of Chemical Physics

arXiv:2406.03056 [pdf, other]

Sparse two-stage Bayesian meta-analysis for individualized treatments

Authors: Junwei Shen, Erica E. M. Moodie, Shirin Golchi

Abstract: Individualized treatment rules tailor treatments to patients based on clinical, demographic, and other characteristics. Estimation of individualized treatment rules requires the identification of individuals who benefit most from the particular treatments and thus the detection of variability in treatment effects. To develop an effective individualized treatment rule, data from multisite studies may be required due to the low power provided by smaller datasets for detecting the often small treatment-covariate interactions. However, sharing of individual-level data is sometimes constrained. Furthermore, sparsity may arise in two senses: different data sites may recruit from different populations, making it infeasible to estimate identical models or all parameters of interest at all sites, and the number of non-zero parameters in the model for the treatment rule may be small. To address these issues, we adopt a two-stage Bayesian meta-analysis approach to estimate individualized treatment rules which optimize expected patient outcomes using multisite data without disclosing individual-level data beyond the sites. Simulation results demonstrate that our approach can provide consistent estimates of the parameters which fully characterize the optimal individualized treatment rule. We estimate the optimal Warfarin dose strategy using data from the International Warfarin Pharmacogenetics Consortium, where data sparsity and small treatment-covariate interaction effects pose additional statistical challenges.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02886 [pdf, other]

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang

Abstract: Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, including restricted access to LLM outputs, significant teacher-student capacity gaps, and the inherited mis-calibration issue. In this work, we present PLaD, a novel preference-based LLM distillation framework. PLaD exploits the teacher-student capacity discrepancy to generate pseudo-preference pairs where teacher outputs are preferred over student outputs. Then, PLaD leverages a ranking loss to re-calibrate student's estimation of sequence likelihood, which steers the student's focus towards understanding the relative quality of outputs instead of simply imitating the teacher. PLaD bypasses the need for access to teacher LLM's internal states, tackles the student's expressivity limitations, and mitigates the student mis-calibration issue. Through extensive experiments on two sequence generation tasks and with various LLMs, we demonstrate the effectiveness of our proposed PLaD framework.

Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: Findings of ACL 2024

arXiv:2406.01883 [pdf, other]

Context Gating in Spiking Neural Networks: Achieving Lifelong Learning through Integration of Local and Global Plasticity

Authors: Jiangrong Shen, Wenyao Ni, Qi Xu, Gang Pan, Hua** Tang

Abstract: Humans learn multiple tasks in succession with minimal mutual interference, through the context gating mechanism in the prefrontal cortex (PFC). The brain-inspired models of spiking neural networks (SNN) have drawn massive attention for their energy efficiency and biological plausibility. To overcome catastrophic forgetting when learning multiple tasks in sequence, current SNN models for lifelong learning focus on memory reserving or regularization-based modification, while lacking SNN to replicate human experimental behavior. Inspired by biological context-dependent gating mechanisms found in PFC, we propose SNN with context gating trained by the local plasticity rule (CG-SNN) for lifelong learning. The iterative training between global and local plasticity for task units is designed to strengthen the connections between task neurons and hidden neurons and preserve the multi-task relevant information. The experiments show that the proposed model is effective in maintaining the past learning experience and has better task-selectivity than other methods during lifelong learning. Our results provide new insights that the CG-SNN model can extend context gating with good scalability on different SNN architectures with different spike-firing mechanisms. Thus, our models have good potential for parallel implementation on neuromorphic hardware and model human's behavior.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01072 [pdf, other]

Towards Efficient Deep Spiking Neural Networks Construction with Spiking Activity based Pruning

Authors: Yaxin Li, Qi Xu, Jiangrong Shen, Hongming Xu, Long Chen, Gang Pan

Abstract: The emergence of deep and large-scale spiking neural networks (SNNs) exhibiting high performance across diverse complex datasets has led to a need for compressing network models due to the presence of a significant number of redundant structural units, aiming to more effectively leverage their low-power consumption and biological interpretability advantages. Currently, most model compression techniques for SNNs are based on unstructured pruning of individual connections, which requires specific hardware support. Hence, we propose a structured pruning approach based on the activity levels of convolutional kernels named Spiking Channel Activity-based (SCA) network pruning framework. Inspired by synaptic plasticity mechanisms, our method dynamically adjusts the network's structure by pruning and regenerating convolutional kernels during training, enhancing the model's adaptation to the current target task. While maintaining model performance, this approach refines the network architecture, ultimately reducing computational load and accelerating the inference process. This indicates that structured dynamic sparse learning methods can better facilitate the application of deep SNNs in low-power and high-efficiency scenarios.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00959 [pdf, other]

Ta2Pd3Te5 topological thermometer

Authors: Yupeng Li, Anqi Wang, Senyang Pan, Dayu Yan, Guang Yang, Xingchen Guo, Yu Hong, Guangtong Liu, Fanming Qu, Zhijun Wang, Tian Qian, **glei Zhang, Youguo Shi, Li Lu, Jie Shen

Abstract: In recent decades, there has been a persistent pursuit of applications for surface/edge states in topological systems, driven by their dissipationless transport effects. However, there have been limited tangible breakthroughs in this field. This work demonstrates the remarkable properties of the topological insulator Ta2Pd3Te5, as a thermometer. This material exhibits a power-law correlation in temperature-dependent resistance at low temperatures, stemming from its Luttinger liquid behavior of edge states, while exhibiting semiconductor behavior at high temperatures. The power-law behavior effectively addresses the issue of infinite resistance in semiconductor thermometers at ultra-low temperatures, thereby playing a crucial role in enabling efficient thermometry in refrigerators supporting millikelvin temperatures or below. By employing chemical doping, adjusting thickness, and controlling gate voltage, its power-law behavior and semiconductor behavior can be effectively modulated. This enables efficient thermometry spanning from millikelvin temperatures to room temperature, and allows for precise local temperature measurement. Furthermore, this thermometer exhibits excellent temperature sensitivity and resolution, and can be fine-tuned to show small magnetoresistance. In summary, the Ta2Pd3Te5 thermometer, also referred to as a topological thermometer, exhibits outstanding performance and significant potential for measuring a wider range of temperatures compared to conventional low-temperature thermometers.

Submitted 2 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures
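
As a rough illustration of how a power-law resistance curve could be used for thermometry, the sketch below fits a calibration curve and inverts it to read a temperature. It is not taken from the paper: the functional form R(T) = R0 * T^(-alpha), the sign of the exponent, the function names, and all numeric values are assumptions chosen only for illustration, and the calibration data are synthetic.

```python
import math

def fit_power_law(temps, resistances):
    """Least-squares fit of log R = log R0 - alpha * log T.

    Returns (r0, alpha) for the assumed model R(T) = r0 * T**(-alpha).
    """
    xs = [math.log(t) for t in temps]
    ys = [math.log(r) for r in resistances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def temperature_from_resistance(resistance, r0, alpha):
    """Invert the assumed power law to turn a resistance reading into a temperature."""
    return (r0 / resistance) ** (1.0 / alpha)

# Hypothetical calibration values (not measured data).
R0_TRUE = 1000.0   # ohms at T = 1 K, assumed
ALPHA_TRUE = 0.8   # power-law exponent, assumed

# Synthetic calibration points from 10 mK to 1 K.
calib_temps = [0.01, 0.03, 0.1, 0.3, 1.0]
calib_res = [R0_TRUE * t ** (-ALPHA_TRUE) for t in calib_temps]

r0_fit, alpha_fit = fit_power_law(calib_temps, calib_res)

# Read back a temperature from a synthetic resistance taken at 50 mK.
reading = R0_TRUE * 0.05 ** (-ALPHA_TRUE)
t_est = temperature_from_resistance(reading, r0_fit, alpha_fit)
print(f"alpha = {alpha_fit:.3f}, estimated T = {t_est * 1000:.1f} mK")
```

Under this assumed form the resistance stays finite and invertible at every nonzero temperature, which is the property the abstract contrasts with semiconductor thermometers at ultra-low temperatures.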

arXiv:2406.00515 [pdf, other]

A Survey on Large Language Models for Code Generation

Authors: Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, Sunghun Kim

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and industry professionals due to its practical significance in software development, e.g., GitHub Copilot. Despite the active exploration of LLMs for a variety of code tasks, either from the perspective of natural language processing (NLP) or software engineering (SE) or both, there is a noticeable absence of a comprehensive and up-to-date literature review dedicated to LLM for code generation. In this survey, we aim to bridge this gap by providing a systematic literature review that serves as a valuable reference for researchers investigating the cutting-edge progress in LLMs for code generation. We introduce a taxonomy to categorize and discuss the recent developments in LLMs for code generation, covering aspects such as data curation, latest advances, performance evaluation, and real-world applications. In addition, we present a historical overview of the evolution of LLMs for code generation and offer an empirical comparison using the widely recognized HumanEval and MBPP benchmarks to highlight the progressive enhancements in LLM capabilities for code generation. We identify critical challenges and promising opportunities regarding the gap between academia and practical development. Furthermore, we have established a dedicated resource website (https://codellm.github.io) to continuously document and disseminate the most recent advances in the field.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.20815 [pdf, other]

Distributed Simulation for Digital Twins of Large-Scale Real-World DiffServ-Based Networks

Authors: Zhuoyao Huang, Nan Zhang, **gran Shen, Georgios Diamantopoulos, Zhengchang Hua, Nikos Tziritas, Georgios Theodoropoulos

Abstract: Digital Twin technology facilitates the monitoring and online analysis of large-scale communication networks. Faster predictions of network performance thus become imperative, especially for analysing Quality of Service (QoS) parameters in large-scale city networks. Discrete Event Simulation (DES) is a standard network analysis technology, and can be further optimised with parallel and distributed execution for speedup, referred to as Parallel Discrete Event Simulation (PDES). However, modelling detailed QoS mechanisms such as DiffServ requires complex event handling for each network router, which can involve excessive simulation events. In addition, current PDES for network analysis mostly adopts conservative scheduling, which suffers from excessive global synchronisation to avoid causality problems. The performance analysis of optimistic PDES for real-world large-scale network topology and complex QoS mechanisms is still inadequate. To address these gaps, this paper proposes a simulation toolkit, Quaint, which leverages an optimistic PDES engine ROSS, for detailed modelling of DiffServ-based networks. A novel event-handling model for each network router is also proposed to significantly reduce the number of events in complex QoS modelling. Quaint has been evaluated using a real-world metropolitan-scale network topology with 5,000 routers/switches. Results show that compared to the conventional simulator OMNeT++/INET, even the sequential mode of Quaint can achieve a speedup of 53 times, and the distributed mode has a speedup of 232 times. Scalability characterisation is conducted to portray the efficiency of distributed execution, and the results indicate the future direction for workload-aware model partitioning.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 15 pages, 6 figures, accepted by Euro-Par 2024: 30th International European Conference on Parallel and Distributed Computing

arXiv:2405.20310 [pdf, other]

A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction

Authors: Jianghao Shen, Nan Xue, Tianfu Wu

Abstract: Learning 3D scene representation from a single-view image is a long-standing fundamental problem in computer vision, with the inherent ambiguity in predicting contents unseen from the input view. Built on the recently proposed 3D Gaussian Splatting (3DGS), the Splatter Image method has made promising progress on fast single-image novel view synthesis via learning a single 3D Gaussian for each pixel based on the U-Net feature map of an input image. However, it has limited expressive power to represent occluded components that are not observable in the input view. To address this problem, this paper presents a Hierarchical Splatter Image method in which a pixel is worth more than one 3D Gaussians. Specifically, each pixel is represented by a parent 3D Gaussian and a small number of child 3D Gaussians. Parent 3D Gaussians are learned as done in the vanilla Splatter Image. Child 3D Gaussians are learned via a lightweight Multi-Layer Perceptron (MLP) which takes as input the projected image features of a parent 3D Gaussian and the embedding of a target camera view. Both parent and child 3D Gaussians are learned end-to-end in a stage-wise way. The joint condition of input image features from eyes of the parent Gaussians and the target camera position facilitates learning to allocate child Gaussians to ``see the unseen'', recovering the occluded details that are often missed by parent Gaussians. In experiments, the proposed method is tested on the ShapeNet-SRN and CO3D datasets with state-of-the-art performance obtained, especially showing promising capabilities of reconstructing occluded contents in the input view.

Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: preprint, under review

arXiv:2405.19417 [pdf, other]

Almost All Carbon/Oxygen White Dwarfs Can Support Double Detonations

Authors: Ken J. Shen, Samuel J. Boos, Dean M. Townsley

Abstract: Double detonations of sub-Chandrasekhar-mass white dwarfs (WDs) in unstably mass-transferring double WD binaries have become a leading contender to explain most, if not all, Type Ia supernovae. However, past theoretical studies of the explosion process have assumed relatively ad hoc initial conditions for the helium shells in which the double detonations begin. In this work, we construct realistic C/O WDs to use as the starting points for multidimensional double detonation simulations. We supplement these with simplified one-dimensional detonation calculations to gain a physical understanding of the conditions under which shell detonations can propagate successfully. We find that C/O WDs <= 1.0 Msol, which make up the majority of C/O WDs, are born with structures that can support double detonations. More massive C/O WDs require ~1e-3 Msol of accretion before detonations can successfully propagate in their shells, but such accretion may be common in the double WD binaries that host massive WDs. Our findings strongly suggest that if the direct impact accretion stream reaches high enough temperatures and densities during mass transfer from one WD to another, the accreting WD will undergo a double detonation. Furthermore, if the companion is also a C/O WD <= 1.0 Msol, it will undergo its own double detonation when impacted by the ejecta from the first explosion. Exceptions to this outcome may explain the newly discovered class of hypervelocity supernova survivors.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Submitted

arXiv:2405.18361 [pdf, other]

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

Authors: Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang

Abstract: Rapid advancements in Autonomous Driving (AD) tasks turned a significant shift toward end-to-end fashion, particularly in the utilization of vision-language models (VLMs) that integrate robust logical reasoning and cognitive abilities to enable comprehensive end-to-end planning. However, these VLM-based approaches tend to integrate 2D vision tokenizers and a large language model (LLM) for ego-car planning, which lack 3D geometric priors as a cornerstone of reliable planning. Naturally, this observation raises a critical concern: Can a 2D-tokenized LLM accurately perceive the 3D environment? Our evaluation of current VLM-based methods across 3D object detection, vectorized map construction, and environmental caption suggests that the answer is, unfortunately, NO. In other words, 2D-tokenized LLM fails to provide reliable autonomous driving. In response, we introduce DETR-style 3D perceptrons as 3D tokenizers, which connect LLM with a one-layer linear projector. This simple yet elegant strategy, termed Atlas, harnesses the inherent priors of the 3D physical world, enabling it to simultaneously process high-resolution multi-view images and employ spatiotemporal modeling. Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks on nuScenes dataset, proving that 3D-tokenized LLM is the key to reliable autonomous driving. The code and datasets will be released.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17633 [pdf, other]

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

Authors: Jocelyn Shen, Joel Mire, Hae Won Park, Cynthia Breazeal, Maarten Sap

Abstract: Empathy serves as a cornerstone in enabling prosocial behaviors, and can be evoked through sharing of personal experiences in stories. While empathy is influenced by narrative content, intuitively, people respond to the way a story is told as well, through narrative style. Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and quantify this relationship between style and empathy using LLMs and large-scale crowdsourcing studies. We introduce a novel, theory-based taxonomy, HEART (Human Empathy and Narrative Taxonomy) that delineates elements of narrative style that can lead to empathy with the narrator of a story. We establish the performance of LLMs in extracting narrative elements from HEART, showing that prompting with our taxonomy leads to reasonable, human-level annotations beyond what prior lexicon-based methods can do. To show empirical use of our taxonomy, we collect a dataset of empathy judgments of stories via a large-scale crowdsourcing study with N=2,624 participants. We show that narrative elements extracted via LLMs, in particular, vividness of emotions and plot volume, can elucidate the pathways by which narrative style cultivates empathy towards personal stories. Our work suggests that such models can be used for narrative analyses that lead to human-centered social and behavioral insights.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.17303 [pdf, other]

High-Resolution Observation and Magnetic Modeling of a Solar Minifilament: the Formation, Eruption and Failing Mechanisms

Authors: Weilin Teng, Yingna Su, Rui Liu, Jialin Chen, Yanjie Liu, Jun Dai, Wenda Cao, **hua Shen, Haisheng Ji

Abstract: Minifilaments are widespread small-scale structures in the solar atmosphere. To better understand their formation and eruption mechanisms, we investigate the entire life of a sigmoidal minifilament located below a large quiescent filament observed by BBSO/GST on 2015 August 3. The Hα structure initially appears as a group of arched threads, then transforms into two J-shaped arcades, and finally forms a sigmoidal shape. SDO/AIA observations in 171Å show that two coronal jets occur around the southern footpoint of the minifilament before the minifilament eruption. The minifilament eruption starts from the southern footpoint, then interacts with the overlying filament and fails. The aforementioned observational changes correspond to three episodes of flux cancellations observed by SDO/HMI. Unlike previous studies, the flux cancellation occurs between the polarity in which the southern footpoint of the minifilament is rooted and an external polarity. We construct two magnetic field models before the eruption using the flux rope insertion method, and find a hyperbolic flux tube (HFT) above the flux cancellation site. The observation and modeling results suggest that the eruption is triggered by the external magnetic reconnection between the core field of the minifilament and the external fields due to flux cancellations. This study reveals a new triggering mechanism for minifilament eruptions and a new relationship between minifilament eruptions and coronal jets.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16769 [pdf, other]

Learning phase transitions by siamese neural network

Authors: Jianmin Shen, Shiyang Chen, Feiyi Liu, Youju Liu, Wei Li

Abstract: The wide application of machine learning (ML) techniques in statistical physics has presented new avenues for research in this field. In this paper, we introduce a semi-supervised learning method based on Siamese Neural Networks (SNN), trying to explore the potential of neural network (NN) in the study of critical behaviors beyond the approaches of supervised and unsupervised learning. By focusing on the (1+1) dimensional bond directed percolation (DP) model of nonequilibrium phase transition, we use the SNN to predict the critical values and critical exponents of the system. Different from traditional ML methods, the input of SNN is a set of configuration data pairs and the output prediction is similarity, which prompts us to find an anchor point of data for pair comparison during the test. In our study, during testing we set different bond probabilities $p$ as anchors, and discuss the impact of the configurations at these anchors on predictions. Moreover, we use an iterative method to find the optimal training interval to make the algorithm more efficient, and the prediction results are comparable to other ML methods.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 14 pages, 9 figures

arXiv:2405.15864 [pdf, other]

The Singlet-Triplet Gap of Cyclobutadiene: The CIPSI-Driven CC($P$;$Q$) Study

Authors: Swati S. Priyadarsini, Karthik Gururangan, Jun Shen, Piotr Piecuch

Abstract: An accurate determination of singlet-triplet gaps in biradicals, including cyclobutadiene in the automerization barrier region where one has to balance the substantial nondynamical many-electron correlation effects characterizing the singlet ground state with the predominantly dynamical correlations of the lowest-energy triplet, remains a challenge for many quantum chemistry methods. High-level coupled-cluster (CC) approaches, such as the CC method with a full treatment of singly, doubly, and triply excited clusters (CCSDT), are often capable of providing reliable results, but the routine application of such methods is hindered by their high computational costs. We have recently proposed a practical alternative to converging the CCSDT energetics at small fractions of the computational effort, even when electron correlations become stronger and connected triply excited clusters are larger and nonperturbative, by merging the CC($P$;$Q$) moment expansions with the selected configuration interaction methodology abbreviated as CIPSI. We demonstrate that one can accurately approximate the highly accurate CCSDT potential surfaces characterizing the lowest singlet and triplet states of cyclobutadiene along the automerization coordinate and the gap between them using tiny fractions of triply excited cluster amplitudes identified with the help of relatively inexpensive CIPSI Hamiltonian diagonalizations.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 22 pages, 3 tables, and 2 figures. This article has been submitted to the Journal of Physical Chemistry A

arXiv:2405.15708 [pdf, other]

EmpathicStories++: A Multimodal Dataset for Empathy towards Personal Experiences

Authors: Jocelyn Shen, Yubin Kim, Mohit Hulse, Wazeer Zulfikar, Sharifa Alghowinem, Cynthia Breazeal, Hae Won Park

Abstract: Modeling empathy is a complex endeavor that is rooted in interpersonal and experiential dimensions of human interaction, and remains an open problem within AI. Existing empathy datasets fall short in capturing the richness of empathy responses, often being confined to in-lab or acted scenarios, lacking longitudinal data, and missing self-reported labels. We introduce a new multimodal dataset for empathy during personal experience sharing: the EmpathicStories++ dataset (https://mitmedialab.github.io/empathic-stories-multimodal/) containing 53 hours of video, audio, and text data of 41 participants sharing vulnerable experiences and reading empathically resonant stories with an AI agent. EmpathicStories++ is the first longitudinal dataset on empathy, collected over a month-long deployment of social robots in participants' homes, as participants engage in natural, empathic storytelling interactions with AI agents. We then introduce a novel task of predicting individuals' empathy toward others' stories based on their personal experiences, evaluated in two contexts: participants' own personal shared story context and their reflections on stories they read. We benchmark this task using state-of-the-art models to pave the way for future improvements in contextualized and longitudinal empathy modeling. Our work provides a valuable resource for further research in developing empathetic AI systems and understanding the intricacies of human empathy within genuine, real-world settings.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted to ACL 2024 Findings

arXiv:2405.14411 [pdf, other]

Large Language Models for Explainable Decisions in Dynamic Digital Twins

Authors: Nan Zhang, Christian Vergara-Marcillo, Georgios Diamantopoulos, **gran Shen, Nikos Tziritas, Rami Bahsoon, Georgios Theodoropoulos

Abstract: Dynamic data-driven Digital Twins (DDTs) can enable informed decision-making and provide an optimisation platform for the underlying system. By leveraging principles of Dynamic Data-Driven Applications Systems (DDDAS), DDTs can formulate computational modalities for feedback loops, model updates and decision-making, including autonomous ones. However, understanding autonomous decision-making often requires technical and domain-specific knowledge. This paper explores using large language models (LLMs) to provide an explainability platform for DDTs, generating natural language explanations of the system's decision-making by leveraging domain-specific knowledge bases. A case study from smart agriculture is presented.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 8 pages, 3 figures, under review

arXiv:2405.14395 [pdf, ps, other]

Edge Zeta Functions and Eigenvalues for Buildings of Finite Groups of Lie Type

Authors: Jianhao Shen

Abstract: We study the edge zeta functions of buildings associated to a finite group of Lie type, and prove that all the edge eigenvalues of these buildings are certain roots of powers of q. This work vastly generalizes the type A case, and generalizes Brouwer's work on oppositeness graph of these buildings.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.11989 [pdf, other]

Wrinkling of differentially growing bilayers with similar film and substrate moduli

Authors: Jiajia Shen, Yibin Fu, Alberto Pirrera, Rainer M. J. Groh

Abstract: The study of growth-induced surface wrinkling in constrained bilayers comprising a thin film attached to a thick substrate is a canonical model for understanding pattern formation in many biological systems. While the bilayer model has received much prior attention, the nonlinear behaviour for arrangements with similar film and substrate properties, or substrate growth that outpaces film growth, remains poorly understood. This paper therefore focuses on these cases in which the substrate's elasticity dominates surface wrinkling. We study the critical states, and the initial and advanced post-critical behaviour of growing bilayers with film-to-substrate modulus ratios in the region of $2.5$--$50$, and cases where the substrate grows faster than the film. Based on nonlinear elasticity, we formulate analytical models for linear buckling analyses and asymptotic projections around the critical point, and use finite element (FE) models coupled to continuation and branch-switching algorithms to uncover the deep post-critical regime. It is shown that a rapidly growing substrate may change the critical mode from film-governed sinusoidal wrinkling to substrate-governed Biot wrinkling depending on the stiffness ratio and growth ratio. We present a phase change diagram of the post-critical modal landscape split into sinusoidal wrinkling, period doubling, period quadrupling, and creasing regimes in terms of the stiffness ratio and growth ratio. While the post-critical regime of film- and substrate-dominated bilayers (either in terms of dominant elasticity or growth rate) is governed by sinusoidal wrinkling and Biot creasing, respectively, the intermediate regions allow for period doubling and quadrupling bifurcations. Finally, we demonstrate the existence of multi-stability in the advanced post-buckling regimes for growing bilayers where growth in the substrate surpasses that of the film.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 32 pages, 9 figures

MSC Class: 74-10; 74G35

arXiv:2405.11844 [pdf]

NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors

Authors: Harideep Nair, William Leyman, Agastya Sampath, Quinn Jacobson, John Paul Shen

Abstract: Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC's ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Accepted and Presented at Neuro-Inspired Computational Elements (NICE) Conference, La Jolla, CA. 2024

arXiv:2405.11672 [pdf]

Interpretable Machine Learning Enhances Disease Prognosis: Applications on COVID-19 and Onward

Authors: **zhi Shen, Ke Ma

Abstract: In response to the COVID-19 pandemic, the integration of interpretable machine learning techniques has garnered significant attention, offering transparent and understandable insights crucial for informed clinical decision making. This literature review delves into the applications of interpretable machine learning in predicting the prognosis of respiratory diseases, particularly focusing on COVID-19 and its implications for future research and clinical practice. We reviewed various machine learning models that are not only capable of incorporating existing clinical domain knowledge but also have the learning capability to explore new information from the data. These models and experiences not only aid in managing the current crisis but also hold promise for addressing future disease outbreaks. By harnessing interpretable machine learning, healthcare systems can enhance their preparedness and response capabilities, thereby improving patient outcomes and mitigating the impact of respiratory diseases in the years to come.

Submitted 20 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.10979 [pdf, other]

Private Data Leakage in Federated Human Activity Recognition for Wearable Healthcare Devices

Authors: Kongyang Chen, Dong** Zhang, Sijia Guan, Bing Mi, Jiaxing Shen, Guoqing Wang

Abstract: Wearable data serves various health monitoring purposes, such as determining activity states based on user behavior and providing tailored exercise recommendations. However, the individual data perception and computational capabilities of wearable devices are limited, often necessitating the joint training of models across multiple devices. Federated Human Activity Recognition (HAR) presents a viable research avenue, allowing for global model training without the need to upload users' local activity data. Nonetheless, recent studies have revealed significant privacy concerns persisting within federated learning frameworks. To address this gap, we focus on investigating privacy leakage issues within federated user behavior recognition modeling across multiple wearable devices. Our proposed system entails a federated learning architecture comprising $N$ wearable device users and a parameter server, which may exhibit curiosity in extracting sensitive user information from model parameters. Consequently, we consider a membership inference attack based on a malicious server, leveraging differences in model generalization across client data. Experimentation conducted on five publicly available HAR datasets demonstrates an accuracy rate of 92% for malicious server-based membership inference. Our study provides preliminary evidence of substantial privacy risks associated with federated training across multiple wearable devices, offering a novel research perspective within this domain.

Submitted 20 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08984 [pdf]

doi 10.1063/5.0173562

Charge-Transfer Hyperbolic Polaritons in $α$-MoO$_3$/graphene heterostructures

Authors: J. Shen, M. Chen, V. Korostelev, H. Kim, P. Fathi-Hafshejani, M. Mahjouri-Samani, K. Klyukin, G-H. Lee, S. Dai

Abstract: Charge transfer is a fundamental interface process that can be harnessed for light detection, photovoltaics, and photosynthesis. Recently, charge transfer was exploited in nanophotonics to alter plasmon polaritons by involving additional non-polaritonic materials to activate the charge transfer. Yet, direct charge transfer between polaritonic materials hasn't been demonstrated. We report the direct charge transfer in pure polaritonic van der Waals (vdW) heterostructures of $α$-MoO$_3$/graphene. We extracted the Fermi energy of 0.6 eV for graphene by infrared nano-imaging of charge transfer hyperbolic polaritons in the vdW heterostructure. This unusually high Fermi energy is attributed to the charge transfer between graphene and $α$-MoO$_3$. Moreover, we have observed charge transfer hyperbolic polaritons in multiple energy-momentum dispersion branches with a wavelength elongation of up to 150%. With support from the DFT calculation, we find that the charge transfer between graphene and $α$-MoO$_3$, absent in mechanically assembled vdW heterostructures, is attributed to the relatively pristine heterointerface preserved in the epitaxially grown vdW heterostructure. The direct charge transfer and charge transfer hyperbolic polaritons demonstrated in our work hold great promise for developing nano-optical circuits, computational devices, communication systems, and light and energy manipulation devices.

Submitted 14 May, 2024; originally announced May 2024.

Journal ref: Applied Physics Reviews 11, 021409 (2024)

arXiv:2405.05225 [pdf, other]

doi 10.1145/3613904.3642333

"Community Guidelines Make this the Best Party on the Internet": An In-Depth Study of Online Platforms' Content Moderation Policies

Authors: Brennan Schaffner, Arjun Nitin Bhagoji, Siyuan Cheng, Jacqueline Mei, Jay L. Shen, Grace Wang, Marshini Chetty, Nick Feamster, Genevieve Lakier, Chenhao Tan

Abstract: Moderating user-generated content on online platforms is crucial for balancing user safety and freedom of speech. Particularly in the United States, platforms are not subject to legal constraints prescribing permissible content. Each platform has thus developed bespoke content moderation policies, but there is little work towards a comparative understanding of these policies across platforms and topics. This paper presents the first systematic study of these policies from the 43 largest online platforms hosting user-generated content, focusing on policies around copyright infringement, harmful speech, and misleading content. We build a custom web-scraper to obtain policy text and develop a unified annotation scheme to analyze the text for the presence of critical components. We find significant structural and compositional variation in policies across topics and platforms, with some variation attributable to disparate legal groundings. We lay the groundwork for future studies of ever-evolving content moderation policies and their impact on users.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04479 [pdf]

doi 10.1038/s41586-024-07584-w

Tunable superconductivity in electron- and hole-doped Bernal bilayer graphene

Authors: Chushan Li, Fan Xu, Bohao Li, Jiayi Li, Guoan Li, Kenji Watanabe, Takashi Taniguchi, Bingbing Tong, Jie Shen, Li Lu, **feng Jia, Fengcheng Wu, Xiaoxue Liu, Tingxin Li

Abstract: Graphene-based, high quality two-dimensional electronic systems have emerged as a highly tunable platform for studying superconductivity. Specifically, superconductivity has been observed in both electron-doped and hole-doped twisted graphene moire systems, whereas in crystalline graphene systems, superconductivity has so far only been observed in hole-doped rhombohedral trilayer and hole-doped Bernal bilayer graphene (BBG). Recently, enhanced superconductivity has been demonstrated in BBG due to the proximity with a monolayer WSe2. Here, we report the observation of superconductivity and a series of flavor-symmetry-breaking phases in both electron- and hole-doped BBG/WSe2 device by electrostatic doping. The strength of the observed superconductivity is tunable by applied vertical electric fields. The maximum Berezinskii-Kosterlitz-Thouless (BKT) transition temperature for the electron- and hole-doped superconductivity is about 210 mK and 400 mK, respectively. Superconductivities emerge only when applied electric fields drive BBG electron or hole wavefunctions toward the WSe2 layer, underscoring the importance of the WSe2 layer in the observed superconductivity. We find the hole-doped superconductivity violates the Pauli paramagnetic limit, consistent with an Ising-like superconductor. In contrast, the electron-doped superconductivity obeys the Pauli limit, even though the proximity induced Ising spin-orbit coupling is also notable in the conduction band. Our findings highlight the rich physics associated with the conduction band in BBG, paving the way for further studies into the superconducting mechanisms of crystalline graphene and the development of novel superconductor devices based on BBG.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03916 [pdf]

Robust Optimization for Spot Scanning Proton Therapy based on Dose-Linear Energy Transfer (LET) Volume Constraints

Authors: **gyuan Chen, Yunze Yang, Hongying Feng, Lian Zhang, Carlos E. Vargas, Nathan Y. Yu, Jean-Claude M. Rwigema, Sameer R. Keole, Sujay A. Vora, Jiajian Shen, Wei Liu

Abstract: Purpose: Historically, spot scanning proton therapy (SSPT) treatment planning utilizes dose volume constraints and linear-energy-transfer (LET) volume constraints separately to balance tumor control and organs-at-risk (OARs) protection. We propose a novel dose-LET volume constraint (DLVC)-based robust optimization (DLVCRO) method for SSPT in treating prostate cancer to obtain a desirable joint dose and LET distribution to minimize adverse events (AEs). Methods: DLVCRO treats DLVC as soft constraints controlling the joint distribution of dose and LET. Ten prostate cancer patients were included with rectum and bladder as OARs. DLVCRO was compared with the conventional robust optimization (RO) method using the worst-case analysis method. Besides the dose-volume histogram (DVH) indices, the analogous LETVH and extra-biological-dose (xBD)-volume histogram indices were also used. The Wilcoxon signed rank test was used to measure statistical significance. Results: In the nominal scenario, DLVCRO significantly improved dose, LET and xBD distributions to protect OARs (rectum: V70Gy: 3.07% vs. 2.90%, p = .0063, RO vs. DLVCRO; $\text{LET}_{\max}$ (keV/um): 11.53 vs. 9.44, p = .0101; $\text{xBD}_{\max}$ (Gy$\cdot$keV/um): 420.55 vs. 398.79, p = .0086; bladder: V65Gy: 4.82% vs. 4.61%, p = .0032; $\text{LET}_{\max}$ 8.97 vs. 7.51, p = .0047; $\text{xBD}_{\max}$ 490.11 vs. 476.71, p = .0641). The physical dose distributions in targets are comparable (D2%: 98.57% vs. 98.39%; p = .0805; CTV D2% - D98%: 7.10% vs. 7.75%, p = .4624). In the worst-case scenario, DLVCRO robustly enhanced OAR protection while maintaining similar plan robustness in target dose coverage and homogeneity. Conclusion: DLVCRO upgrades 2D DVH-based to 3D DLVH-based treatment planning to adjust dose/LET distributions simultaneously and robustly. DLVCRO is potentially a powerful tool to improve patient outcomes in SSPT.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03415 [pdf, other]

Unique solvability and error analysis of the Lagrange multiplier approach for gradient flows

Authors: Qing Cheng, Jie Shen, Cheng Wang

Abstract: The unique solvability and error analysis of the original Lagrange multiplier approach proposed in [8] for gradient flows is studied in this paper. We identify a necessary and sufficient condition that must be satisfied for the nonlinear algebraic equation arising from the original Lagrange multiplier approach to admit a unique solution in the neighborhood of its exact solution, and propose a modified Lagrange multiplier approach so that the computation can continue even if the aforementioned condition is not satisfied. Using the Cahn-Hilliard equation as an example, we rigorously prove the unique solvability and establish optimal error estimates of a second-order Lagrange multiplier scheme, assuming this condition holds and that the time step is sufficiently small. We also present numerical results to demonstrate that the modified Lagrange multiplier approach is much more robust and can use a much larger time step than the original Lagrange multiplier approach.

Submitted 6 May, 2024; originally announced May 2024.

MSC Class: 65M70; 65K15; 65N22

arXiv:2405.00614 [pdf, other]

Multigroup Robustness

Authors: Lunjia Hu, Charlotte Peale, Judy Hanwen Shen

Abstract: To address the shortcomings of real-world datasets, robust learning algorithms have been designed to overcome arbitrary and indiscriminate data corruption. However, practical processes of gathering data may lead to patterns of data corruption that are localized to specific partitions of the training dataset. Motivated by critical applications where the learned model is deployed to make predictions about people from a rich collection of overlapping subpopulations, we initiate the study of multigroup robust algorithms whose robustness guarantees for each subpopulation only degrade with the amount of data corruption inside that subpopulation. When the data corruption is not distributed uniformly over subpopulations, our algorithms provide more meaningful robustness guarantees than standard guarantees that are oblivious to how the data corruption and the affected subpopulations are related. Our techniques establish a new connection between multigroup fairness and robustness.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00300 [pdf, other]

On a new class of BDF and IMEX schemes for parabolic type equations

Authors: Fukeng Huang, Jie Shen

Abstract: When applying the classical multistep schemes for solving differential equations, one often faces the dilemma that smaller time steps are needed with higher-order schemes, making it impractical to use high-order schemes for stiff problems. We construct in this paper a new class of BDF and implicit-explicit (IMEX) schemes for parabolic type equations based on the Taylor expansions at time $t^{n+β}$ with $β> 1$ being a tunable parameter. These new schemes, with a suitable $β$, allow larger time steps at higher order for stiff problems than is allowed with a usual higher-order scheme. For parabolic type equations, we identify an explicit uniform multiplier for the new second- to fourth-order schemes, and conduct rigorous stability and error analysis by using the energy argument. We also present ample numerical examples to validate our findings.

Submitted 30 April, 2024; originally announced May 2024.

Comments: This article was accepted for publication in the SIAM Journal on Numerical Analysis on April 30, 2024

MSC Class: 65M12; 76D05; 65M15

arXiv:2404.19655 [pdf, other]

The Local Dark Matter Kinematic Substructure Based on LAMOST K Giants

Authors: Hai Zhu, Rui Guo, Juntai Shen, Jianglai Liu, Chao Liu, Xiang-Xiang Xue, Lan Zhang, Shude Mao

Abstract: Numerical simulations indicate that correlations exist between the velocity distributions of stars and dark matter (DM). We study the local DM velocity distribution based on these correlations. We select K giants from LAMOST DR8 cross-matched with Gaia DR3, which has robust measurements of three-dimensional velocity and metallicity, and separate them into the disk, halo substructure and main halo components in the chemo-dynamical space utilizing the Gaussian Mixture Model. The substructure component is highly radially anisotropic, and possibly related to the Gaia-Enceladus-Sausage (GES) merger event, while the halo component is isotropic and accreted from the earliest mergers following the Maxwell-Boltzmann Distribution (Standard Halo Model, SHM). We find that the GES-like substructure contributes $\sim85\%$ of the local non-disk stars in the Solar neighbourhood, which is nearly invariant when applying different volume cuts or additional angular momentum constraints. Utilizing the metallicity-stellar-mass relation and the stellar-mass-halo-mass relation, we find that $\sim25_{-15}^{+24}\%$ of local DM is in the kinematic substructure. Combined with the stellar distributions of non-disk components, we compute the velocity distribution of local DM. The modified heliocentric velocity distribution of local DM shifts to a lower speed and has a sharper peak compared to the SHM, which yields updated detection limits for the DM direct detection experiments. Our work confirms that the local DM velocity distribution deviates from the SHM, and needs to be properly accounted for in the DM detection experiments.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.19180 [pdf, other]

MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-core Processor

Authors: Bingcai Sui, Junzhong Shen, Caixia Sun, Junhui Wang, Zhong Zheng, Wei Guo

Abstract: General-purpose processor vendors have integrated customized accelerators in their products due to the widespread use of General Matrix-Matrix Multiplication (GEMM) kernels. However, it remains a challenge to further improve the flexibility and scalability of these GEMM-enhanced processors to cater to the emerging large-scale GEMM workloads. In this paper we propose MACO, a novel loosely-coupled multi-core general-purpose architecture optimized for GEMM-related applications. To enhance the programmability and flexibility of MACO, the paper introduces a tile-based instruction set architecture. Additionally, the paper presents techniques such as hardware-assisted data prefetching and locking, and predictive address translation to further enhance the computational efficiency of MACO for GEMM workloads. The experimental results demonstrate that MACO exhibits good scalability, achieving an average computational efficiency of 90% across multiple cores. Furthermore, evaluations on state-of-the-art deep neural networks show that MACO can achieve up to 1.1 TFLOPS with 88% computational efficiency, indicating its adaptivity to deep learning workloads.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18414 [pdf, other]

Learning a Sparse Neural Network using IHT

Authors: Saeed Damadi, Soroush Zolfaghari, Mahdi Rezaie, **glai Shen

Abstract: The core of a good model is in its ability to focus only on important information that reflects the basic patterns and consistencies, thus pulling out a clear, noise-free signal from the dataset. This necessitates using a simplified model defined by fewer parameters. The importance of theoretical foundations becomes clear in this context, as this paper relies on established results from the domain of advanced sparse optimization, particularly those addressing nonlinear differentiable functions. The need for such theoretical foundations is further highlighted by the trend that as computational power for training NNs increases, so does the complexity of the models in terms of a higher number of parameters. In practical scenarios, these large models are often simplified to more manageable versions with fewer parameters. Understanding why these simplified models with fewer parameters remain effective raises an important question. This leads to the broader question of whether there is a theoretical framework that can clearly explain these empirical observations. Recent developments, such as establishing necessary conditions for the convergence of iterative hard thresholding (IHT), a sparse method analogous to gradient descent, to a sparse local minimum are promising. The remarkable capacity of the IHT algorithm to accurately identify and learn the locations of nonzero parameters underscores its practical effectiveness and utility. This paper aims to investigate whether the theoretical prerequisites for such convergence are applicable in the realm of neural network (NN) training by providing justification for all the necessary conditions for convergence. Then, these conditions are validated by experiments on a single-layer NN, using the IRIS dataset as a testbed.

Submitted 29 April, 2024; originally announced April 2024.
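
Note on the entry above: the abstract describes iterative hard thresholding (IHT) as a sparse analogue of gradient descent. A minimal sketch of the generic idea follows, under loud assumptions: it uses a plain least-squares objective rather than the paper's neural network, and the function names `hard_threshold` and `iht_least_squares`, the step size, and the iteration count are all hypothetical choices made for illustration. It is not the authors' implementation.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and set the rest to zero."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def iht_least_squares(A, y, s, step, iters=200):
    """Illustrative IHT for min 0.5 * ||A x - y||^2 with at most s nonzeros in x.

    Assumption: a least-squares loss stands in for the NN loss in the abstract.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y
        residual = [
            sum(a * xj for a, xj in zip(row, x)) - yi
            for row, yi in zip(A, y)
        ]
        # Gradient of the loss: A^T r
        grad = [
            sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(n)
        ]
        # Gradient step, then projection onto the set of s-sparse vectors
        x = hard_threshold([xj - step * gj for xj, gj in zip(x, grad)], s)
    return x
```

With an identity matrix for `A`, `y = [3.0, 0.1, -2.0]`, `s = 2`, and `step = 1.0`, the iteration settles on the two largest-magnitude coordinates and zeroes the third, which is the "learning the locations of nonzero parameters" behaviour the abstract refers to.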

arXiv:2404.17811 [pdf]

Efficient Bi-manipulation using RGBD Multi-model Fusion based on Attention Mechanism

Authors: Jian Shen, Jiaxin Huang, Zhigong Song

Abstract: Dual-arm robots have great application prospects in intelligent manufacturing due to their human-like structure when deployed with advanced intelligence algorithms. However, previous visuomotor policies suffer from perception deficiencies in environments where image features are impaired by conditions such as abnormal lighting, occlusion and shadow. The Focal CVAE framework is proposed for RGB-D multi-modal data fusion to address this challenge. In this study, a mixed focal attention module is designed for the fusion of RGB images containing color features and depth images containing 3D shape and structure information. This module highlights the prominent local features and focuses on the relevance of RGB and depth via cross-attention. A saliency attention module is proposed to improve its computational efficiency, which is applied in the encoder and the decoder of the framework. We illustrate the effectiveness of the proposed method via extensive simulation and experiments. It is shown that bi-manipulation performance is significantly improved in all four real-world tasks, with lower computational cost. Besides, the robustness is validated through experiments under different scenarios where there is a perception deficiency problem, demonstrating the feasibility of the method.

Submitted 27 April, 2024; originally announced April 2024.

Comments: 14 pages, 5 figures

arXiv:2404.17554 [pdf]

A Novel Context driven Critical Integrative Levels (CIL) Approach: Advancing Human-Centric and Integrative Lighting Asset Management in Public Libraries with Practical Thresholds

Authors: **g Lin, Nina Mylly, Per Olof Hedekvist, **gchun Shen

Abstract: This paper proposes the context driven Critical Integrative Levels (CIL), a novel approach to lighting asset management in public libraries that aligns with the transformative vision of human-centric and integrative lighting. This approach encompasses not only the visual aspects of lighting performance but also prioritizes the physiological and psychological well-being of library users. Incorporating a newly defined metric, Mean Time of Exposure (MTOE), the approach quantifies user-light interaction, enabling tailored lighting strategies that respond to diverse activities and needs in library spaces. Case studies demonstrate how the CIL matrix can be practically applied, offering significant improvements over conventional methods by focusing on optimized user experiences from both visual impacts and non-visual effects.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16841 [pdf, other]

Machine Unlearning in Large Language Models

Authors: Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen

Abstract: Recently, large language models (LLMs) have emerged as a notable field, attracting significant attention for their ability to automatically generate intelligent content for various application domains. However, LLMs still suffer from significant security and privacy issues. For example, LLMs might expose user privacy through hacking attacks or targeted prompts. To address this problem, this paper introduces a novel machine unlearning framework into LLMs. Our objectives are to make LLMs not produce harmful, hallucinatory, or privacy-compromising responses, while retaining their standard output capabilities. To accomplish this, we use an evaluative model to pinpoint dialogues needing unlearning. We also establish a distance loss to function as the model's negative loss, diverting it from previous undesirable outputs. Furthermore, we determine the expected output's cluster mean to formulate a positive loss, directing the model's outputs toward preferable outcomes without compromising its reasoning abilities and performance. Experimental results show that our approach effectively meets unlearning objectives without substantially compromising model performance.

Submitted 3 February, 2024; originally announced April 2024.

arXiv:2404.16687 [pdf, other]

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.15312 [pdf, other]

Realtime Person Identification via Gait Analysis

Authors: Shanmuga Venkatachalam, Harideep Nair, Prabhu Vellaisamy, Yongqi Zhou, Ziad Youssfi, John Paul Shen

Abstract: Each person has a unique gait, i.e., walking style, that can be used as a biometric for personal identification. Recent works have demonstrated effective gait recognition using deep neural networks; however, most of these works focus on classification accuracy rather than model efficiency. In order to perform gait recognition using wearable devices on the edge, it is imperative to develop highly efficient low-power models that can be deployed onto small form-factor devices such as microcontrollers. In this paper, we propose a small CNN model with 4 layers that is very amenable to edge AI deployment and realtime gait recognition. This model was trained on a public gait dataset with 20 classes augmented with data collected by the authors, aggregating to 24 classes in total. Our model achieves 96.7% accuracy and consumes only 5 KB RAM with an inferencing time of 70 ms and 125 mW power, while running continuous inference on Arduino Nano 33 BLE Sense. We successfully demonstrated realtime identification of the authors with the model running on Arduino, thus underscoring the efficacy and providing a proof of feasibility for deployment in practical systems in the near future.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2404.13396 [pdf]

Angle-Resolved Magneto-Chiral Anisotropy in a Non-Centrosymmetric Atomic Layer Superlattice

Authors: Long Cheng, Mingrui Bao, **gxian Zhang, Xue Zhang, Qun Yang, Qiang Li, Hui Cao, Dawei Qiu, Jia Liu, Fei Ye, Qing Wang, Genhao Liang, Hui Li, Guanglei Cheng, Hua Zhou, Jian-Min Zuo, Xiaodong Zhou, Jian Shen, Zhifeng Zhu, Sai Mu, Wenbo Wang, Xiaofang Zhai

Abstract: Chirality in solid-state materials has sparked significant interest due to potential applications of topologically-protected chiral states in next-generation information technology. The electrical magneto-chiral effect (eMChE), arising from relativistic spin-orbit interactions, shows great promise for developing chiral materials and devices for electronic integration. Here we demonstrate an angle-resolved eMChE in an A-B-C-C type atomic-layer superlattice lacking time and space inversion symmetry. We observe non-superimposable enantiomers of left-handed and right-handed tilted uniaxial magnetic anisotropy as the sample rotates under static fields, with the tilting angle reaching a striking 45 degrees. Magnetic force microscopy and atomistic simulations correlate the tilt to the emergence and evolution of chiral spin textures. The Dzyaloshinskii-Moriya interaction lock effect in competition with the Zeeman effect is demonstrated to be responsible for the angle-resolved eMChE. Our findings open up a new horizon for engineering angle-resolved magneto-chiral anisotropy, shedding light on the development of novel angle-resolved sensing or writing techniques in chiral spintronics.

Submitted 20 April, 2024; originally announced April 2024.

Showing 1–50 of 1,350 results for author: Shen, J